Alphabet#

class pytantan.Alphabet#

An alphabet used for encoding sequences with ordinal encoding.
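
Ordinal encoding stores each letter as its position in the alphabet, one small integer per byte. The pure-Python sketch below only illustrates that mapping. The helper names `ordinal_encode` and `ordinal_decode` are hypothetical and not part of pytantan, whose actual implementation may differ (for instance in how it treats wildcard characters).

```python
# Illustrative sketch of ordinal encoding; NOT pytantan's implementation.
# `ordinal_encode` and `ordinal_decode` are hypothetical helper names.

def ordinal_encode(letters: str, sequence: str) -> bytes:
    """Map each character of `sequence` to its index in `letters`."""
    index = {letter: i for i, letter in enumerate(letters)}
    try:
        codes = [index[char] for char in sequence]
    except KeyError as err:
        raise ValueError(f"invalid character: {err.args[0]!r}") from None
    return bytes(codes)


def ordinal_decode(letters: str, encoded: bytes) -> str:
    """Map each index in `encoded` back to the letter at that position."""
    if any(code >= len(letters) for code in encoded):
        raise ValueError("encoded sequence contains an invalid index")
    return "".join(letters[code] for code in encoded)


encoded = ordinal_encode("ACGT", "GATACA")
decoded = ordinal_decode("ACGT", encoded)
```

With the letters "ACGT", the letter A maps to 0, C to 1, G to 2 and T to 3, so the round trip returns the original string.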

__getitem__(key, /)#

Return self[key].

__init__(letters, protein)#

Create a new alphabet with the given letters.

Parameters:
  • letters (str) – The letters of the alphabet, in order.

  • protein (bool) – Set to True to signal that this alphabet is a protein alphabet and not a nucleotide one.

__len__()#

Return len(self).

__str__()#

Return str(self).

decode(encoded)#

Decode an ordinal-encoded sequence using the alphabet.

Parameters:

encoded (bytes-like object) – The ordinal-encoded sequence to decode.

Returns:

str – The decoded sequence.

Raises:

ValueError – When the encoded sequence contains an index that does not correspond to a letter of the alphabet.

Example

>>> alphabet = Alphabet("ACGT")
>>> alphabet.decode(bytearray([2, 0, 3, 0, 1, 0]))
'GATACA'

decode_into(encoded, sequence)#

Decode the ordinal-encoded sequence encoded, writing the decoded letters into the given buffer sequence.
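
As a rough illustration of decoding into a caller-provided buffer, the sketch below fills a preallocated `bytearray` in place. `decode_into_sketch` is a hypothetical helper, not pytantan's implementation, and the equal-length check is an assumption: the behaviour of `decode_into` for mismatched buffer sizes is not documented here.

```python
# Illustrative sketch only; `decode_into_sketch` is a hypothetical helper,
# NOT pytantan's implementation.

def decode_into_sketch(letters: str, encoded: bytes, sequence: bytearray) -> None:
    """Write the ASCII letter for each index of `encoded` into `sequence`."""
    # Assumption: input and output buffers must have the same length.
    if len(encoded) != len(sequence):
        raise ValueError("buffers must have the same length")
    table = letters.encode("ascii")
    for i, code in enumerate(encoded):
        if code >= len(table):
            raise ValueError(f"invalid index: {code}")
        sequence[i] = table[code]


decoded_buffer = bytearray(6)  # preallocated output buffer
decode_into_sketch("ACGT", bytes([2, 0, 3, 0, 1, 0]), decoded_buffer)
```

Writing into an existing buffer avoids allocating a new string for every call, which can matter when many sequences are processed in a loop.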

encode(sequence)#

Encode a sequence to an ordinal-encoded sequence using the alphabet.

Parameters:

sequence (str or bytes-like object) – The sequence to encode.

Returns:

bytes – The encoded sequence.

Raises:

ValueError – When the sequence contains invalid characters, or unknown sequence characters while the alphabet contains no wildcard character.

Example

>>> alphabet = Alphabet("ACGT")
>>> alphabet.encode("GATACA")
b'\x02\x00\x03\x00\x01\x00'

encode_into(sequence, encoded)#

Encode the sequence sequence with ordinal encoding, writing the result into the given buffer encoded.
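
The buffer-filling pattern can be sketched in pure Python as follows. `encode_into_sketch` is a hypothetical helper, not pytantan's implementation. The equal-length requirement and the absence of wildcard handling are assumptions made for the sketch.

```python
# Illustrative sketch only; `encode_into_sketch` is a hypothetical helper,
# NOT pytantan's implementation.

def encode_into_sketch(letters: str, sequence: bytes, encoded: bytearray) -> None:
    """Write the alphabet index of each byte of `sequence` into `encoded`."""
    # Assumption: input and output buffers must have the same length.
    if len(sequence) != len(encoded):
        raise ValueError("buffers must have the same length")
    index = {ord(letter): i for i, letter in enumerate(letters)}
    for i, char in enumerate(sequence):
        if char not in index:
            raise ValueError(f"invalid character: {chr(char)!r}")
        encoded[i] = index[char]


encoded_buffer = bytearray(6)  # one buffer reused for several sequences
encode_into_sketch("ACGT", b"GATACA", encoded_buffer)
first = bytes(encoded_buffer)
encode_into_sketch("ACGT", b"TTTTTT", encoded_buffer)
second = bytes(encoded_buffer)
```

Reusing a single output buffer across sequences of the same length keeps allocation out of the inner loop; each call overwrites the previous contents.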