API Reference

Bindings to tantan, a method for finding repeats in biological sequences.

References

  • Frith, Martin C. "A new repeat-masking method enables specific detection of homologous sequences." Nucleic Acids Research 39.4 (2011): e23. doi:10.1093/nar/gkq1212.

Functions

pytantan.mask_repeats(sequence, *, protein=False, scoring_matrix=None, match_score=None, mismatch_cost=None, repeat_start=0.005, repeat_end=0.05, repeat_period=None, decay=0.9, mask=None, threshold=0.5)

Mask regions predicted as repeats in the given sequence.

Parameters:
  • sequence (str or bytes-like object) – The sequence in which to mask repeats.

  • protein (bool) – Set to True to treat the input sequence as a protein sequence.

  • scoring_matrix (str or ScoringMatrix) – A scoring matrix to use for scoring character matches and mismatches. Either pass a matrix name (such as BLOSUM62) to load a built-in matrix, or a pre-initialized ScoringMatrix object.

  • match_score (int) – The score for character matches. Must be set along with mismatch_cost. Incompatible with the scoring_matrix option.

  • mismatch_cost (int) – The penalty for character mismatches. Must be set along with match_score. Incompatible with the scoring_matrix option.

  • repeat_start (float) – The probability of a repeat starting at each position.

  • repeat_end (float) – The probability of a repeat ending at each position.

  • decay (float) – The probability decay per period.

  • threshold (float) – The probability threshold above which to mask sequence characters.

  • mask (str or None) – A single character used to replace masked positions. If None is given, masked characters are instead converted to lowercase.
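The threshold and mask options act on the per-position repeat probabilities that tantan computes. The pure-Python sketch below isolates that final step, assuming the probabilities are already available. apply_mask is a hypothetical helper, not part of pytantan, and treating the threshold as a strict "greater than" comparison is our reading of "above which".

```python
def apply_mask(sequence, probabilities, threshold=0.5, mask=None):
    """Hypothetical helper: mask positions whose repeat probability exceeds threshold."""
    if len(sequence) != len(probabilities):
        raise ValueError("need exactly one probability per sequence character")
    out = []
    for char, prob in zip(sequence, probabilities):
        if prob > threshold:
            # Lowercase the original letter unless an explicit mask character is given.
            out.append(char.lower() if mask is None else mask)
        else:
            out.append(char)
    return "".join(out)


probs = [0.0, 0.0, 0.9, 0.9, 0.2, 0.6, 0.0, 0.0]
soft = apply_mask("ACGTACGT", probs)            # "ACgtAcGT"
hard = apply_mask("ACGTACGT", probs, mask="N")  # "ACNNANGT"
```

Soft-masking (lowercase) keeps the sequence recoverable, while a mask character such as N discards the masked letters entirely.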

pytantan.default_scoring_matrix(protein=False, match_score=None, mismatch_cost=None)

Get the default Tantan scoring matrix for the given parameters.
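match_score is a reward and mismatch_cost is a penalty, so mismatches enter the matrix with a negative sign. The sketch below builds such a match/mismatch table for DNA, assuming that a nucleotide matrix built from these two parameters has this simple shape. match_mismatch_matrix and its default values are hypothetical; the real defaults of default_scoring_matrix, and the built-in protein matrices, may differ.

```python
def match_mismatch_matrix(alphabet="ACGT", match_score=1, mismatch_cost=1):
    """Hypothetical helper: +match_score on the diagonal, -mismatch_cost elsewhere."""
    return {
        a: {b: (match_score if a == b else -mismatch_cost) for b in alphabet}
        for a in alphabet
    }


matrix = match_mismatch_matrix(match_score=2, mismatch_cost=3)
# matrix["A"]["A"] is 2, matrix["A"]["C"] is -3
```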

Classes

pytantan.Alphabet

An alphabet used for encoding sequences with ordinal encoding.
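Ordinal encoding replaces each letter with its index in the alphabet, so that scoring lookups become plain array indexing. The following sketch illustrates the idea only. encode is a hypothetical function, and the letter order "ACGT", case-insensitivity and the error on unknown characters are assumptions rather than documented behavior of pytantan.Alphabet.

```python
def encode(letters, sequence):
    """Hypothetical helper: ordinal-encode sequence as indices into letters."""
    index = {c: i for i, c in enumerate(letters.upper())}
    try:
        # Case-insensitive lookup: "g" and "G" map to the same index.
        return [index[c] for c in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"character {err.args[0]!r} is not in the alphabet") from err


codes = encode("ACGT", "gattaca")  # [2, 0, 3, 3, 0, 1, 0]
```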

pytantan.LikelihoodMatrix

A likelihood ratio matrix derived from a scoring matrix.
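A substitution score matrix s can be read as scaled log-likelihood ratios: there is a scale λ > 0 with Σ p_i p_j exp(λ s_ij) = 1, and the ratios are r_ij = exp(λ s_ij). The sketch below computes λ by bisection and converts scores to ratios. It assumes uniform background frequencies p for simplicity; score_scale and likelihood_ratios are hypothetical names, and pytantan may estimate λ and the frequencies differently.

```python
import math


def score_scale(scores, freqs, iterations=200):
    """Find lam > 0 such that sum_ij p_i p_j exp(lam * s_ij) == 1, by bisection."""
    n = len(freqs)
    expected = sum(freqs[i] * freqs[j] * scores[i][j] for i in range(n) for j in range(n))
    if expected >= 0 or max(max(row) for row in scores) <= 0:
        raise ValueError("need a negative expected score and at least one positive score")

    def f(lam):
        return sum(
            freqs[i] * freqs[j] * math.exp(lam * scores[i][j])
            for i in range(n)
            for j in range(n)
        ) - 1.0

    # f(0) == 0, f dips below zero (negative expected score), then grows
    # without bound, so the positive root lies in [0, hi] once f(hi) > 0.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def likelihood_ratios(scores, freqs):
    lam = score_scale(scores, freqs)
    return lam, [[math.exp(lam * s) for s in row] for row in scores]


# DNA with match score +1, mismatch score -1 and uniform base frequencies.
dna = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
lam, ratios = likelihood_ratios(dna, [0.25] * 4)
```

For this matrix the condition reduces to e^λ/4 + 3e^(-λ)/4 = 1, whose positive solution is λ = ln 3, giving a ratio of 3 for matches and 1/3 for mismatches.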

pytantan.RepeatFinder

A repeat finder using the Tantan method.
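To give a feel for what the Tantan method computes, here is a simplified pure-Python sketch of the hidden Markov model described by Frith (2011), decoded with the forward-backward algorithm. It assumes one background state plus one repeat state per period k, where a repeat state scores each letter against the letter k positions earlier. Repeats start with total probability repeat_start, split across periods in proportion to decay^(k-1), and end with probability repeat_end. This is not pytantan's implementation: it omits parts of tantan's model (such as insertions and deletions within repeats), and repeat_probabilities and max_period are hypothetical names.

```python
def repeat_probabilities(seq, ratio, max_period=10, repeat_start=0.005,
                         repeat_end=0.05, decay=0.9):
    """Posterior probability that each position of seq lies in a tandem repeat.

    ratio(a, b) is the likelihood ratio for letter a aligned to letter b
    one period earlier (relative to the background model).
    """
    if not seq:
        return []
    n = len(seq)
    periods = range(1, max_period + 1)
    # Geometric prior over periods: shorter periods are more likely.
    weights = [decay ** (k - 1) for k in periods]
    start = [repeat_start * w / sum(weights) for w in weights]

    def emit(i, k):
        # A period-k repeat state needs a letter k positions back to copy.
        return ratio(seq[i], seq[i - k]) if i >= k else 0.0

    # Forward pass, rescaled at every position to avoid overflow.
    fwd, scales = [], []
    prev_b, prev_r = 1.0, [0.0] * max_period
    for i in range(n):
        b = prev_b * (1 - repeat_start) + sum(prev_r) * repeat_end
        r = [(prev_b * start[k - 1] + prev_r[k - 1] * (1 - repeat_end)) * emit(i, k)
             for k in periods]
        c = b + sum(r)
        prev_b, prev_r = b / c, [x / c for x in r]
        fwd.append((prev_b, prev_r))
        scales.append(c)

    # Backward pass, reusing the forward scaling factors.
    bwd = [None] * n
    next_b, next_r = 1.0, [1.0] * max_period
    bwd[n - 1] = (next_b, next_r)
    for i in range(n - 2, -1, -1):
        e = [emit(i + 1, k) * next_r[k - 1] for k in periods]
        c = scales[i + 1]
        b = ((1 - repeat_start) * next_b + sum(s * x for s, x in zip(start, e))) / c
        r = [(repeat_end * next_b + (1 - repeat_end) * x) / c for x in e]
        bwd[i] = (b, r)
        next_b, next_r = b, r

    # P(repeat at i) = 1 - P(background at i).
    return [1.0 - fb * bb for (fb, _), (bb, _) in zip(fwd, bwd)]


# Usage: match ratio 3 and mismatch ratio 1/3 (the +1/-1 DNA scores scaled
# by lambda = ln 3), applied to a CAG tandem repeat between two flanks.
seq = "ACGTTGCATG" + "CAG" * 8 + "TTACGGATCC"
probs = repeat_probabilities(seq, lambda a, b: 3.0 if a == b else 1.0 / 3.0)
```

Positions inside the CAG run receive probabilities close to 1, while the first position is always 0 because no earlier letter exists to copy. Thresholding these probabilities is what produces a masked sequence.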