API Reference
Bindings to tantan, a method for finding repeats in biological sequences.
References
Frith, Martin C. "A new repeat-masking method enables specific detection of homologous sequences." Nucleic Acids Research, vol. 39, no. 4 (2011): e23. doi:10.1093/nar/gkq1212.
Functions
- pytantan.mask_repeats(sequence, *, protein=False, scoring_matrix=None, match_score=None, mismatch_cost=None, repeat_start=0.005, repeat_end=0.05, repeat_period=None, decay=0.9, mask=None, threshold=0.5)
Mask regions predicted as repeats in the given sequence.
- Parameters:
  - sequence (str or byte-like object) – The sequence containing the repeats to mask.
  - protein (bool) – Set to True to treat the input sequence as a protein sequence.
  - scoring_matrix (str or ScoringMatrix) – A scoring matrix to use for scoring character matches and mismatches. Either pass a matrix name (such as BLOSUM62) to load a built-in matrix, or a pre-initialized ScoringMatrix object.
  - match_score (int) – The score for character matches. Must be set along with mismatch_cost. Incompatible with the scoring_matrix option.
  - mismatch_cost – The penalty for character mismatches. Must be set along with match_score. Incompatible with the scoring_matrix option.
  - repeat_start (float) – The probability of a repeat starting per position.
  - repeat_end (float) – The probability of a repeat ending per position.
  - decay (float) – The probability decay per period.
  - threshold (float) – The probability threshold above which sequence characters are masked.
  - mask (str or None) – A single character to use for masking positions. If None is given, masked positions are written as the lowercase letters of the original sequence.
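The threshold and mask parameters describe the final masking step: each position receives a repeat probability, and positions whose probability is above the threshold are either lowercased or replaced by the mask character. The following is a minimal pure-Python sketch of that step only, assuming the per-position probabilities are already known. apply_mask is a hypothetical helper, not part of pytantan, and it does not compute the repeat probabilities themselves.

```python
from typing import Optional, Sequence


def apply_mask(
    sequence: str,
    probabilities: Sequence[float],
    threshold: float = 0.5,
    mask: Optional[str] = None,
) -> str:
    """Mask positions whose repeat probability is above `threshold`.

    Hypothetical helper illustrating the `threshold`/`mask` semantics;
    it is not pytantan's implementation.
    """
    if len(sequence) != len(probabilities):
        raise ValueError("need exactly one probability per sequence character")
    if mask is not None and len(mask) != 1:
        raise ValueError("mask must be a single character")

    masked = []
    for char, prob in zip(sequence, probabilities):
        # Assumption: a probability exactly equal to the threshold stays unmasked.
        if prob > threshold:
            # Without a mask character, fall back to the lowercase letter.
            masked.append(char.lower() if mask is None else mask)
        else:
            masked.append(char)
    return "".join(masked)


probs = [0.1, 0.9, 0.9, 0.1, 0.6, 0.2, 0.7, 0.4]
print(apply_mask("ACGTACGT", probs))            # AcgTaCgT
print(apply_mask("ACGTACGT", probs, mask="N"))  # ANNTNCNT
```

With pytantan itself, the corresponding call would be along the lines of pytantan.mask_repeats(sequence, threshold=0.5, mask="N"), where the per-position probabilities are computed internally by the Tantan model rather than supplied by the caller.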
- pytantan.default_scoring_matrix(protein=False, match_score=None, mismatch_cost=None)
Get the default Tantan scoring matrix for the given parameters.
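For nucleotide input, a scoring matrix built from match_score and mismatch_cost is typically a simple match/mismatch matrix: the match score on the diagonal and the negated mismatch cost everywhere else. The sketch below builds such a matrix as nested lists. match_mismatch_matrix and the ACGT alphabet are assumptions for illustration; the sketch does not reproduce Tantan's actual default values or its protein matrices.

```python
from typing import List


def match_mismatch_matrix(
    alphabet: str, match_score: int, mismatch_cost: int
) -> List[List[int]]:
    """Build a square scoring matrix from a match score and a mismatch cost.

    Hypothetical helper; rows and columns follow the order of `alphabet`.
    """
    return [
        # Identical letters score +match_score, different letters -mismatch_cost.
        [match_score if a == b else -mismatch_cost for b in alphabet]
        for a in alphabet
    ]


for row in match_mismatch_matrix("ACGT", match_score=1, mismatch_cost=2):
    print(row)
```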
Classes
- An alphabet used for encoding sequences with ordinal encoding.
- A likelihood ratio matrix derived from a scoring matrix.
- A repeat finder using the Tantan method.
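A likelihood ratio matrix is commonly obtained from a scoring matrix by exponentiating scaled scores, r(a, b) = exp(λ · s(a, b)), where λ > 0 is chosen so that the ratios, weighted by background letter frequencies, sum to 1. The sketch below solves for λ by bisection, assuming uniform background frequencies, and returns the ratios. It illustrates that standard construction under those assumptions; likelihood_ratios is a hypothetical helper and is not a description of how pytantan's likelihood matrix class computes its values.

```python
import math
from typing import List


def likelihood_ratios(scores: List[List[float]]) -> List[List[float]]:
    """Convert a square scoring matrix into likelihood ratios exp(lam * s).

    Assumes uniform background frequencies, a negative expected score and
    at least one positive score, so that a unique positive lam exists.
    """
    n = len(scores)
    weight = 1.0 / (n * n)  # p_a * p_b with uniform letter frequencies

    def excess(lam: float) -> float:
        # Weighted sum of exp(lam * s) minus 1; its positive root is lambda.
        return sum(weight * math.exp(lam * s) for row in scores for s in row) - 1.0

    expected = sum(weight * s for row in scores for s in row)
    if expected >= 0 or max(max(row) for row in scores) <= 0:
        raise ValueError("need a negative expected score and some positive score")

    # Grow the upper bound until excess() becomes positive, then bisect.
    lo, hi = 0.0, 1.0
    while excess(hi) <= 0:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    lam = (lo + hi) / 2.0
    return [[math.exp(lam * s) for s in row] for row in scores]


# +1 for matches, -1 for mismatches over a 4-letter alphabet gives lambda = ln 3.
dna = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
ratios = likelihood_ratios(dna)
print(round(ratios[0][0], 6), round(ratios[0][1], 6))  # 3.0 0.333333
```

For the +1/-1 example the result can be checked by hand: with λ = ln 3, matches get a ratio of 3 and mismatches 1/3, and (4 · 3 + 12 · 1/3) / 16 = 1.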