RepeatFinder
- class pytantan.RepeatFinder
A repeat finder using the Tantan method.
- scoring_matrix
The scoring matrix used to derive the likelihood ratio matrix, if any.
- Type:
- likelihood_matrix
The likelihood ratio matrix used for scoring letter pairs in tandem repeats.
- Type:
- __init__(scoring_matrix, *, repeat_start=0.005, repeat_end=0.05, decay=0.9, protein=False)
Create a new repeat finder.
- Parameters:
scoring_matrix (ScoringMatrix) – The scoring matrix to use for scoring sequence alignments.
repeat_start (float) – The probability of a repeat starting per position.
repeat_end (float) – The probability of a repeat ending per position.
decay (float) – The probability decay per period.
protein (bool) – Set to True to treat the input sequence as a protein sequence.
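As a rough illustration of how repeat_start and decay could interact, the sketch below splits the per-position start probability across repeat periods with geometrically decaying weights. This is one reading of "probability decay per period", not pytantan's actual implementation; the function name and the max_period cap are hypothetical and do not appear in the signature above.

```python
def period_start_probabilities(repeat_start=0.005, decay=0.9, max_period=100):
    """Split repeat_start over periods 1..max_period (hypothetical cap).

    Assumption: each additional unit of period multiplies the weight by
    `decay`, and the weights are normalised so they sum to repeat_start.
    """
    weights = [decay ** (period - 1) for period in range(1, max_period + 1)]
    total = sum(weights)
    return [repeat_start * weight / total for weight in weights]


probs = period_start_probabilities()
# Short periods receive most of the start probability under this model.
share_of_first_ten = sum(probs[:10]) / sum(probs)
```

Under this assumption, a decay closer to 1 spreads the start probability more evenly across long and short periods, while a smaller decay favours short-period repeats.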
- get_probabilities(sequence)
Get the probabilities of being a repeat for each sequence position.
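Per-position probabilities are often easier to work with as intervals. The following is a minimal sketch of turning such a list into half-open (start, end) ranges; the helper name repeat_intervals and the example values are invented for illustration and are not output from pytantan. It assumes a position counts as a repeat when its probability is strictly greater than the threshold.

```python
def repeat_intervals(probabilities, threshold=0.5):
    """Return half-open (start, end) index ranges where p > threshold."""
    intervals = []
    start = None
    for index, p in enumerate(probabilities):
        if p > threshold:
            if start is None:
                start = index  # open a new interval
        elif start is not None:
            intervals.append((start, index))  # close the current interval
            start = None
    if start is not None:
        # The last interval runs to the end of the sequence.
        intervals.append((start, len(probabilities)))
    return intervals


# Made-up probabilities, one per sequence position.
example = [0.01, 0.02, 0.10, 0.70, 0.90, 0.95, 0.40, 0.80, 0.85]
intervals = repeat_intervals(example)
```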
- mask_repeats(sequence, threshold=0.5, mask=None)
Mask regions predicted as repeats in the given sequence.
- Parameters:
sequence (str or bytes-like object) – The sequence containing the repeats to mask.
threshold (float) – The probability threshold above which to mask sequence characters.
mask (str or None) – A single mask character to use for masking positions. If None is given, masking uses the lowercase letters of the original sequence.
- Returns:
str – The input sequence with repeat regions masked.
Example
>>> matrix = ScoringMatrix(
...     [[ 1, -1, -1, -1],
...      [-1,  1, -1, -1],
...      [-1, -1,  1, -1],
...      [-1, -1, -1,  1]],
...     alphabet="ACGT",
... )
>>> tantan = RepeatFinder(matrix)
>>> tantan.mask_repeats("ATTATTATTATTATT")
'ATTattattattatt'
>>> tantan.mask_repeats("ATTATTATTATTATT", mask='N')
'ATTNNNNNNNNNNNN'
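The masking rule described for threshold and mask can be sketched in plain Python. This is an illustration of the documented behaviour only, not the library's code: mask_by_threshold is a hypothetical helper, the probabilities are invented rather than computed by pytantan, and "above the threshold" is assumed to mean strictly greater.

```python
def mask_by_threshold(sequence, probabilities, threshold=0.5, mask=None):
    """Mask every position whose probability exceeds the threshold.

    With mask=None the masked letters are lowercased; otherwise they are
    replaced by the single mask character.
    """
    if len(sequence) != len(probabilities):
        raise ValueError("need exactly one probability per sequence position")
    if mask is not None and len(mask) != 1:
        raise ValueError("mask must be a single character")
    masked = []
    for letter, p in zip(sequence, probabilities):
        if p > threshold:
            masked.append(letter.lower() if mask is None else mask)
        else:
            masked.append(letter)
    return "".join(masked)


# Invented probabilities: the first repeat unit is low, the rest are high.
sequence = "ATTATTATTATTATT"
probabilities = [0.0] * 3 + [0.9] * 12
soft_masked = mask_by_threshold(sequence, probabilities)
hard_masked = mask_by_threshold(sequence, probabilities, mask="N")
```

Lowercase (soft) masking keeps the original letters recoverable, whereas a mask character such as N discards them.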