PyTantan #
Cython bindings and Python interface to Tantan, a fast method for identifying repeats in DNA and protein sequences.
Overview#
Tantan is a fast method developed by Martin Frith to identify simple repeats in DNA or protein sequences. It can be used to mask repeat regions in reference sequences, and avoid false homology predictions between repeated regions.
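To make "masking" concrete, here is a minimal sketch of what a masked sequence looks like, assuming the common soft-masking convention in which masked regions are written in lowercase. The `soft_mask` helper and the hand-picked interval are hypothetical and only illustrate the output format; they do not reproduce how Tantan detects repeats.

```python
def soft_mask(sequence, intervals):
    """Lowercase every half-open [start, end) interval of a sequence.

    The intervals are supplied by the caller; this helper does not
    detect repeats itself.
    """
    chars = list(sequence.upper())
    for start, end in intervals:
        chars[start:end] = [c.lower() for c in chars[start:end]]
    return "".join(chars)


# Hand-picked interval covering the "AT" dinucleotide repeat.
masked = soft_mask("GATTACAATATATATATGCGC", [(7, 17)])
print(masked)  # GATTACAatatatatatGCGC
```

Downstream homology search tools can then skip or down-weight the lowercase region, which is how masking helps avoid spurious matches between repeats.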
PyTantan is a Python module that provides bindings to Tantan using Cython. It implements a user-friendly, Pythonic interface to mask a sequence with various parameters. It calls the Tantan code directly rather than wrapping the CLI, which has the following advantages:
Just add pytantan as a pip or conda dependency, no need for the tantan binary or any external dependency.
Use any scoring matrix from the scoring-matrices package or build your own.
Easily run computations in parallel by querying a thread-safe RepeatFinder with several sequences at once.
Get the same results as Tantan! You are using the same code under the hood.
Get SIMD-acceleration on any supported platform without having to build the package from scratch.
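The parallel use case mentioned above can be sketched with a thread pool from the standard library. This is a minimal sketch under stated assumptions: `mask_many` and `fake_mask` are hypothetical names, and `fake_mask` is a stand-in that only lowercases its input so the example runs without PyTantan installed. With PyTantan available, `mask_one` would be whatever masking callable a shared RepeatFinder exposes; check the API reference for the exact constructor arguments and method name.

```python
from concurrent.futures import ThreadPoolExecutor


def mask_many(mask_one, sequences, threads=4):
    """Apply `mask_one` to every sequence using a pool of worker threads.

    Results are returned in the same order as the input sequences.
    `mask_one` must be safe to call from several threads at once.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(mask_one, sequences))


def fake_mask(sequence):
    # Stand-in masker (assumption): lowercases everything and does NOT
    # detect repeats. Replace it with a real masking callable.
    return sequence.lower()


results = mask_many(fake_mask, ["ATTATTATTATT", "GCGCGCGC", "ACGT"])
print(results)
```

Sharing one callable across threads only works because the underlying object is thread-safe. Whether threads yield an actual speedup depends on the bound code releasing the interpreter lock during computation, which should be verified rather than assumed.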
Setup#
PyTantan is available for all modern Python versions (3.7+).
Run pip install pytantan in a shell to download the latest release from PyPI, or have a look at the Installation page to find other ways to install pytantan.
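After installing, a quick sanity check can confirm that the interpreter meets the stated minimum version and that the module is importable. The helper names below are hypothetical, and the sketch uses only the standard library, so it runs whether or not PyTantan is present.

```python
import importlib.util
import sys


def python_is_supported(minimum=(3, 7)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("Python supported:", python_is_supported())
print("pytantan installed:", is_installed("pytantan"))
```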
Library#
License#
This library is provided under the GNU General Public License v3.0 or later. Tantan is developed by Martin Frith and is distributed under the terms of the GPLv3 or later as well. See the Copyright page for more information.
This project was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.