PyTantan#

Cython bindings and Python interface to Tantan, a fast method for identifying repeats in DNA and protein sequences.

Overview#

Tantan is a fast method developed by Martin Frith to identify simple repeats in DNA or protein sequences. It can be used to mask repeat regions in reference sequences, and avoid false homology predictions between repeated regions.

PyTantan is a Python module that provides bindings to Tantan using Cython. It implements a user-friendly, Pythonic interface to mask a sequence with various parameters. It calls the Tantan code directly rather than going through the CLI, which has the following advantages:

Batteries-included

Just add pytantan as a pip or conda dependency, no need for the tantan binary or any external dependency.

Flexible

Pass any str or bytes-like object containing the raw sequence as an input, and get a str as the output.

Configurable

Use any scoring matrix from the scoring-matrices package or build your own.

Parallel

Easily run computations in parallel by querying a thread-safe RepeatFinder with several sequences at once.

Consistent

Get the same results as Tantan! You are using the same code under the hood.

Portable

Get SIMD-acceleration on any supported platform without having to build the package from scratch.
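
The "Flexible" and "Parallel" points above can be illustrated with a minimal sketch of the threading pattern. It uses only the standard library so that it runs without PyTantan installed: `toy_mask` is a hypothetical stand-in that lowercases homopolymer runs and is not Tantan's algorithm, and `mask_many` is a helper name introduced here for illustration. The sketch assumes the masking callable is thread-safe and takes a str and returns a str, as described above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def toy_mask(sequence: str, min_run: int = 4) -> str:
    """Hypothetical stand-in masker (NOT Tantan's algorithm).

    Lowercases homopolymer runs of at least `min_run` letters, so the
    example can run without PyTantan installed.
    """
    parts = []
    for _, group in groupby(sequence.upper()):
        run = "".join(group)
        parts.append(run.lower() if len(run) >= min_run else run)
    return "".join(parts)


def mask_many(mask, sequences, threads=4):
    """Apply a thread-safe `mask` callable to each sequence, keeping input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(mask, sequences))


sequences = ["ACGTAAAAAACGT", "GGGGGGCATG", "ACGT"]
masked = mask_many(toy_mask, sequences)
for original, result in zip(sequences, masked):
    print(original, "->", result)
```

With PyTantan installed, the stand-in would be replaced by the masking call of a shared RepeatFinder; the exact method name and constructor arguments should be taken from the API reference rather than from this sketch. Threads only speed things up if the underlying call releases the GIL, which is assumed here.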

Setup#

PyTantan is available for all modern Python versions (3.6+).

Run pip install pytantan in a shell to download the latest release from PyPI, or have a look at the Installation page to find other ways to install pytantan.

License#

This library is provided under the GNU General Public License v3.0 or later. Tantan is developed by Martin Frith and is distributed under the terms of the GPLv3 or later as well. See the Copyright page for more information.

This project was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.