TY - JOUR
T1 - Tatajuba: exploring the distribution of homopolymer tracts
AU - de Oliveira Martins, Leonardo
AU - Bloomfield, Samuel
AU - Stoakes, Emily
AU - Grant, Andrew J.
AU - Page, Andrew J.
AU - Mather, Alison E.
N1 - DATA AVAILABILITY: Tatajuba is available under the open source GNU GPL 3 licence from https://github.com/quadram-institute-bioscience/tatajuba. The software is written in ANSI C (C11 standard with GNU extensions), validated using unit tests and packaged for autotools. The software is also available on bioconda, with Docker and singularity images. The samples used in this study are from public databases (e.g. https://www.ebi.ac.uk/ena), and are listed in the Supplementary Tables and in https://github.com/quadram-institute-bioscience/tatajuba/tree/master/docs.
SUPPLEMENTARY DATA: Supplementary Data are available at NARGAB Online.
Funding information: BBSRC Institute Strategic Programme Microbes in the Food Chain [BB/R012504/1 and its constituent projects BBS/E/F/000PR10348 (Theme 1, Epidemiology and Evolution of Pathogens in the Food Chain), BBS/E/F/000PR10349 (Theme 2, Microbial Survival in the Food Chain) and BBS/E/F/000PR10352 (Theme 4, Research Infrastructure)]; Quadram Institute Bioscience BBSRC funded Core Capability Grant [BB/CCG1860/1].
PY - 2022/3
Y1 - 2022/3
AB - Length variation of homopolymeric tracts, which induces phase variation, is known to regulate gene expression leading to phenotypic variation in a wide range of bacterial species. There is no specialized bioinformatics software which can, at scale, exhaustively explore and describe these features from sequencing data. Identifying these is non-trivial as sequencing and bioinformatics methods are prone to introducing artefacts when presented with homopolymeric tracts due to the decreased base diversity. We present tatajuba, which can automatically identify potential homopolymeric tracts and help predict their putative phenotypic impact, allowing for rapid investigation. We use it to detect all tracts in two separate datasets, one of Campylobacter jejuni and one of three Bordetella species, and to highlight those tracts that are polymorphic across samples. With this we confirm homopolymer tract variation with phenotypic impact found in previous studies and additionally find many more with potential variability. The software is written in C and is available under the open source licence GNU GPLv3.
UR - http://www.scopus.com/inward/record.url?scp=85125142108&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqac003
DO - 10.1093/nargab/lqac003
M3 - Article
VL - 4
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
SN - 2631-9268
IS - 1
M1 - lqac003
ER -