Inferring the taxonomic composition of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in computational biology. Because existing methods for taxonomic profiling of metagenomes are all based on assigning sequence fragments to phylogenetic categories, the accuracy of their results depends strongly on fragment length. This dependency complicates the comparative analysis of data from different sequencing platforms or preprocessing pipelines. We have developed Taxy, a read length-independent method for taxonomic profiling, and we provide a freely available Matlab/Octave toolbox that includes an ultra-fast implementation of the method. Besides the platform-independent toolbox, we also provide a prototype Windows tool that allows the user to compare a large number of preprocessed metagenomes in a graphical environment. Our tests indicate that Taxy results compare well with taxonomic profiles obtained with other methods. However, in contrast to existing methods, Taxy provides nearly constant profiling accuracy across all read lengths, and it runs at an unrivaled speed. Metagenomic profiles can be estimated from DNA sequences supplied as multi-FASTA files of any size. Analyzing a large, gigabase-scale sequence file typically takes less than a minute of processing time and can even be done on a standard notebook computer.
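The profiling model itself is not specified on this page; the title of the referenced paper indicates a mixture-model approach. As a rough illustration only, the Python sketch below shows one way a read length-independent profile could be estimated, under our own assumptions: k-mer counts are pooled over all reads of a multi-FASTA file, the pooled k-mer distribution is modeled as a mixture of fixed reference-genome k-mer distributions, and the mixture weights are fitted by expectation-maximization (EM). The function names (`read_multifasta`, `kmer_profile`, `estimate_mixture`), the choice k = 2, and the toy sequences are hypothetical; this is not the Taxy toolbox code or its exact algorithm.

```python
from collections import Counter
from itertools import product

VALID = frozenset("ACGT")


def read_multifasta(lines):
    """Yield upper-case sequences from an iterable of multi-FASTA lines.

    Lines starting with '>' are headers; the following non-empty lines
    are concatenated into one sequence.
    """
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
                chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def kmer_profile(sequences, k=2):
    """Pool overlapping k-mer counts over ALL sequences (hypothetical helper).

    Returns relative frequencies over every ACGT k-mer in a fixed order.
    Windows containing non-ACGT symbols (e.g. N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= VALID:
                counts[word] += 1
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in alphabet)
    if total == 0:
        raise ValueError("no valid k-mers found")
    return [counts[w] / total for w in alphabet]


def estimate_mixture(target, references, n_iter=200):
    """EM estimate of mixture weights w (summing to 1) for fixed components.

    Models target[i] ~ sum_j w[j] * references[j][i] and increases
    sum_i target[i] * log(sum_j w[j] * references[j][i]) at each step.
    """
    m = len(references)
    w = [1.0 / m] * m
    for _ in range(n_iter):
        new_w = [0.0] * m
        for i, q in enumerate(target):
            mix = sum(w[j] * references[j][i] for j in range(m))
            if q == 0 or mix == 0:
                continue  # k-mer absent from target or from all references
            for j in range(m):
                # E-step: responsibility of reference j for k-mer i, weighted by q
                new_w[j] += q * w[j] * references[j][i] / mix
        s = sum(new_w)
        if s == 0:
            raise ValueError("target shares no k-mers with the references")
        w = [x / s for x in new_w]  # M-step: renormalize the weights
    return w


if __name__ == "__main__":
    # Toy metagenome: three AT-rich reads and one GC-rich read (hypothetical data).
    reads = [">r1", "ATATATATAT", ">r2", "ATATATATAT",
             ">r3", "ATATATATAT", ">r4", "GCGCGCGCGC"]
    target = kmer_profile(read_multifasta(reads), k=2)
    refs = [kmer_profile(["AT" * 50], k=2), kmer_profile(["GC" * 50], k=2)]
    print([round(x, 3) for x in estimate_mixture(target, refs)])  # [0.75, 0.25]
```

Because counts are pooled before fitting, the weights in this sketch estimate the share of k-mers (roughly, of sequenced bases) attributable to each reference, so cutting the same sequence into shorter or longer reads changes them only through the k - 1 windows lost at each read boundary. A realistic setup would use longer k-mers, many reference genomes and a vectorized implementation.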
The platform-independent toolbox for the Matlab/Octave programming environment
can be downloaded here: Taxy toolbox
The toolbox contains functions for taxonomic profiling of large multi-FASTA DNA files
and lets users easily modify and adapt its functionality.
We tested the toolbox with Matlab 7 and with the freely available GNU Octave 3.x under the Windows and Linux
operating systems.
The Taxy software installer can be
downloaded here: Taxy installer
The installer also unpacks a software manual, which can alternatively
be downloaded here: Taxy manual
The Taxy tool requires the MATLAB Compiler Runtime (MCR). The Taxy installer automatically launches
a packaged installer for MCR version 7.8. In the unlikely case that
MCR version 7.8 is already installed on the system, we recommend
reinstalling it with the MCR installer packaged with Taxy. If a different
version of the MCR is already installed on the system, we recommend
installing the packaged MCR version 7.8 alongside it in order to
minimize conflicts with other MCR-dependent programs.
Minimum requirements: Windows XP, Windows Vista or Windows 7; 1 GB RAM; 500 MB of free hard
disk space
P. Meinicke, K. Asshauer and T. Lingner.
"Mixture models for analysis of the taxonomic composition of metagenomes",
Bioinformatics, 27(12):1618-1624, 2011.