gobics.de [F. Schreiber]


Treephyler: fast taxonomic profiling of metagenomes.

Assessment of phylogenetic diversity is a key element in the analysis of microbial communities. Tools are needed to handle next-generation sequencing data and to cope with the computational complexity of large-scale studies. Here, we present Treephyler, a tool for fast taxonomic profiling of metagenomes. It combines the predictive power of tree-based approaches with the speed of signature-based approaches. Treephyler was evaluated on a real metagenome to assess its performance in comparison to previous approaches for taxonomic profiling.

How to cite

F. Schreiber, P. Gumrich, R. Daniel and P. Meinicke (2010)
Treephyler: fast taxonomic profiling of metagenomes
Bioinformatics 26, 960-961



The tree parsing method

We use the method of Nguyen et al. [1] to classify the query sequences in a phylogenetic tree. The algorithm traverses the tree and assigns a query sequence to a taxonomic rank if at least three reference sequences in the same subtree belong to the same taxonomic rank. The query sequence is then assigned to the lowest taxonomic rank that all reference sequences have in common. If there is no such overlap, the query sequence is not assigned to a taxonomic rank.
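The assignment rule described above can be illustrated with a small Python sketch. It is a simplified illustration, not the implementation of Nguyen et al. [1] or of Treephyler. It assumes that the reference sequences in the query's subtree have already been collected and that each comes with a lineage listed from the most general to the most specific taxon. The function name `lowest_common_taxon` and the example lineages are hypothetical.

```python
from typing import Optional, Sequence

# Threshold taken from the description above: at least three reference sequences.
MIN_REFERENCES = 3


def lowest_common_taxon(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the most specific taxon shared by all reference lineages.

    Each lineage is ordered from the most general to the most specific taxon.
    Returns None if fewer than MIN_REFERENCES lineages are given or if the
    lineages have no taxon in common.
    """
    if len(lineages) < MIN_REFERENCES:
        return None
    shared = None
    # Walk down the ranks in parallel until the lineages disagree.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break
        shared = taxa_at_rank[0]
    return shared


# Hypothetical reference lineages found in the same subtree as a query.
references = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ["Bacteria", "Proteobacteria", "Betaproteobacteria"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
]
print(lowest_common_taxon(references))  # Proteobacteria
```

In this example the three references agree down to the phylum but differ at the class level, so the query would be assigned at the phylum level. With only two references, or with references from different domains, no assignment is made.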



The Glacier Ice Metagenome Of The Northern Schneeferner


Runtime comparison - Treephyler vs. other tools

We randomly chose 1% (10,765 sequences) of the glacier dataset to obtain results for all tools in a reasonable time frame.
All analyses were carried out on one core of a dual-core AMD Opteron 2216 (2.4 GHz) with 16 GB RAM. "Execution time 100%" refers to the analysis of the complete glacier ice dataset. Due to the computational complexity of Carma, the runtime of Carma was interpolated.
Method       No. of cores   Execution time 1%   Execution time 100%
Treephyler   1              0.3 h               12 h
Phymm        1              0.3 h               30 h
Carma        1              168.1 h             696 h
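As an illustration of how such a 1% subsample could be drawn, here is a minimal Python sketch. It is not the script used for this comparison. The dataset size of 1,076,539 sequences is taken from the prediction comparison on this page. The function name `subsample_indices` and the fixed seed are assumptions made for the example.

```python
import random
from typing import List

TOTAL_READS = 1_076_539  # size of the full glacier ice dataset
FRACTION = 0.01          # 1% subsample


def subsample_indices(total: int, fraction: float, seed: int = 0) -> List[int]:
    """Pick a random subset of read indices without replacement."""
    sample_size = int(total * fraction)  # truncates 10,765.39 to 10,765
    rng = random.Random(seed)            # fixed seed so the draw is repeatable
    return sorted(rng.sample(range(total), sample_size))


indices = subsample_indices(TOTAL_READS, FRACTION)
print(len(indices))  # 10765
```

The returned indices can then be used to select the corresponding records from the sequence file, so that every tool is run on the same subset.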

Runtime comparison - Treephyler

We used the full glacier ice dataset to conduct runtime analyses on varying numbers of processor cores. The first analysis was done on our computer cluster using 50 processor cores (AMD Opteron 2216, 2.4 GHz, 8 GB RAM); the second analysis was done on a single computer with 8 cores.
Method       No. of cores   Execution time / core   Percentage of assigned reads
Treephyler   50             13 mins                 15%
Treephyler   8              139 mins                15%

Prediction comparison

The accuracy of the different methods was assessed on the full glacier ice dataset (1,076,539 sequences). Treephyler and Phymm were run on our own computers, whereas the results of Carma were taken from [3].
Method   Percentage of assigned reads
Phymm    ~99%

Phylum level

Accuracy assessment using the glacier dataset.

Class level

Additional comparisons were performed on the class level for bacteria. Since there were no predictions available for the 16S analysis, only the three methods Treephyler, Carma, and Phymm could be compared.

Accuracy assessment using the glacier dataset on the class level.

Please direct your questions and comments to fabian@gobics.de.


[1] Nguyen, T.X. et al. (2006) Phylogenetic analysis of general bacterial porins: a phylogenomic case study, J Mol Microbiol Biotechnol, 11, 291-301.
[2] Simon, C., Herath, J., Rockstroh, S. and Daniel, R. (2009) Rapid identification of genes encoding DNA polymerases by function-based screening of metagenomic libraries derived from glacial ice, Appl Environ Microbiol, 75, 2964-2968.
[3] Krause, L., Diaz, N.N., Goesmann, A., Kelley, S., Nattkemper, T.W., Rohwer, F., Edwards, R.A. and Stoye, J. (2008) Phylogenetic classification of short environmental DNA fragments, Nucleic Acids Res, 36, 2230-2239.
[4] Brady, A. and Salzberg, S.L. (2009) Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models, Nat Methods, 6, 673-676.