gobics.de [F. Schreiber]



Phylogenetic studies using expressed sequence tags (ESTs) are becoming a standard approach to answering evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A crucial first step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input for tree-reconstruction software. Generating such data sets manually can be very time-consuming, so software tools are needed that carry out these steps automatically.
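The grouping, target-gene selection, and redundancy-removal steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OrthoSelect's implementation. The `Assignment` record, the rule of keeping only the highest-scoring EST per species and ortholog group, and the `min_species` threshold used to select target genes are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Assignment(NamedTuple):
    """One EST assigned to an ortholog group (hypothetical record layout)."""
    species: str
    est_id: str
    og: str          # ortholog-group identifier
    score: float     # search score of the EST against the OG (higher is better)
    protein: str     # translated protein sequence


def build_datasets(assignments, min_species=2):
    """Keep one sequence per (OG, species) and drop OGs with too few species.

    Redundancy rule (assumption): among several ESTs from the same species
    assigned to the same OG, keep only the highest-scoring one.
    """
    best = {}
    for a in assignments:
        key = (a.og, a.species)
        if key not in best or a.score > best[key].score:
            best[key] = a

    # Regroup the surviving sequences by ortholog group.
    per_og = defaultdict(dict)
    for (og, species), a in best.items():
        per_og[og][species] = a

    # Target-gene selection (assumption): require at least min_species taxa.
    return {
        og: members
        for og, members in per_og.items()
        if len(members) >= min_species
    }


def to_fasta(members):
    """Format one OG's non-redundant sequences as FASTA, one entry per species."""
    lines = []
    for species in sorted(members):
        a = members[species]
        lines.append(f">{species}|{a.est_id}")
        lines.append(a.protein)
    return "\n".join(lines) + "\n"
```

With toy input in which species "A" has two ESTs in the same group, only the higher-scoring one survives, and a group represented by a single species is dropped at the default threshold. Each resulting group can then be written out with `to_fasta` as input for an alignment program.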

OrthoSelect is an easy-to-install and easy-to-use tool for finding ortholog groups in EST databases. It automatically searches assembled EST sequences against databases of ortholog groups (OGs), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of the identified orthologous sequences, and offers the possibility to process these alignments further in a final step. OrthoSelect performs better than the best-hit selection strategy and gives reliable results when re-annotating database member sequences of OrthoMCL-DB and KOG. Because correct orthology assignment is an important prerequisite for constructing reliable data sets, this accuracy makes OrthoSelect capable of producing such data sets. OrthoSelect is therefore a valuable tool for researchers who work with large EST libraries and want to construct data sets for phylogenetic reconstruction.
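The translation step can be illustrated with a naive six-frame translation. This page does not say how OrthoSelect translates ESTs into proteins; the sketch below is a stand-in heuristic, assuming the standard genetic code (NCBI translation table 1) and choosing the frame with the longest stop-free stretch. The function names (`six_frame`, `best_frame`) are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons not in the table (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return {frame_label: protein} for the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def best_frame(seq):
    """Pick the frame whose longest stop-free stretch is longest (heuristic)."""
    frames = six_frame(seq)

    def longest_open_stretch(protein):
        return max(len(piece) for piece in protein.split("*"))

    label = max(frames, key=lambda f: longest_open_stretch(frames[f]))
    return label, frames[label]
```

For example, `best_frame("ATG" + "TTA" * 4)` returns `("+1", "MLLLL")`, because every other frame is shorter or, on the reverse strand, broken by stop codons. Real EST pipelines usually need something more robust than this, since sequencing errors introduce frameshifts that a plain frame-by-frame translation cannot handle.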


We used the phylogenomic data set of Dunn et al. to benchmark the performance of our tool and showed that OrthoSelect is able to produce high-quality data sets for phylogenomic studies. Our tool achieved a specificity of 98%. Furthermore, it found slightly more orthologous sequences than the original data set contained (+4%), with an error rate of less than 1%.
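This page does not define how these benchmark figures were computed. Under the usual definitions they reduce to simple ratios, and the sketch below assumes exactly those: specificity = TN / (TN + FP), relative gain = (found - reference) / reference, and error rate = wrong assignments / found. The function names and the counts in the demo are placeholders for illustration, not data from the benchmark.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of true non-members correctly rejected: TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


def relative_gain(n_found: int, n_reference: int) -> float:
    """Relative change in sequence count versus a reference data set."""
    return (n_found - n_reference) / n_reference


def error_rate(n_wrong: int, n_found: int) -> float:
    """Fraction of the tool's assignments judged to be incorrect."""
    return n_wrong / n_found


if __name__ == "__main__":
    # Placeholder counts for illustration only (not the benchmark's numbers).
    print(f"specificity:   {specificity(98, 2):.0%}")
    print(f"relative gain: {relative_gain(104, 100):+.0%}")
    print(f"error rate:    {error_rate(1, 200):.1%}")
```

Keeping the three ratios separate makes clear that a tool can both recover more sequences than a reference set (positive gain) and still keep its error rate low, which is the combination the benchmark reports.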

Web server

OrthoSelect web server

How to cite

If you use the OrthoSelect tool or web server, please cite:



Please direct your questions and comments to fabian@gobics.de.