In the reconstruction of microbial genomes from metagenomic sequence data, estimating the completeness and possible contamination of the resulting genomes is crucial for quality control. In metagenomics, candidate genomes are usually obtained by assembling the metagenome and subsequently binning the assembled contigs. BinChecker provides a novel approach to quality assessment that is based on a fast protein domain search and a clustering approach for identifying marker domain (“feature”) sets. The feature sets used for estimation are not precomputed for a given database of reference genomes; instead, they are found individually for each bin by adaptive clustering and feature selection. In particular, this adaptivity facilitates the creation and extension of the underlying database, which only requires adding the protein feature profiles of new reference genomes. Tests with simulated bins indicate that the prediction accuracy of BinChecker is on par with the current state of the art, while BinChecker offers significant advantages in speed and flexibility.
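To illustrate how a marker feature set translates into quality estimates, the following is a minimal sketch of the generic marker-based calculation, not BinChecker's actual method. It assumes that a marker set has already been selected for the bin (in BinChecker this happens by adaptive clustering and feature selection, which is not reproduced here) and that the protein domain search yields a list of domain identifiers found in the bin. Completeness is taken as the percentage of markers found at least once, and contamination as the number of surplus marker copies relative to the size of the marker set. The function name `estimate_quality` and the identifiers in the example are hypothetical.

```python
from collections import Counter


def estimate_quality(marker_set, domain_hits):
    """Simplified marker-based estimate (hypothetical helper, not BinChecker's API).

    marker_set:  collection of marker domain IDs assumed to be single-copy
                 in complete, uncontaminated genomes.
    domain_hits: domain IDs found in the bin's predicted proteins
                 (one entry per hit, so duplicates indicate extra copies).
    Returns (completeness %, contamination %).
    """
    markers = set(marker_set)
    if not markers:
        raise ValueError("marker_set must not be empty")

    # Count hits only for domains that belong to the selected marker set.
    counts = Counter(hit for hit in domain_hits if hit in markers)

    # Completeness: fraction of markers observed at least once.
    present = sum(1 for m in markers if counts[m] > 0)

    # Contamination: surplus copies of markers expected to be single-copy.
    surplus = sum(counts[m] - 1 for m in markers if counts[m] > 1)

    n = len(markers)
    return 100.0 * present / n, 100.0 * surplus / n


if __name__ == "__main__":
    markers = {"domA", "domB", "domC", "domD"}
    hits = ["domA", "domA", "domB", "domC", "domX"]  # domX is not a marker
    print(estimate_quality(markers, hits))  # (75.0, 25.0)
```

Here three of four markers are present (75% completeness) and one marker occurs twice (25% contamination). Real tools refine this simple scheme, for example by weighting markers or grouping co-located ones, so the numbers should be read only as an illustration of the principle.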
doi: https://doi.org/10.1101/2021.10.01.462745