Most prokaryotic genomes in public databases are reconstructed from metagenomes; each such genome is a compendium of multiple contiguous sequences (contigs) assembled from shotgun sequencing data. Binning algorithms for assigning contigs to metagenome-assembled genomes (MAGs) are numerous and continuously improving in accuracy. However, binning errors, i.e., the incorrect assignment of a contig and its coding sequences to a MAG, often propagate through various databases and confound taxonomic, metabolic, and/or evolutionary analyses. Here we present itBins, fully automated Python-based software that enables ultra-fast refinement of metagenomic bins using a rule-based approach that harnesses the GC content (%GC), coverage, and taxonomy of individual contigs. When applied to the low-, medium-, and high-complexity datasets of the first Critical Assessment of Metagenome Interpretation (CAMI I) challenge [1], itBins achieved higher F₁ scores (the harmonic mean of precision and recall) at all complexity levels than the other automated refinement tools, MDMcleaner and Rosella. Compared to manual refinement with uBin, itBins performed similarly well across all three complexity levels of the CAMI I dataset. With an average processing time of 61 ms per bin, itBins is at least three orders of magnitude faster than all other refinement tools when the required input data (%GC, coverage, and taxonomy) are already available, and it was similarly fast when input data preparation was included in the processing time. Applied to 64 real-world metagenomes from highly complex river mesocosms, itBins produced 259 medium-quality and 19 high-quality refined MAGs, whereas the other automated refinement tools either failed to generate any output or did not finish within 5000 hours of runtime. Finally, itBins also uses marker genes to determine the overall binning success for each metagenome, providing a crucial benchmark for users to gauge the ecological relevance of their binned data.
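The abstract does not spell out the exact refinement rules, so the following is only a minimal, hypothetical sketch of what rule-based refinement on %GC, coverage, and taxonomy can look like, together with a base-pair-weighted F₁ score for judging the result. The `Contig` class, the `refine_bin` and `precision_recall_f1` functions, the thresholds `gc_tol` and `cov_fold`, the length-weighted majority taxon, and the toy contigs are all assumptions made for illustration; they are not itBins' implementation or API.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Contig:
    name: str
    length: int      # contig length in base pairs
    gc: float        # %GC content of the contig
    coverage: float  # mean read depth of the contig
    taxon: str       # taxonomic label assigned to the contig


def refine_bin(contigs, gc_tol=5.0, cov_fold=2.0):
    """Hypothetical rules: keep contigs close to the bin's median %GC,
    within cov_fold of the median coverage, and matching the majority taxon.
    The thresholds are illustrative assumptions, not itBins defaults."""
    gc_med = median(c.gc for c in contigs)
    cov_med = median(c.coverage for c in contigs)
    # Majority taxon, weighted by contig length so long contigs count more
    weights = Counter()
    for c in contigs:
        weights[c.taxon] += c.length
    major_taxon = weights.most_common(1)[0][0]

    kept = []
    for c in contigs:
        gc_ok = abs(c.gc - gc_med) <= gc_tol
        cov_ok = cov_med / cov_fold <= c.coverage <= cov_med * cov_fold
        tax_ok = c.taxon == major_taxon
        if gc_ok and cov_ok and tax_ok:
            kept.append(c)
    return kept


def precision_recall_f1(bin_contigs, true_genome):
    """Base-pair-weighted precision, recall, and F1 (harmonic mean of the two)."""
    true_names = {c.name for c in true_genome}
    bin_bp = sum(c.length for c in bin_contigs)
    true_bp = sum(c.length for c in true_genome)
    tp_bp = sum(c.length for c in bin_contigs if c.name in true_names)
    precision = tp_bp / bin_bp if bin_bp else 0.0
    recall = tp_bp / true_bp if true_bp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy bin: c4 (%GC and taxon outlier) and c5 (coverage outlier) are contaminants
bin_contigs = [
    Contig("c1", 50_000, 52.1, 30.0, "Pseudomonas"),
    Contig("c2", 40_000, 51.5, 28.0, "Pseudomonas"),
    Contig("c3", 30_000, 53.0, 33.0, "Pseudomonas"),
    Contig("c4", 20_000, 38.0, 29.0, "Bacillus"),
    Contig("c5", 10_000, 52.4, 120.0, "Pseudomonas"),
]
# True genome: c1 to c3 plus c6, a contig the binner missed entirely
true_genome = bin_contigs[:3] + [Contig("c6", 30_000, 52.0, 31.0, "Pseudomonas")]

refined = refine_bin(bin_contigs)
_, _, f1_before = precision_recall_f1(bin_contigs, true_genome)
p_after, r_after, f1_after = precision_recall_f1(refined, true_genome)
print("kept:", [c.name for c in refined])
print(f"F1 before: {f1_before:.3f}, after: {f1_after:.3f}")
```

In this toy case the rules drop c4 and c5, which raises precision to 1.0 while recall stays at 0.8 (refinement can only remove contigs, so it cannot recover the missed c6), lifting F₁ from 0.80 to about 0.89. Real refinement would need thresholds tuned to the data and would likely weigh the three signals more carefully than a strict all-must-pass rule.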
The itBins software introduced here is broadly applicable to any type of metagenome data, integrates well with other software such as DASTool, and enables swift and reliable refinement of genomes from metagenomes along with an estimate of the overall binning success. itBins is distributed under the EUPL 1.2 license and is available on Codeberg (codeberg.org/JMK/itBins), GitHub (github.com/ProbstLab/itBins), and through Bioconda [2] (bioconda.github.io/recipes/itbins/README.html).