General Information

This is the default database of pyTax4Fun2, a Python implementation of the Tax4Fun2 pipeline for predicting habitat-specific functional profiles and functional redundancy from 16S rRNA gene sequences. This database was originally the default for Tax4Fun2 v1.1.5, as described in the publication "Tax4Fun2: prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA gene sequences" [1]. After the AARNet CloudStor service (formerly at cloudstor.aarnet.edu.au) was closed, W. Song made it available at https://doi.org/10.5281/zenodo.10035668.

While we have aimed to preserve the original structure as faithfully as possible, we introduced the following modifications to suit our implementation:

- We replaced the original KEGG KO-to-pathway mapping file (ko2ptw.txt) used in Tax4Fun2 with a customized version that is better aligned with our implementation. The original Tax4Fun2 file is retained as ko2ptw.txt.original_Tax4Fun2 inside KEGG.zip. If you intend to use Tax4Fun2 instead of pyTax4Fun2, first rename the currently active ko2ptw.txt to ko2ptw.txt.original_pyTax4Fun2, then rename ko2ptw.txt.original_Tax4Fun2 to ko2ptw.txt.
- fix_ko2ptw.py: a Python script for converting the original Tax4Fun2 ko2ptw.txt into a pyTax4Fun2-compatible ko2ptw.txt.

For TOOLS.zip, we retained the original binaries shipped with the default database provided by W. Song: DIAMOND v0.9.24 [2], Prodigal v2.6.3 [3], and VSEARCH v2.14.1 [4]. In our version of the database, we also included recent versions of DIAMOND (v2.1.15) [5] and VSEARCH (v2.30.4), which are compatible with, and have been tested in, our implementation. All binaries are provided in their original, unmodified state. Because these tools are licensed under the GNU General Public License v3.0, copies of their respective licenses are included in TOOLS.zip, as that license requires.
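The ko2ptw.txt swap described above can be scripted. The sketch below is not part of pyTax4Fun2 or this database; it assumes KEGG.zip has already been extracted, and the function name use_original_tax4fun2_mapping and the folder name "KEGG" in the usage note are hypothetical. It renames the files in the order that avoids overwriting either mapping and refuses to run if a backup already exists.

```python
from pathlib import Path
from typing import Union

# File names as given in this database description.
ACTIVE = "ko2ptw.txt"
TAX4FUN2_ORIGINAL = "ko2ptw.txt.original_Tax4Fun2"
PYTAX4FUN2_BACKUP = "ko2ptw.txt.original_pyTax4Fun2"


def use_original_tax4fun2_mapping(kegg_dir: Union[str, Path]) -> None:
    """Hypothetical helper: make the original Tax4Fun2 ko2ptw.txt active.

    Assumes kegg_dir is the folder extracted from KEGG.zip. The currently
    active (pyTax4Fun2) mapping is kept as ko2ptw.txt.original_pyTax4Fun2
    so the change can be reverted by renaming in the opposite direction.
    """
    kegg = Path(kegg_dir)
    active = kegg / ACTIVE
    original = kegg / TAX4FUN2_ORIGINAL
    backup = kegg / PYTAX4FUN2_BACKUP

    # Stop early if expected files are missing or a backup already exists,
    # so no mapping file is silently overwritten.
    if not original.is_file():
        raise FileNotFoundError(original)
    if not active.is_file():
        raise FileNotFoundError(active)
    if backup.exists():
        raise FileExistsError(backup)

    # Order matters: move the active pyTax4Fun2 file out of the way first,
    # then give the original Tax4Fun2 file the active name.
    active.rename(backup)
    original.rename(active)
```

For example, after extracting KEGG.zip into a folder named KEGG, calling use_original_tax4fun2_mapping("KEGG") leaves the original Tax4Fun2 mapping as ko2ptw.txt. Checking for an existing backup before renaming keeps the operation safe to rerun, because a second call fails instead of clobbering a file.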
Since this database is derived from the original Tax4Fun2 database, which is distributed under the GNU General Public License v3.0, it remains available under the same license. Our implementation, pyTax4Fun2, is distributed under the GNU Affero General Public License v3.0.

Citation

pyTax4Fun2 does not yet have a dedicated paper. In the meantime, we at the Generasi Biologi Indonesia Foundation have produced two technical reports [6,7] and an undergraduate thesis (skripsi in Indonesian) [8], which we recommend citing if you use pyTax4Fun2 in your research. When citing pyTax4Fun2, please also cite the original Tax4Fun2 paper [1], which developed the underlying method. Until a paper dedicated to pyTax4Fun2 is published, you can, for instance, cite all of these works as:

pyTax4Fun2 (Nashrulloh and Rahardi, 2026; Rachmawati et al., 2026; Rachmawati, 2026), a Python implementation of Tax4Fun2 (Wemheuer et al., 2020).

If you use only the database, you can also cite its Zenodo entry.

References

[1] Wemheuer, F., Taylor, J.A., Daniel, R., Johnston, E., Meinicke, P., Thomas, T., Wemheuer, B. Tax4Fun2: prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA gene sequences. Environmental Microbiome 15, 11 (2020). doi: 10.1186/s40793-020-00358-7.
[2] Buchfink, B., Xie, C., Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60 (2015). doi: 10.1038/nmeth.3176.
[3] Hyatt, D., Chen, G.L., LoCascio, P.F., Land, M.L., Larimer, F.W., Hauser, L.J. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). doi: 10.1186/1471-2105-11-119.
[4] Rognes, T., Flouri, T., Nichols, B., Quince, C., Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016). doi: 10.7717/peerj.2584.
[5] Buchfink, B., Reuter, K., Drost, H.G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods 18, 366–368 (2021). doi: 10.1038/s41592-021-01101-x.
[6] Nashrulloh, M.M., Rahardi, B. pyTax4Fun2: A Python Tool for Functional Profiling and Redundancy Analysis of Bacterial Communities via 16S rRNA Gene Sequences, Featuring Polars for Efficient Processing of Large Genomic Datasets—I: Initial Development (Technical Report No. GBR-TR-BIOMIKA-02/Genbinesia/I/2026). Generasi Biologi Indonesia Foundation, Gresik, Indonesia (2026).
[7] Rachmawati, N., Nashrulloh, M.M., Mustafa, I., Rahardi, B., Ainiyati, C., Hafazallah, K., Raihandhany, R., Tamam, Mh.B. pyTax4Fun2: A Python Tool for Functional Profiling and Redundancy Analysis of Bacterial Communities via 16S rRNA Gene Sequences, Featuring Polars for Efficient Processing of Large Genomic Datasets—II: Further Development (On Alpha Diversity and Beta Diversity) (Technical Report No. GBR-TR-BIOMIKA-04/Genbinesia/III/2026). Generasi Biologi Indonesia Foundation, Gresik, Indonesia (2026).
[8] Rachmawati, N. Analisis Fungsional Komunitas Bakteri dengan pyTax4Fun2 dan Optimasi Pemrosesan Dataset Besar Menggunakan Polars [Functional Analysis of Bacterial Communities with pyTax4Fun2 and Optimization of Large Dataset Processing Using Polars]. Undergraduate thesis (skripsi). Department of Biology, Faculty of Mathematics and Natural Sciences, Universitas Brawijaya, Malang, Indonesia (2026). [in Indonesian].