6. proGenomes4: providing 2 million accurately and consistently annotated high-quality prokaryotic genomes.
Fullam A, Letunic I, Maistrenko OM, Castro AA, Coelho LP, Grekova A, Schudoma C, Khedkar S, Robbani SM, Kuhn M, Schmidt TSB, Bork P, Mende DR
2025 Nov 20; [Epub ahead of print] PubMed: 41261732
Abstract
The wide availability of public microbial genomes has opened many new avenues for microbiology research, yet it also demands robust quality control and consistent annotation pipelines to ensure meaningful biological insights. proGenomes4 (prokaryotic Genomes v4) addresses this challenge by providing a resource of nearly 2 million high-quality microbial genomes, a doubling in scale from previous versions, encompassing over 7 billion genes. Each genome underwent rigorous quality assessment and comprehensive functional annotation using multiple standardized annotation workflows, including the systematic identification of mobile genetic elements and biosynthetic gene clusters. proGenomes4 contains 32 887 species with ecological habitat metadata as well as precomputed pan-genomes. This substantially expanded resource provides the microbiology community with a foundation for large-scale comparative studies and is freely accessible via a newly developed command-line interface and at https://progenomes.embl.de/.
5. Metalog: curated and harmonised contextual data for global metagenomics samples.
Kuhn M, Schmidt TSB, Ferretti P, Glazek AM, Robbani SM, Akanni W, Fullam A, Schudoma C, Cetin E, Hassan M, Noack K, Schwarz A, Thielemann R, Thomas L, von Stetten M, Alves RJ, Iyappan A, Kartal E, Kel I, Keller MI, Maistrenko OM, Mankowski A, Nishijima S, Podlesny D, Schiller J, Schulz S, Van Rossum T, Bork P
2025 Oct 31; [Epub ahead of print] PubMed: 41171125
Abstract
Metagenomic sequencing enables the in-depth study of microbes and their functions in humans, animals, and the environment. While sequencing data is deposited in public databases, the associated contextual data is often incomplete and has to be retrieved from primary publications. This lack of access to sample-level metadata, such as clinical data or in situ observations, impedes cross-study comparisons and meta-analyses. We therefore created the Metalog database, a repository of manually curated metadata for metagenomics samples across the globe. It contains 80 423 samples from humans (including 66 527 of the gut microbiome), 10 744 animal samples, 5547 ocean water samples, and 23 455 samples from other environmental habitats such as soil, sediment, or fresh water. Samples have been consistently annotated for a set of habitat-specific core features, such as demographics, disease status, and medication for humans; host species and captivity status for animals; and filter sizes and salinity for marine samples. Additionally, all original metadata is provided in tabular form, simplifying focused studies, e.g. of nutrient concentrations. Pre-computed taxonomic profiles facilitate rapid data exploration, while links to the SPIRE database enable genome-based analyses. The database is freely available for browsing and download at https://metalog.embl.de/.
4. Fecal microbial load is a major determinant of gut microbiome variation and a confounder for disease associations.
Nishijima S, Stankevic E, Aasmets O, Schmidt TSB, Nagata N, Keller MI, Ferretti P, Juel HB, Fullam A, Robbani SM, Schudoma C, Hansen JK, Holm LA, Israelsen M, Schierwagen R, Torp N, Telzerow A, Hercog R, Kandels S, Hazenbrink DHM, Arumugam M, Bendtsen F, Brøns C, Fonvig CE, Holm JC, Nielsen T, Pedersen JS, Thiele MS, Trebicka J, Org E, Krag A, Hansen T, Kuhn M, Bork P, GALAXY and MicrobLiver Consortia
2025 Jan 9; 188(1): 222-236.e15. PubMed: 39541968
Abstract
The microbiota in individual habitats differ in both relative composition and absolute abundance. While sequencing approaches determine the relative abundances of taxa and genes, they do not provide information on their absolute abundances. Here, we developed a machine-learning approach to predict fecal microbial loads (microbial cells per gram) solely from relative abundance data. Applying our prediction model to a large-scale metagenomic dataset (n = 34,539), we demonstrated that microbial load is the major determinant of gut microbiome variation and is associated with numerous host factors, including age, diet, and medication. We further found that for several diseases, changes in microbial load, rather than the disease condition itself, more strongly explained alterations in patients' gut microbiome. Adjusting for this effect substantially reduced the statistical significance of the majority of disease-associated species. Our analysis reveals that the fecal microbial load is a major confounder in microbiome studies, highlighting its importance for understanding microbiome variation in health and disease.
3. metaTraits: a large-scale integration of microbial phenotypic trait information.
Podlesny D, Kim CY, Robbani SM, Schudoma C, Fullam A, Reimer LC, Koblitz J, Schober I, Iyappan A, Van Rossum T, Schiller J, Grekova A, Kuhn M, Bork P
2025 Nov 26; [Epub ahead of print] PubMed: 41296543
Abstract
Microbes differ greatly in their organismal structure, physiology, and environmental adaptation, yet information about these phenotypic traits is dispersed across multiple databases and is largely unavailable for taxa that remain uncultured. Here, we present metaTraits, a unified and accessible trait resource that integrates culture-derived trait information from BacDive, BV-BRC, JGI IMG, and GOLD with genome-based predictions for medium- and high-quality isolate and metagenome-assembled genomes (MAGs) from proGenomes and SPIRE. metaTraits covers over 2.2 million genomes and more than 140 harmonized traits mapped to standardized ontologies, spanning cell morphology (e.g. shape, size, and Gram staining), physiology (e.g. motility and sporulation), metabolic and enzymatic activities, environmental preferences (e.g. temperature, salinity, and oxygen tolerance), and lifestyle categories. All records are linked to the original evidence, and species are cross-linked to NCBI and GTDB taxonomies. The interactive metaTraits website provides search and visualization tools, taxonomy-level summaries, and two workflows for annotating user-submitted genomes or community profiles. metaTraits substantially advances the accessibility and interoperability of microbial trait data, enabling comprehensive trait-based analyses of microbiomes across diverse environments. metaTraits is accessible via https://metatraits.embl.de.