The individual publication lists here have to be considered only in the context of the Bork Group. Authors in this color are/were members of the Bork Group.
The manuscripts available on our site are provided for your personal use only and may not be retransmitted or redistributed without written permission from the paper's publisher and author. You may not upload any of this site's material to any public server, on-line service, network, or bulletin board without prior written permission from the publisher and author. You may not make copies for any commercial purpose.
Please email us for any paper request!
Meta'omic data on microbial diversity and function accrue exponentially in public repositories, but derived information is often siloed according to data type, study or sampled microbial environment. Here we present SPIRE, a Searchable Planetary-scale mIcrobiome REsource that integrates various consistently processed metagenome-derived microbial data modalities across habitats, geography and phylogeny. SPIRE encompasses 99 146 metagenomic samples from 739 studies covering a wide array of microbial environments and augmented with manually-curated contextual data. Across a total metagenomic assembly of 16 Tbp, SPIRE comprises 35 billion predicted protein sequences and 1.16 million newly constructed metagenome-assembled genomes (MAGs) of medium or high quality. Beyond mapping to the high-quality genome reference provided by proGenomes3 (http://progenomes.embl.de), these novel MAGs form 92 134 novel species-level clusters, the majority of which are unclassified at species level using current tools. SPIRE enables taxonomic profiling of these species clusters via an updated, custom mOTUs database (https://motu-tool.org/) and includes several layers of functional annotation, as well as crosslinks to several (micro-)biological databases. The resource is accessible, searchable and browsable via http://spire.embl.de.
The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/.
The human gut microbiome is a key contributor to health, and its perturbations are linked to many diseases. Small-molecule xenobiotics such as drugs, chemical pollutants and food additives can alter the microbiota composition and are now recognized as one of the main factors underlying microbiome diversity. Mapping the effects of such compounds on the gut microbiome is challenging because of the complexity of the community, anaerobic growth requirements of individual species and the large number of interactions that need to be quantitatively assessed. High-throughput screening setups offer a promising solution for probing the direct inhibitory effects of hundreds of xenobiotics on tens of anaerobic gut bacteria. When automated, such assays enable the cost-effective investigation of a wide range of compound-microbe combinations. We have developed an experimental setup and protocol that enables testing of up to 5,000 compounds on a target gut species under strict anaerobic conditions within 5 d. In addition, with minor modifications to the protocol, drug effects can be tested on microbial communities either assembled from isolates or obtained from stool samples. Experience in working in an anaerobic chamber, especially in performing delicate work with thick chamber gloves, is required for implementing this protocol. We anticipate that this protocol will accelerate the study of interactions between small molecules and the gut microbiome and provide a deeper understanding of this microbial ecosystem, which is intimately intertwined with human health.
Sphingolipids Are Depleted in Alcohol-Related Liver Fibrosis.
Thiele M, Suvitaival T, Trošt K, Kim M, de Zawadzki A, Kjaergaard M, Rasmussen DN, Lindvig KP, Israelsen M, Detlefsen S, Andersen P, Juel HB, Nielsen T, Georgiou S, Filippa V, Kuhn M, Nishijima S, Moitinho-Silva L, Rossing P, Trebicka J, Anastasiadou E, Bork P, Hansen T, Quigley CL, Krag A, MicrobLiver, GALAXY Consortia
Alcohol disturbs hepatic lipid synthesis and transport, but the role of lipid dysfunction in alcohol-related liver disease (ALD) is unclear. In this biopsy-controlled, prospective, observational study, we characterized the liver and plasma lipidomes in patients with early ALD.
Multi-omics analyses are used in microbiome studies to understand molecular changes in microbial communities exposed to different conditions. However, it is not always clear how much each omics data type contributes to our understanding and whether they are concordant with each other. Here, we map the molecular response of a synthetic community of 32 human gut bacteria to three non-antibiotic drugs by using five omics layers (16S rRNA gene profiling, metagenomics, metatranscriptomics, metaproteomics and metabolomics). We find that all the omics methods with species resolution are highly consistent in estimating relative species abundances. Furthermore, different omics methods complement each other for capturing functional changes. For example, while nearly all the omics data types captured that the antipsychotic drug chlorpromazine selectively inhibits Bacteroidota representatives in the community, the metatranscriptome and metaproteome suggested that the drug induces stress responses related to protein quality control. Metabolomics revealed a decrease in oligosaccharide uptake, likely caused by Bacteroidota depletion. Our study highlights how multi-omics datasets can be utilized to reveal complex molecular responses to external perturbations in microbial communities.
During the transition from a healthy state to cardiometabolic disease, patients become heavily medicated, which leads to an increasingly aberrant gut microbiome and serum metabolome, and complicates biomarker discovery. Here, through integrated multi-omics analyses of 2,173 European residents from the MetaCardis cohort, we show that the explanatory power of drugs for the variability in both host and gut microbiome features exceeds that of disease. We quantify inferred effects of single medications, their combinations as well as additive effects, and show that the latter shift the metabolome and microbiome towards a healthier state, exemplified in synergistic reduction in serum atherogenic lipoproteins by statins combined with aspirin, or enrichment of intestinal Roseburia by diuretic agents combined with beta-blockers. Several antibiotics exhibit a quantitative relationship between the number of courses prescribed and progression towards a microbiome state that is associated with the severity of cardiometabolic disease. We also report a relationship between cardiometabolic drug dosage, improvement in clinical markers and microbiome composition, supporting direct drug effects. Taken together, our computational framework and resulting resources enable the disentanglement of the effects of drugs and disease on host and microbiome features in multimedicated individuals. Furthermore, the robust signatures identified using our framework provide new hypotheses for drug-host-microbiome interactions in cardiometabolic disease.
Human gut bacterial strains can co-exist with their hosts for decades, but little is known about how these microbes persist and disperse, and evolve thereby. Here, we examined these processes in 5,278 adult and infant fecal metagenomes, longitudinally sampled in individuals and families. Our analyses revealed that a subset of gut species is extremely persistent in individuals, families, and geographic regions, represented often by locally successful strains of the phylum Bacteroidota. These "tenacious" bacteria show high levels of genetic adaptation to the human host but a high probability of loss upon antibiotic interventions. By contrast, heredipersistent bacteria, notably Firmicutes, often rely on dispersal strategies with weak phylogeographic patterns but strong family transmissions, likely related to sporulation. These analyses describe how different dispersal strategies can lead to the long-term persistence of human gut microbes with implications for gut flora modulations.
Protein-metabolite interactions play an important role in the cell's metabolism and many methods have been developed to screen them in vitro. However, few methods can be applied at a large scale without altering the biological state. Here we describe a proteometabolomic approach, using chromatography to generate cell fractions which are then analyzed with mass spectrometry for both protein and metabolite identification. Integrating the proteomic and metabolomic analyses makes it possible to identify protein-bound metabolites. Applying the concept to the thermophilic fungus Chaetomium thermophilum, we predict 461 likely protein-metabolite interactions, most of them novel. As a proof of principle, we experimentally validate a predicted interaction between the ribosome and isopentenyl adenine.
Antibiotics are used to fight pathogens but also target commensal bacteria, disturbing the composition of gut microbiota and causing dysbiosis and disease. Despite this well-known collateral damage, the activity spectrum of different antibiotic classes on gut bacteria remains poorly characterized. Here we characterize further 144 antibiotics from a previous screen of more than 1,000 drugs on 38 representative human gut microbiome species. Antibiotic classes exhibited distinct inhibition spectra, including generation dependence for quinolones and phylogeny independence for β-lactams. Macrolides and tetracyclines, both prototypic bacteriostatic protein synthesis inhibitors, inhibited nearly all commensals tested but also killed several species. Killed bacteria were more readily eliminated from in vitro communities than those inhibited. This species-specific killing activity challenges the long-standing distinction between bactericidal and bacteriostatic antibiotic classes and provides a possible explanation for the strong effect of macrolides on animal and human gut microbiomes. To mitigate this collateral damage of macrolides and tetracyclines, we screened for drugs that specifically antagonized the antibiotic activity against abundant Bacteroides species but not against relevant pathogens. Such antidotes selectively protected Bacteroides species from erythromycin treatment in human-stool-derived communities and gnotobiotic mice. These findings illuminate the activity spectra of antibiotics in commensal bacteria and suggest strategies to circumvent their adverse effects on the gut microbiota.
A Previously Undescribed Highly Prevalent Phage Identified in a Danish Enteric Virome Catalog.
Van Espen L, Bak EG, Beller L, Close L, Deboutte W, Juel HB, Nielsen T, Sinar D, De Coninck L, Frithioff-Bøjsøe C, Fonvig CE, Jacobsen S, Kjærgaard M, Thiele M, Fullam A, Kuhn M, Holm JC, Bork P, Krag A, Hansen T, Arumugam M, Matthijnssens J
Gut viruses are important, yet often neglected, players in the complex human gut microbial ecosystem. Recently, the number of human gut virome studies has been increasing; however, we are still only scratching the surface of the immense viral diversity. In this study, 254 virus-enriched fecal metagenomes from 204 Danish subjects were used to generate the Danish Enteric Virome Catalog (DEVoC) containing 12,986 nonredundant viral scaffolds, of which the majority was previously undescribed, encoding 190,029 viral genes. The DEVoC was used to compare 91 healthy DEVoC gut viromes from children, adolescents, and adults that were used to create the DEVoC. Gut viromes of healthy Danish subjects were dominated by phages. While most phage genomes (PGs) only occurred in a single subject, indicating large virome individuality, 39 PGs were present in more than 10 healthy subjects. Among these 39 PGs, the prevalences of three PGs were associated with age. To further study the prevalence of these 39 prevalent PGs, 1,880 gut virome data sets of 27 studies from across the world were screened, revealing several age-, geography-, and disease-related prevalence patterns. Two PGs also showed a remarkably high prevalence worldwide: a crAss-like phage (20.6% prevalence), belonging to the tentative subfamily, and a previously undescribed circular temperate phage infecting Bacteroides dorei (14.4% prevalence), called LoVEphage because it encodes lots of viral elements. Due to the LoVEphage's high prevalence and novelty, public data sets in which the LoVEphage was detected were assembled, resulting in an additional 18 circular LoVEphage-like genomes (67.9 to 72.4 kb). Through generation of the DEVoC, we added numerous previously uncharacterized viral genomes and genes to the ever-increasing worldwide pool of human gut viromes. The DEVoC, the largest human gut virome catalog generated from consistently processed fecal samples, facilitated the analysis of the 91 healthy Danish gut viromes.
Characterizing the biggest cohort of healthy gut viromes from children, adolescents, and adults to date confirmed the previously established high interindividual variation in human gut viromes and demonstrated that the effect of age on the gut virome composition was limited to the prevalence of specific phage (groups). The identification of a previously undescribed prevalent phage illustrates the usefulness of developing virome catalogs, and we foresee that the DEVoC will benefit future analysis of the roles of gut viruses in human health and disease.
Untargeted mass spectrometry is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites' structures from MS/MS spectra is still a great challenge.
Proteotypes, like genotypes, have been found to vary between individuals in several studies, but consistent molecular functional traits across studies remain to be quantified. In a meta-analysis of 11 proteomics datasets from humans and mice, we use co-variation of proteins in known functional modules across datasets and individuals to obtain a consensus landscape of proteotype variation. We find that individuals differ considerably in both protein complex abundances and stoichiometry. We disentangle genetic and environmental factors impacting these metrics, with genetic sex and specific diets together explaining 13.5% and 11.6% of the observed variation of complex abundance and stoichiometry, respectively. Sex-specific differences, for example, include various proteins and complexes, where the respective genes are not located on sex-specific chromosomes. Diet-specific differences, added to the individual genetic backgrounds, might become a starting point for personalized proteotype modulation toward desired features.
A few commonly used non-antibiotic drugs have recently been associated with changes in gut microbiome composition, but the extent of this phenomenon is unknown. Here, we screened more than 1,000 marketed drugs against 40 representative gut bacterial strains, and found that 24% of the drugs with human targets, including members of all therapeutic classes, inhibited the growth of at least one strain in vitro. Particular classes, such as the chemically diverse antipsychotics, were overrepresented in this group. The effects of human-targeted drugs on gut bacteria are reflected in their antibiotic-like side effects in humans and are concordant with existing human cohort studies. Susceptibility to antibiotics and human-targeted drugs correlates across bacterial species, suggesting common resistance mechanisms, which we verified for some drugs. The potential risk of non-antibiotics promoting antibiotic resistance warrants further exploration. Our results provide a resource for future research on drug-microbiome interactions, opening new paths for side effect control and drug repurposing, and broadening our view of antibiotic resistance.
Bacterial metabolism plays a fundamental role in gut microbiota ecology and host-microbiome interactions. Yet the metabolic capabilities of most gut bacteria have remained unknown. Here we report growth characteristics of 96 phylogenetically diverse gut bacterial strains across 4 rich and 15 defined media. The vast majority of strains (76) grow in at least one defined medium, enabling accurate assessment of their biosynthetic capabilities. These do not necessarily match phylogenetic similarity, thus indicating a complex evolution of nutritional preferences. We identify mucin utilizers and species inhibited by amino acids and short-chain fatty acids. Our analysis also uncovers media for in vitro studies wherein growth capacity correlates well with in vivo abundance. Further value of the underlying resource is demonstrated by correcting pathway gaps in available genome-scale metabolic models of gut microorganisms. Together, the media resource and the extracted knowledge on growth abilities widen experimental and computational access to the gut microbiota.
A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.
eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de.
Unwanted side effects of drugs are a burden on patients and a severe impediment in the development of new drugs. At the same time, adverse drug reactions (ADRs) recorded during clinical trials are an important source of human phenotypic data. It is therefore essential to combine data on drugs, targets and side effects into a more complete picture of the therapeutic mechanism of actions of drugs and the ways in which they cause adverse reactions. To this end, we have created the SIDER ('Side Effect Resource', http://sideeffects.embl.de) database of drugs and ADRs. The current release, SIDER 4, contains data on 1430 drugs, 5880 ADRs and 140 064 drug-ADR pairs, which is an increase of 40% compared to the previous version. For more fine-grained analyses, we extracted the frequency with which side effects occur from the package inserts. This information is available for 39% of drug-ADR pairs, 19% of which can be compared to the frequency under placebo treatment. SIDER furthermore contains a data set of drug indications, extracted from the package inserts using Natural Language Processing. These drug indications are used to reduce the rate of false positives by identifying medical terms that do not correspond to ADRs.
Interactions between proteins and small molecules are an integral part of biological processes in living organisms. Information on these interactions is dispersed over many databases, texts and prediction methods, which makes it difficult to get a comprehensive overview of the available evidence. To address this, we have developed STITCH ('Search Tool for Interacting Chemicals') that integrates these disparate data sources for 430 000 chemicals into a single, easy-to-use resource. In addition to the increased scope of the database, we have implemented a new network view that gives the user the ability to view binding affinities of chemicals in the interaction network. This enables the user to get a quick overview of the potential effects of the chemical on its interaction partners. For each organism, STITCH provides a global network; however, not all proteins have the same pattern of spatial expression. Therefore, only a certain subset of interactions can occur simultaneously. In the new, fifth release of STITCH, we have implemented functionality to filter out the proteins and chemicals not associated with a given tissue. The STITCH database can be downloaded in full, accessed programmatically via an extensive API, or searched via a redesigned web interface at http://stitch.embl.de.
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.
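Besides the R interface mentioned in the abstract above, STRING can also be queried over HTTP. The following is a minimal sketch of how a request URL for an interaction network might be assembled. It assumes the URL layout and parameter names (`identifiers`, `species`, `required_score`, `caller_identity`) described in STRING's online API documentation, which should be checked before use. The function name and the `caller_identity` value are hypothetical, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed base URL of the STRING REST API; verify against the current STRING documentation.
STRING_API = "https://string-db.org/api"


def string_network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Build (but do not send) a request URL for STRING's 'network' method."""
    params = {
        # Multiple protein names are joined with a carriage return (encoded as %0D).
        "identifiers": "\r".join(identifiers),
        # NCBI taxonomy identifier, e.g. 9606 for human.
        "species": species,
        # Combined-score cutoff on a 0-1000 scale (assumed default of 400).
        "required_score": required_score,
        # Hypothetical client name used to identify this script to the server.
        "caller_identity": "example_script",
    }
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


url = string_network_url(["TP53", "MDM2"], species=9606)
print(url)
```

The resulting URL could then be fetched with any HTTP client; keeping URL construction separate from the request makes the query easy to inspect and to test offline.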
STITCH is a database of protein-chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. Available at http://stitch.embl.de, the resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the previous version, the number of high-confidence protein-chemical interactions in human has increased by 45%, to 367 000. In this version, we added features for users to upload their own data to STITCH in the form of internal identifiers, chemical structures or quantitative data. For example, a user can now upload a spreadsheet with screening hits to easily check which interactions are already known. To increase the coverage of STITCH, we expanded the text mining to include full-text articles and added a prediction method based on chemical structures. We further changed our scheme for transferring interactions between species to rely on orthology rather than protein similarity. This improves the performance within protein families, where scores are now transferred only to orthologous proteins, but not to paralogous proteins. STITCH can be accessed with a web-interface, an API and downloadable files.
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
In pharmacology, it is crucial to understand the complex biological responses that drugs elicit in the human organism and how well they can be inferred from model organisms. We therefore identified a large set of drug-induced transcriptional modules from genome-wide microarray data of drug-treated human cell lines and rat liver, and first characterized their conservation. Over 70% of these modules were common for multiple cell lines and 15% were conserved between the human in vitro and the rat in vivo system. We then illustrate the utility of conserved and cell-type-specific drug-induced modules by predicting and experimentally validating (i) gene functions, e.g., 10 novel regulators of cellular cholesterol homeostasis and (ii) new mechanisms of action for existing drugs, thereby providing a starting point for drug repositioning, e.g., novel cell cycle inhibitors and new modulators of α-adrenergic receptor, peroxisome proliferator-activated receptor and estrogen receptor. Taken together, the identified modules reveal the conservation of transcriptional responses towards drugs across cell types and organisms, and improve our understanding of both the molecular basis of drug action and human biology.
Side effect similarities of drugs have recently been employed to predict new drug targets, and networks of side effects and targets have been used to better understand the mechanism of action of drugs. Here, we report a large-scale analysis to systematically predict and characterize proteins that cause drug side effects. We integrated phenotypic data obtained during clinical trials with known drug-target relations to identify overrepresented protein-side effect combinations. Using independent data, we confirm that most of these overrepresentations point to proteins which, when perturbed, cause side effects. Of 1428 side effects studied, 732 were predicted to be predominantly caused by individual proteins, at least 137 of them backed by existing pharmacological or phenotypic data. We prove this concept in vivo by confirming our prediction that activation of the serotonin 7 receptor (HTR7) is responsible for hyperesthesia in mice, which, in turn, can be prevented by a drug that selectively inhibits HTR7. Taken together, we show that a large fraction of complex drug side effects are mediated by individual proteins and create a reference for such relations.
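The abstract above does not state which statistic was used to call a protein-side effect combination overrepresented. As an illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for this kind of question. The function name and all counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of drugs considered
    K: drugs that target the protein of interest
    n: drugs that list the side effect of interest
    k: drugs that both target the protein and list the side effect
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass over all overlaps at least as large as k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 1000 drugs, 50 hit the protein, 40 cause the side effect,
# and 10 do both (about 2 would be expected by chance).
p_value = hypergeom_sf(k=10, N=1000, K=50, n=40)
print(f"one-sided p-value: {p_value:.2e}")
```

In a real analysis across many protein-side effect pairs, such p-values would also need a multiple-testing correction.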
To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increase 4-fold. The database can be accessed interactively through a web interface, displaying interactions in an integrated network view. It is also available for computational studies through downloadable files and an API. As an extension in the current version, we offer the option to switch between two levels of detail, namely whether stereoisomers of a given compound are shown as a merged entity or as separate entities. Separate display of stereoisomers is necessary, for example, for carbohydrates and chiral drugs. Combining the isomers increases the coverage, as interaction databases and publications found through text mining will often refer to compounds without specifying the stereoisomer. The database is accessible at http://stitch.embl.de/.
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721 801 orthologous groups, encompassing a total of 4 396 591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450 904 orthologous groups (62.5%).
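The counts quoted in the eggNOG v3 abstract above can be cross-checked with a few lines of arithmetic. This sketch uses only numbers from the abstract. It assumes that the 101 208 universal groups are included in the 721 801 total, which the phrase "while the others" suggests but the abstract does not state outright.

```python
# Counts quoted in the eggNOG v3 abstract.
total_ogs = 721_801       # all orthologous groups
annotated_ogs = 450_904   # groups with broad functional descriptions
universal_ogs = 101_208   # groups at the universal level (all three domains of life)

# Fraction of groups carrying a functional description.
annotated_fraction = annotated_ogs / total_ogs

# Groups left for the 40 more limited taxonomic ranges,
# assuming the universal groups are part of the total.
level_specific_ogs = total_ogs - universal_ogs

print(f"annotated: {annotated_fraction:.1%}")
print(f"level-specific groups: {level_specific_ogs}")
```

The computed fraction rounds to the 62.5% given in the abstract.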
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.
Combinatorial therapy is a promising strategy for combating complex disorders due to improved efficacy and reduced side effects. However, screening new drug combinations exhaustively is impractical considering all possible combinations between drugs. Here, we present a novel computational approach to predict drug combinations by integrating molecular and pharmacological data. Specifically, drugs are represented by a set of their properties, such as their targets or indications. By integrating several of these features, we show that feature patterns enriched in approved drug combinations are not only predictive for new drug combinations but also provide insights into mechanisms underlying combinatorial therapy. Further analysis confirmed that among our top ranked predictions of effective combinations, 69% are supported by literature, while the others represent novel potential drug combinations. We believe that our proposed approach can help to limit the search space of drug combinations and provide a new way to effectively utilize existing drugs for new purposes.
Protein-metabolite networks are central to biological systems, but are incompletely understood. Here, we report a screen to catalog protein-lipid interactions in yeast. We used arrays of 56 metabolites to measure lipid-binding fingerprints of 172 proteins, including 91 with predicted lipid-binding domains. We identified 530 protein-lipid associations, the majority of which are novel. To show the data set's biological value, we studied further several novel interactions with sphingolipids, a class of conserved bioactive lipids with an elusive mode of action. Integration of live-cell imaging suggests new cellular targets for these molecules, including several with pleckstrin homology (PH) domains. Validated interactions with Slm1, a regulator of actin polarization, show that PH domains can have unexpected lipid-binding specificities and can act as coincidence sensors for both phosphatidylinositol phosphates and phosphorylated sphingolipids.
Drug perturbations of human cells lead to complex responses upon target binding. One of the known mechanisms is a (positive or negative) feedback loop that adjusts the expression level of the respective target protein. To quantify this mechanism systems-wide in an unbiased way, drug-induced differential expression of drug target mRNA was examined in three cell lines using the Connectivity Map. To overcome various biases in this valuable resource, we have developed a computational normalization and scoring procedure that is applicable to gene expression recording upon heterogeneous drug treatments. In 1290 drug-target relations, corresponding to 466 drugs acting on 167 drug targets studied, 8% of the targets are subject to regulation at the mRNA level. We confirmed systematically that in particular G-protein coupled receptors, when serving as known targets, are regulated upon drug treatment. We further newly identified drug-induced differential regulation of Lanosterol 14-alpha demethylase, Endoplasmin, DNA topoisomerase 2-alpha and Calmodulin 1. The feedback regulation in these and other targets is likely to be relevant for the success or failure of the molecular intervention.
The molecular understanding of phenotypes caused by drugs in humans is essential for elucidating mechanisms of action and for developing personalized medicines. Side effects of drugs (also known as adverse drug reactions) are an important source of human phenotypic information, but so far research on this topic has been hampered by insufficient accessibility of data. Consequently, we have developed a public, computer-readable side effect resource (SIDER) that connects 888 drugs to 1450 side effect terms. It contains information on frequency in patients for one-third of the drug-side effect pairs. For 199 drugs, the side effect frequency of placebo administration could also be extracted. We illustrate the potential of SIDER with a number of analyses. The resource is freely available for academic research at http://sideeffects.embl.de.
Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug-target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74 000 different chemicals, including 2200 drugs. STITCH can be accessed at http://stitch.embl.de/.
The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a 2-fold increase relative to the previous version. The pipeline yielded 224 847 OGs, including 9724 extended versions of the original COG and KOG. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2 242 035 proteins (built from 2 590 259 proteins) and provides a broad functional description for at least 1 966 709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de.
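As a rough illustration of the procedure named above, the following sketch finds reciprocal best matches between genomes and then joins them by triangular linkage. It is a simplified reading and not the eggNOG pipeline: the input format (a dictionary of pairwise scores), the `genome|gene` naming scheme, and the rule that triangles sharing an edge are merged are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def reciprocal_best_hits(scores):
    """Return gene pairs that are each other's best hit across two genomes.

    scores maps (query, subject) -> similarity score; genes are named
    'genome|gene' (a hypothetical convention for this sketch).
    """
    best = {}
    for (query, subject), score in scores.items():
        q_genome = query.split("|")[0]
        s_genome = subject.split("|")[0]
        if q_genome == s_genome:
            continue  # ignore within-genome hits
        key = (query, s_genome)
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    pairs = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, query.split("|")[0]))
        if back is not None and back[0] == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def triangle_clusters(pairs):
    """Find triangles of reciprocal best hits and merge those sharing an edge."""
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    triangles = set()
    for a in neighbours:
        for b, c in combinations(sorted(neighbours[a]), 2):
            if c in neighbours[b]:
                triangles.add(frozenset((a, b, c)))
    clusters = [set(t) for t in triangles]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(clusters)), 2):
            if len(clusters[i] & clusters[j]) >= 2:  # shared edge
                clusters[i] |= clusters[j]
                del clusters[j]
                changed = True
                break
    return clusters


# Toy data: one gene family in four genomes plus a paralog in genome A.
scores = {}


def add_hit(a, b, score):
    scores[(a, b)] = score
    scores[(b, a)] = score


for a, b in combinations(["A|g1", "B|g1", "C|g1"], 2):
    add_hit(a, b, 100)
add_hit("D|g1", "A|g1", 90)
add_hit("D|g1", "B|g1", 90)
add_hit("A|g2", "B|g1", 50)  # paralog: not a reciprocal best hit

groups = triangle_clusters(reciprocal_best_hits(scores))
```

In this toy input the two triangles (A, B, C) and (A, B, D) share the edge A-B and merge into a single group, while the lower-scoring paralog `A|g2` stays outside it.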
Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein-protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein-protein interactions currently available. STRING can be reached at http://string-db.org/.
Targets for drugs have so far been predicted on the basis of molecular or cellular features, for example, by exploiting similarity in chemical structure or in activity across cell lines. We used phenotypic side-effect similarities to infer whether two drugs share a target. Applied to 746 marketed drugs, a network of 1018 side effect-driven drug-drug relations became apparent, 261 of which are formed by chemically dissimilar drugs from different therapeutic indications. We experimentally tested 20 of these unexpected drug-drug relations and validated 13 implied drug-target relations by in vitro binding assays, of which 11 reveal inhibition constants of less than 10 micromolar. Nine of these were tested and confirmed in cell assays, documenting the feasibility of using phenotypic information to infer molecular interactions and hinting at new uses of marketed drugs.
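The idea of linking drugs through shared side effects can be illustrated with a small sketch. The abstract does not specify the similarity measure used in the study, so plain Jaccard similarity serves here as a simple stand-in; the drug names, side-effect sets and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(side_effects, threshold):
    """Return drug pairs whose side-effect similarity reaches the threshold."""
    pairs = []
    for d1, d2 in combinations(sorted(side_effects), 2):
        similarity = jaccard(side_effects[d1], side_effects[d2])
        if similarity >= threshold:
            pairs.append((d1, d2, similarity))
    return pairs


# Hypothetical side-effect annotations for three made-up drugs.
side_effects = {
    "drug_a": {"nausea", "headache", "dizziness"},
    "drug_b": {"nausea", "headache", "dizziness", "rash"},
    "drug_c": {"insomnia"},
}

pairs = candidate_pairs(side_effects, threshold=0.5)
```

Pairs that pass the threshold would only be hypotheses of a shared target. As the abstract notes, such relations still have to be validated experimentally.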
The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget, which integrates drug-related information about medical indication areas, adverse drug effects, drug metabolization, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de.
The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database ('evolutionary genealogy of genes: Non-supervised Orthologous Groups'), which contains orthologous groups constructed from Smith-Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 coarse-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.de.
The rapidly increasing amount of publicly available knowledge in biology and chemistry enables scientists to revisit many open problems by the systematic integration and analysis of heterogeneous novel data. The integration of relevant data not only allows analyses at the network level, but also provides a more global view on drug-target relations. Here we review recent attempts to apply large-scale computational analyses to predict novel interactions of drugs and targets from molecular and cellular features. In this context, we quantify the family-dependent probability of two proteins to bind the same ligand as a function of their sequence similarity. We finally discuss how phenotypic data could help to expand our understanding of the complex mechanisms of drug action.
Knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to these data, STITCH ('search tool for interactions of chemicals') integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug-target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database. STITCH is available at http://stitch.embl.de/.
Information on protein-protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data, by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database backend and the availability of compact download files. As of release 7, STRING has almost doubled to 373 distinct organisms, and contains more than 1.5 million proteins for which associations have been pre-computed. Novel features include AJAX-based web-navigation, inclusion of additional resources such as BioGRID, and detailed protein domain annotation. STRING is available at http://string.embl.de/.