About MicrobiomeDB

What is it?

High-throughput sequencing has revolutionized microbiology by allowing scientists to complement culture-based approaches with culture-independent profiling of complex microbial communities. Whether these communities are studied in soil, on plants, or in animals, community composition data are often accompanied by rich metadata describing the source from which each sample was derived, how samples were treated prior to collection, and how they were processed after collection. Increasingly, the goal of microbiome experiments is to understand how the attributes captured in these metadata influence the microbial community. MicrobiomeDB was developed as a discovery tool that lets researchers use their experimental metadata to construct queries that interrogate microbiome datasets.

How was it made?

A key feature of MicrobiomeDB is an automated pipeline for loading data from microbiome experiments, using the standard Biological Observation Matrix (.biom) format as input. This file format can be produced for any experiment processed with the widely used software suites QIIME and Mothur, and is also the standard format used by both the Earth Microbiome Project and the Human Microbiome Project. Relative abundance data are extracted from the .biom file and mapped to the GreenGenes database (by ID) to retrieve full 16S sequences, NCBI taxon identifiers, and taxonomic strings. Alpha-diversity metrics are pre-calculated, and these data are loaded together into the database.
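The abundance and diversity steps described above can be illustrated with a minimal Python sketch. It is not the MicrobiomeDB pipeline itself: the count table, the sample and OTU identifiers, and the function names are all hypothetical, and the Shannon index and observed richness are used only as two common examples of alpha-diversity metrics, since the text does not say which metrics are pre-calculated.

```python
import math

# Hypothetical OTU count table: sample ID -> {OTU ID -> read count}.
# Identifiers and counts are invented for illustration only.
counts = {
    "sample_A": {"otu_1": 50, "otu_2": 30, "otu_3": 20},
    "sample_B": {"otu_1": 100, "otu_2": 0, "otu_3": 0},
}


def relative_abundance(sample_counts):
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(sample_counts.values())
    if total == 0:
        return {otu: 0.0 for otu in sample_counts}
    return {otu: n / total for otu, n in sample_counts.items()}


def shannon_index(sample_counts):
    """Shannon diversity (natural log) for one sample."""
    proportions = relative_abundance(sample_counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


def observed_richness(sample_counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for n in sample_counts.values() if n > 0)


# Pre-calculate the per-sample summaries that would be loaded alongside
# the abundance data.
summary = {
    sample_id: {
        "relative_abundance": relative_abundance(sample_counts),
        "shannon": shannon_index(sample_counts),
        "richness": observed_richness(sample_counts),
    }
    for sample_id, sample_counts in counts.items()
}
```

In this toy table, sample_B is dominated by a single OTU, so its Shannon index is zero and its richness is 1, while sample_A spreads its reads across three OTUs and scores higher on both.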

Although taxa abundance tables and diversity metrics are useful, the real power of MicrobiomeDB comes from the fact that all the metadata terms used by the experimenter to describe each sample are also extracted from the .biom file. These terms are mapped to the MIxS ontology; terms that cannot be mapped are manually curated and used to expand a custom, MIxS-compliant ontology tree. This rich, structured sample description is used to generate an ISA-Tab file that is then loaded into MicrobiomeDB. Combined with the extensive web toolkit and infrastructure developed by EuPathDB, this gives the user a web interface for interrogating complex, even massive-scale, microbiome studies using metadata queries. The query results are then visualized using Shiny app plug-ins available directly in the browser.
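The term-mapping step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual curation code: the synonym table, the normalization rule, and the function names are hypothetical, and the target names are shown only as examples of MIxS-style field names.

```python
import re

# Hypothetical synonym table: normalized experimenter term -> MIxS-style
# field name. A real mapping would be far larger and maintained by curators.
SYNONYMS = {
    "collection_date": "collection_date",
    "date_collected": "collection_date",
    "lat_lon": "lat_lon",
    "latitude_longitude": "lat_lon",
    "country": "geo_loc_name",
    "host_id": "host_subject_id",
}


def normalize(term):
    """Lowercase a term and collapse runs of non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", term.strip().lower()).strip("_")


def map_terms(raw_terms):
    """Split terms into those with a known mapping and those needing curation."""
    mapped = {}
    unmapped = []
    for raw in raw_terms:
        target = SYNONYMS.get(normalize(raw))
        if target is None:
            unmapped.append(raw)  # queue for manual curation
        else:
            mapped[raw] = target
    return mapped, unmapped


mapped, unmapped = map_terms(["Date Collected", "Country", "antibiotic-treatment"])
```

Here "Date Collected" and "Country" resolve to existing fields, while "antibiotic-treatment" has no entry and lands in the unmapped list, which corresponds to the terms a curator would review and add to the ontology tree.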

In its current state, MicrobiomeDB is a 'first-pass' example of microbiome data mining. We envision significantly expanding our pipeline to load additional 16S rRNA databases, metadata that describe taxa (e.g. basic microbiological properties), bacterial metabolic pathway databases (e.g. KEGG), and much more. Although the experimental datasets currently loaded are from 16S rRNA marker gene sequencing, our pipeline would also accommodate similarly formatted taxa abundance data from shotgun metagenomic studies, and future functionality could allow loading tables of bacterial gene expression data derived from these studies. Taken together, we hope to develop a full-featured, open-source platform for a systems biology view of microbial communities.

How do I use it?

See the Learn How guide.

Who is behind this?

Daniel Beiting 5, Project PI
David Roos 4, Project co-PI
The EuPathDB Team
John Brestelli 3, Shon Cade 3, Steve Fischer 4, Cristina Aurrecoechea 1, Ryan Doherty 3, Dave Falke 1, Mark Heiges 1, Christian J. Stoeckert Jr. 3, Jie Zheng 2,3, Jessica Kissinger 1, Brian Brunk 4
Collaborators outside the USA
Gabriel Fernandes 6
Francislon Silva de Oliveira 6
  1. Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602 USA
  2. Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104 USA
  3. Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104 USA
  4. Department of Biology, University of Pennsylvania, Philadelphia, PA 19104 USA
  5. Department of Pathobiology, University of Pennsylvania, Philadelphia, PA 19104 USA
  6. Centro de Pesquisas Rene Rachou (Fiocruz Minas), Belo Horizonte, Brazil