uBiome – The Largest Human Microbiome Dataset

In the beginning

The term “microbiome” was used for the first time in 2001 by Joshua Lederberg, who used it to describe the “ecological community of the microorganisms that literally share our body space”. The first human microbiome sequencing studies included data from small numbers of subjects, such as a groundbreaking 2005 study about the gut microbiome of three individuals, co-authored by leading microbiome scientist Elisabeth Bik (who now works here at uBiome!). This study contained a dataset of 13,000 sequence reads, which was an enormous amount of data at the time and the largest microbiome dataset obtained up to that point, but it quickly paled in comparison to the many studies that followed.

In 2008, the Human Microbiome Project (HMP) was launched by the National Institutes of Health (NIH). The goal of the HMP was to catalogue the microbiomes of five different body sites in 242 people. The results of this project were published in 2012. This initial HMP publication, with 25 million sequences, served as the basis for many other studies that looked at the microbiome from different anatomical sites and from people of different ages and health conditions.

uBiome founded in 2012

Later that same year, uBiome was founded. uBiome’s founders, scientists from UCSF and Oxford, wanted to bring techniques developed in academic laboratories to the world, and apply large-scale computational approaches to the nascent field of the microbiome. Knowing how much the HMP had learned from the initial couple of hundred people, we imagined how much more we could learn if we had data from 2,500, 25,000, 250,000, or 2.5 million samples. Our mission is to advance the science of the microbiome and make it useful to people.

Over 250,000 samples, soon to be 1 million

Initially, uBiome collected 2,500 samples in a crowdfunding campaign. Now, six years later, uBiome has over 250,000 samples from all over the world, with over 1 million projected in 2019. Every day, we receive thousands of samples, process them in our CLIA-licensed and CAP-accredited laboratory, and analyze the results on our secure servers. While our initial 2013 dataset contained 2,500 samples and 500 million reads, our current collection of over 250,000 samples contains a staggering 500 billion or so DNA sequences. By the end of next year, we will have sequenced 200 billion microbes!

Our Explorer product has been used to collect samples from 100 countries on all five continents (Figure 1). We also obtain many samples from our clinical products, SmartGut (stool) and SmartJane (vaginal). We use these samples to make our mission of advancing the science of the microbiome and making it useful to people a reality.

 

Academic partners, including UC, Harvard, Stanford, and more

In addition, uBiome has partnered with over 200 academic institutions and organizations, including The University of California, San Francisco (UCSF), Harvard, Stanford, MIT, Ohio State, UNC Chapel Hill, Yale University, University of Hong Kong, University of Singapore, and many others. These partnerships have resulted in a large number of additional samples and datasets (Figure 1). Since many of these studies include well-defined interventions or populations, such as cancer patients or children with autism, these samples add important information to our growing database.

uBiome also developed an Academic Grant Program, which we use to advance the science of the microbiome in partnership with institutions all over the world. To date we have supported and collaborated on almost 100 studies, representing thousands of samples. As it becomes more and more clear that the microbiome is intricately connected to human health and physiology, research interest in the field is increasing exponentially and spans many disciplines. We have worked with the Centers for Disease Control, National Institutes of Health, and universities in countries from Pakistan to Nigeria to Chile.

We continue to collaborate on projects ranging from neurodegenerative diseases to athletic performance to oncology, and have even developed custom kits and panels for new sample sites such as ocular or synovial fluid!

Our collection of a combined 250,000 samples is projected to be over 1 million by the end of next year (Figure 2). That makes uBiome’s dataset, to the best of our knowledge, the largest microbiome dataset in the world.

No study has ever been published based on a dataset this size. Two studies published in 2016 in Science reported the gut microbiome composition of two cohorts from Belgium and the Netherlands, of roughly 1,100 participants each. A recently published study reported data from nearly 13,000 gut samples from 11,000 participants. We have surpassed that by 25x.  

We leverage insights gained by building the largest human microbiome database in the world to make our consumer products better, to improve and launch clinical tests, and to expand into drug research and development using pre-existing patent assets and industry research collaborations. uBiome doesn’t sell access to its dataset.

So what have we learned?

To misquote Spider-Man, “With great datasets comes great responsibility”. It is uBiome’s mission to develop new tools for microbiome analysis, and use all these data to discover patterns that smaller datasets could not reveal. For example, we used a subset of sequence reads from self-declared healthy Explorer users to create and publish reference ranges for our clinical products SmartGut and SmartJane.

Using our dataset, we are also finding interesting differences between the microbiomes of healthy people and those of people who follow a certain diet or suffer from certain health conditions. Some of these findings confirm hypotheses proposed by other researchers, while others are completely novel.

Our microbiomes can be quite diverse and differ from one person to another based on many factors, including where we live, what we eat and drink, and our lifestyle. Our Explorer users vary too: the average user is 42.8 years old, and almost 70% of users are between 25 and 54 years old. 47% of our samples come from female users, 45% from male users, and 8% from people who identify as non-binary.

Some fun facts:

  • Thanks in large part to the size of our dataset, our data is also quite diverse! In total, we have detected 1,855 different genera across the five body sites combined, 1,619 of which were found in stool samples! This is more than three times the total of 508 genera found in the context of the Human Microbiome Project.
  • Looking at the most abundant bacterial DNA sequences in our dataset (those that correspond to more than 1% of the total abundance of a sample and appear in at least two different individuals), we have detected more than 50,000 unique bacterial DNA sequences. Remarkably, only 17% of those sequences have previously been reported by the scientific community. That leaves about 41,500 bacterial sequences (83%) that have not been seen before, yet are common in human samples.
  • We detect species of Cyanobacteria (microorganisms that can make their own food from sunlight, as plants do) in the microbiome. In skin samples, we also found several species of Archaea that are adapted to living in very salty or dry environments, such as members of the Halobacteria class!
  • We also found microbes from unexpected sources in stool samples, such as Dickeya zeae, a bacterium that causes stalk rot in maize, or Hymenobacter sedentarius, a bacterium isolated from soil.
  • Skin microbiome samples have more genera in common with nose microbiota than with any other site. Gut samples, however, have more unique genera (i.e., genera not detected at any other site) than any other site.
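
For readers who like to see the logic spelled out, the "most abundant sequences" filter from the second fun fact can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not uBiome's actual pipeline: the function name, the input layout, and the toy data are all invented here, and we assume one reading of the rule, namely that a sequence must exceed 1% of a sample's reads in at least two different individuals. The last lines simply re-check the 83% arithmetic quoted above.

```python
from collections import defaultdict


def abundant_shared_sequences(samples, min_fraction=0.01, min_individuals=2):
    """Return sequences exceeding min_fraction of a sample in >= min_individuals people.

    samples: iterable of (individual_id, {sequence: read_count}) pairs.
    Hypothetical helper for illustration; names and layout are assumptions.
    """
    carriers = defaultdict(set)  # sequence -> individuals in whom it is abundant
    for individual_id, counts in samples:
        total = sum(counts.values())
        if total == 0:
            continue  # skip empty samples to avoid dividing by zero
        for sequence, count in counts.items():
            if count / total > min_fraction:
                carriers[sequence].add(individual_id)
    return {seq for seq, people in carriers.items() if len(people) >= min_individuals}


# Toy data, invented for illustration only.
toy_samples = [
    ("person_a", {"SEQ1": 500, "SEQ2": 495, "SEQ3": 5}),
    ("person_b", {"SEQ1": 300, "SEQ3": 700}),
    ("person_c", {"SEQ2": 5, "SEQ4": 995}),
]
result = abundant_shared_sequences(toy_samples)
print(sorted(result))

# Arithmetic from the text: 17% previously reported leaves 83% unreported.
unreported = 50_000 * (100 - 17) // 100
print(unreported)
```

In the toy data, only SEQ1 is above the 1% threshold in two different people, so it is the only sequence kept. A stricter or looser reading of the rule (for example, abundant in one sample but merely present in a second person) would change the condition inside the loop.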

To the future!

What does this mean for microbiome research as a whole? What does this mean for you?

A team of over 70 PhD scientists at uBiome is currently applying advanced computational techniques, including machine learning, artificial intelligence, and statistical genetics, to analyze all this data. Each month, we generate terabytes of sequencing data, and users provide us with thousands of answers about their lifestyle, diet, and health. We use machine learning to find never-before-reported connections between these factors and the composition of the microbiome. These new associations will provide us with a better understanding of the human microbiome and its role in our health.

As microbiome research continues to grow, uBiome is well placed to help advance the field with what is perhaps the most important piece: diverse and expansive information. This new scale of data is revealing previously unthinkable connections between microbiome, lifestyle, and health. We are also applying new technologies to solve the unprecedented bioinformatic and experimental challenges arising from this enormous effort, and to improve how this data is converted into knowledge and into applications that use our microbiome to improve life, wellness, and health. We envision a world that is microbiome-aware, where chronic disease is better understood, and where we have the power to use information to better understand our health and our lives. Thank you all so much for joining us!

Figure 1: uBiome samples come from all over the world. Countries colored in orange are those from which we have received samples (100 countries!), while the blue dots show locations where uBiome has granted academic partnerships.

 

Figure 2: Growth of the uBiome dataset.