Predicting oral vs. gut microbiomes

Do the bacterial populations in your gut look very different from those in your mouth?

It turns out that yes, they’re so different that they leave a signature. And this signature lets us predict which part of the body a sample comes from, even if we didn’t already know.

In the PCoA (principal coordinates analysis, a form of multidimensional scaling) plots below, each dot represents a human microbiome sample sent in to uBiome. The first plot, with all blue dots, shows 1,000 gut and oral samples lumped together.

Each plot compresses roughly 40 dimensions of information down to 2. Each sample contains roughly 40 categories (phyla) of bacteria, so we compare how similar samples are to each other across all of these phyla, and then represent the differences as distances on the plot: the more alike two samples are, the closer together their dots sit.
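
To make the "differences as a distance" idea concrete, here is a minimal sketch of the first step: turning phylum profiles into a table of pairwise distances. The four-phylum profiles are made-up toy values (not uBiome data), and the Bray-Curtis dissimilarity is an assumption on our part, since it is a common choice for microbiome data but the measure behind the plots isn't stated here. PCoA then looks for 2-D coordinates that reproduce these distances as closely as it can.

```python
# Toy phylum fractions (made-up values, NOT uBiome data). The column order is
# an assumption: Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria.
# Real samples would have roughly 40 columns instead of 4.
samples = {
    "gut_a":  [0.60, 0.30, 0.05, 0.05],
    "gut_b":  [0.50, 0.40, 0.05, 0.05],
    "oral_a": [0.35, 0.15, 0.30, 0.20],
    "oral_b": [0.40, 0.10, 0.30, 0.20],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no phyla in common."""
    total = sum(x + y for x, y in zip(a, b))
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


names = list(samples)
# dist[i][j] is the dissimilarity between sample i and sample j.
dist = [[bray_curtis(samples[p], samples[q]) for q in names] for p in names]

for name, row in zip(names, dist):
    print(name.ljust(7), " ".join(f"{d:.2f}" for d in row))
```

In this toy table the two gut profiles sit much closer to each other than either does to an oral profile, which is the same pattern the plots show at full scale.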

Fig. 1: PCoA plot of 1,000 gut and oral samples submitted to uBiome

Fig. 2: PCoA plot of 1,000 gut and oral samples submitted to uBiome, with gut samples labeled blue and oral samples red

It’s striking how distinct the gut vs. oral signatures are at the phylum level of bacteria in the human microbiome.

Thanks to Dr. Siavosh Rezvan-Behbahani for analyzing uBiome data in this way. After he created the plots above, he decided to run a simple SVM (Support Vector Machine) algorithm on our dataset to see how well we can predict whether a sample is gut or oral from its phylum-level profile.
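
For readers curious what an SVM does under the hood, here is a rough, dependency-free sketch. It is not the analysis described above: the data are made-up toy profiles, the linear kernel is an assumption (the kernel actually used isn't stated), and the function names and settings (`lam`, `eta`, `epochs`) are ours for illustration. In practice you would more likely reach for a library implementation such as scikit-learn's `SVC`.

```python
import random

# Toy phylum fractions (made-up values, NOT uBiome data). The column order is
# an assumption: Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria.
X_train = [
    [0.60, 0.30, 0.05, 0.05],  # gut
    [0.50, 0.40, 0.05, 0.05],  # gut
    [0.55, 0.35, 0.05, 0.05],  # gut
    [0.35, 0.15, 0.30, 0.20],  # oral
    [0.40, 0.10, 0.30, 0.20],  # oral
    [0.30, 0.20, 0.25, 0.25],  # oral
]
y_train = [1, 1, 1, -1, -1, -1]  # +1 = gut, -1 = oral


def decision_value(w, b, x):
    """Signed score w.x + b; positive means the gut side of the boundary."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.001, eta=0.02, epochs=2000, seed=0):
    """Linear soft-margin SVM fitted by stochastic subgradient descent
    on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violates_margin = y[i] * decision_value(w, b, X[i]) < 1
            # The L2 penalty shrinks the weights slightly on every step.
            w = [wj * (1 - eta * lam) for wj in w]
            if violates_margin:
                # The hinge loss pushes the boundary away from sample i.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    return "gut" if decision_value(w, b, x) >= 0 else "oral"


w, b = train_linear_svm(X_train, y_train)
print(predict(w, b, [0.58, 0.32, 0.05, 0.05]))  # a gut-like toy profile
print(predict(w, b, [0.33, 0.17, 0.28, 0.22]))  # an oral-like toy profile
```

The idea is that the SVM looks for the dividing line (here, a flat boundary in phylum space) that leaves the widest possible gap between the two groups of samples.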

The precision-recall curve he generated had an AUC (Area Under Curve) of 0.9, which is relatively good (1.0 would be perfect). He then put together a confusion matrix for a test set, as an intuitive way to see how many samples our model classifies correctly (true positives and true negatives) and how many it gets wrong (false positives and false negatives).

Fig. 3: Confusion matrix for the test set
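
Both evaluation tools mentioned above, the confusion matrix and the area under a precision-recall curve, can be computed by hand. The sketch below uses six made-up test samples with made-up classifier scores (not our results), treats "gut" as the positive class, and sums the area step by step, which is often called average precision. That summing rule is an assumption, since we don't say here how the 0.9 figure was computed, and the sketch ignores tied scores for simplicity.

```python
# Made-up test labels and classifier scores (higher = more gut-like).
y_true = ["gut", "gut", "oral", "gut", "oral", "oral"]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]


def confusion_matrix(y_true, y_pred, positive="gut"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def precision_recall_points(y_true, scores, positive="gut"):
    """Lower the threshold one sample at a time; return (recall, precision) pairs."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for truth in y_true if truth == positive)
    tp = fp = 0
    points = []
    for _, truth in ranked:
        if truth == positive:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def area_under_pr(points):
    """Step-wise area under the precision-recall curve (average precision)."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Call a sample "gut" when its score is at least 0.5 (an arbitrary threshold).
y_pred = ["gut" if s >= 0.5 else "oral" for s in scores]
cm = confusion_matrix(y_true, y_pred)
auc_pr = area_under_pr(precision_recall_points(y_true, scores))
print(cm)
print(round(auc_pr, 3))
```

The confusion matrix describes a single threshold, while the precision-recall AUC summarizes how the model does across all of them, which is why it's useful to look at both.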

What other predictions would you lovely readers like us to be able to make from our dataset? Please let us know in the comments below.