
Predicting oral vs. gut microbiomes

Do the bacterial populations in your gut look very different from those in your mouth?

It turns out that yes, they’re so different, they leave a signature. And this signature actually allows us to predict what part of the body a sample comes from if we didn’t already know.

In the principal coordinates analysis plots below (PCoA, a form of multidimensional scaling), each dot represents a human microbiome sample sent in to uBiome. The first plot, with all blue dots, shows 1,000 gut and oral samples lumped together.

Each plot compresses roughly 40 dimensions of information down to 2. Every sample is described by roughly 40 categories (phyla) of bacteria, so we compare how similar samples are to each other across all of these phyla and then represent the differences as distances in space: the closer two dots are, the more similar their phylum-level makeup.
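To make the idea of turning phylum profiles into distances concrete, here is a minimal sketch of a PCoA in plain Python. The post does not say which distance measure or software was used, so the Bray-Curtis dissimilarity, the four phyla, the abundance numbers, and the helper names `bray_curtis` and `pcoa` are all assumptions for illustration, not uBiome's actual pipeline or data.

```python
import math
import random

# Made-up relative abundances for four phyla (Firmicutes, Bacteroidetes,
# Proteobacteria, Actinobacteria). Illustrative only, not uBiome data.
samples = {
    "gut_1": [0.50, 0.40, 0.05, 0.05],
    "gut_2": [0.45, 0.45, 0.04, 0.06],
    "gut_3": [0.55, 0.35, 0.06, 0.04],
    "oral_1": [0.40, 0.15, 0.30, 0.15],
    "oral_2": [0.35, 0.20, 0.30, 0.15],
    "oral_3": [0.42, 0.13, 0.28, 0.17],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no overlap."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def pcoa(dist, n_axes=2, n_iter=500, seed=0):
    """Classical PCoA: double-center squared distances, keep top eigenvectors."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        # Power iteration converges to the dominant eigenvector of b.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        # Negative eigenvalues (possible with non-Euclidean distances) are
        # clipped to zero, which flattens that axis.
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


names = list(samples)
dist = [[bray_curtis(samples[p], samples[q]) for q in names] for p in names]
coords = pcoa(dist)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

With toy profiles this clearly separated, the gut and oral samples should fall on opposite sides of the first axis. In practice a numerical library would handle the eigendecomposition; the power iteration here only keeps the sketch dependency-free.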


Fig. 1: PCoA plot of 1,000 gut and oral samples submitted to uBiome


Fig. 2: PCoA plot of 1,000 gut and oral samples submitted to uBiome, with gut samples labeled blue and oral samples red

It’s striking how distinct the gut and oral signatures are at the phylum level of the human microbiome.

Thanks to Dr. Siavosh Rezvan-Behbahani for analyzing uBiome data in this way. After he created the plots above, he decided to run a simple SVM (Support Vector Machine) algorithm on our dataset to see how well we can predict whether a sample is gut or oral from its phylum-level composition.
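For readers curious what such a classifier looks like, here is a minimal sketch of a linear SVM trained by batch subgradient descent on the hinge loss. The post does not describe the kernel, library, or settings that were actually used, so everything here is an assumption for illustration: the made-up abundance values, the function names `decision` and `train_linear_svm`, and the hyperparameters `lam`, `lr`, and `epochs`.

```python
# Made-up phylum-level relative abundances (Firmicutes, Bacteroidetes,
# Proteobacteria, Actinobacteria). Labels: +1 = gut, -1 = oral.
# Illustrative only, not uBiome data.
train_X = [
    [0.50, 0.40, 0.05, 0.05],
    [0.45, 0.45, 0.04, 0.06],
    [0.55, 0.35, 0.06, 0.04],
    [0.48, 0.42, 0.06, 0.04],
    [0.40, 0.15, 0.30, 0.15],
    [0.35, 0.20, 0.30, 0.15],
    [0.42, 0.13, 0.28, 0.17],
    [0.38, 0.18, 0.32, 0.12],
]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]

# Held-out samples the model never sees during training.
test_X = [[0.52, 0.38, 0.05, 0.05], [0.39, 0.16, 0.31, 0.14]]
test_y = [1, -1]


def decision(w, b, x):
    """Signed score: positive means 'gut', negative means 'oral'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.001, lr=0.1, epochs=5000):
    """Minimize lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin (or misclassified) contribute.
            if yi * decision(w, b, xi) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


w, b = train_linear_svm(train_X, train_y)
predictions = [1 if decision(w, b, x) >= 0 else -1 for x in test_X]
print("predicted:", predictions, "actual:", test_y)
```

Keeping the test samples out of training is what makes the comparison between predicted and actual labels meaningful.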

He found a relatively good AUC (area under the curve) of 0.9 for the precision-recall curve he generated. He then put together a confusion matrix for a test set, an intuitive way to see how often the model's predictions are right (true positives and true negatives) and how often they are wrong (false positives and false negatives).

[Screenshot: confusion matrix for the test set]
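As a small worked example of how a confusion matrix is tallied, the sketch below counts the four outcomes for a handful of predictions and derives precision and recall from them. The labels and predictions are invented for illustration and are not the values from the post's test set; `confusion_matrix` is a hypothetical helper, and treating "gut" as the positive class is an arbitrary choice.

```python
def confusion_matrix(y_true, y_pred, positive="gut"):
    """Count true/false positives and negatives for the chosen positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Invented labels: five gut and five oral samples, with one mistake each way.
y_true = ["gut"] * 5 + ["oral"] * 5
y_pred = ["gut", "gut", "gut", "gut", "oral",
          "oral", "oral", "oral", "oral", "gut"]

cm = confusion_matrix(y_true, y_pred)

# Precision: of the samples called gut, how many really were gut?
precision = cm["tp"] / (cm["tp"] + cm["fp"])
# Recall: of the real gut samples, how many did the model find?
recall = cm["tp"] / (cm["tp"] + cm["fn"])

print(cm)                 # {'tp': 4, 'fp': 1, 'fn': 1, 'tn': 4}
print(precision, recall)  # 0.8 0.8
```

A precision-recall curve plots these same two quantities as the classifier's decision threshold is varied, and the AUC summarizes that curve in a single number.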

What other predictions would you lovely readers like us to be able to make from our dataset? Please let us know in the comments below.

 

6 Thoughts on “Predicting oral vs. gut microbiomes”

  • What kind of cross-validation did you do? Were the samples from the same person in both the training and testing sets? If so, you could be overfitting to persons’ microbiomes. Still, the data look like they separate well.

    If you’re going to use machine learning to predict things from microbiome data, you should use this unique opportunity where you have continuously incoming data to perform strong cross-validation by making actual predictions on new data.

  • Bob West says:

    Sorry, is it possible to label the axes on these plots? Or maybe a link to a prior post where this is discussed? I suppose I should already know what they refer to, but I just began following this blog. Apologies, but thanks!

  • […] in the news Predicting oral vs. gut microbiomes – Alexandra Carmichael – […]

  • Alex says:

    You should make a portion of the anonymized data available as a training set and have a contest to see who can make the best classifier, with assessment based on a separate test set of samples which are not available to participants.

    I bet I can beat your current model.
