
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches group reads using a notion of 97% sequence similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2, structSSI and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.

The microbiome is formed of the ecological communities of microorganisms that dominate the living world. Bacteria can now be identified through the use of next-generation sequencing applied at several levels. Shotgun sequencing of all bacteria in a sample delivers knowledge of all the genes present. Here we will only be interested in the identification and quantification of individual taxa (or species) through a ‘fingerprint gene’ called 16S rRNA, which is present in all bacteria.
