Guide to Metagenomic report
This page provides guidance on navigating the final Metagenomic report. It includes instructions for interacting with graphs and context regarding the analysis.
Metagenomic pipeline overview
The metagenomic pipeline is managed using Snakemake. The general overview of the steps is described below:
- Read trimming and quality check. Fastp
- Taxonomic classification of V4 and/or ITS sequencing reads. Kraken2
- Combination of classification results from V4 and ITS sequencing reads (only occurs when both V4 and ITS regions were sequenced). KrakenTools
- Estimation of species level abundance. Bracken
- Calculation of alpha (within-sample) diversity. KrakenTools-DiversityTools
- Calculation of beta (between-sample) diversity. KrakenTools-DiversityTools
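As a rough illustration of how the first steps chain together, the sketch below builds the command lines for Fastp, Kraken2, and Bracken for one sample. It is not the pipeline's actual Snakemake workflow: the function name, file naming scheme, output directory, and the assumption of paired-end reads are all hypothetical, and only commonly documented options of each tool are used.

```python
from pathlib import Path


def build_commands(sample: str, r1: str, r2: str, db: str,
                   outdir: str = "results") -> list[list[str]]:
    """Return command lines for trimming, classification, and abundance estimation.

    Assumes paired-end reads and a hypothetical file naming scheme.
    """
    out = Path(outdir)
    trimmed_r1 = str(out / f"{sample}_R1.trimmed.fastq.gz")
    trimmed_r2 = str(out / f"{sample}_R2.trimmed.fastq.gz")
    kraken_report = str(out / f"{sample}.report.txt")

    # Read trimming and quality check
    fastp = [
        "fastp",
        "-i", r1, "-I", r2,
        "-o", trimmed_r1, "-O", trimmed_r2,
        "--json", str(out / f"{sample}.fastp.json"),
        "--html", str(out / f"{sample}.fastp.html"),
    ]
    # Taxonomic classification of the trimmed reads
    kraken2 = [
        "kraken2",
        "--db", db,
        "--report", kraken_report,
        "--output", str(out / f"{sample}.kraken2.out"),
        "--paired", trimmed_r1, trimmed_r2,
    ]
    # Species-level abundance estimation from the Kraken2 report
    bracken = [
        "bracken",
        "-d", db,
        "-i", kraken_report,
        "-o", str(out / f"{sample}.bracken.txt"),
        "-l", "S",   # estimate abundance at the species level
        "-t", "10",  # minimum number of reads required per species
    ]
    return [fastp, kraken2, bracken]


for cmd in build_commands("sample1", "sample1_R1.fastq.gz",
                          "sample1_R2.fastq.gz", "/path/to/kraken2_db"):
    print(" ".join(cmd))
```

Each command is returned as a list of arguments so that it could be passed directly to `subprocess.run` without shell quoting issues. The Kraken2 report written by the second command is the input to the third.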
Interacting with the report
The final data report is generated using MultiQC. This provides a customizable and interactive report file for navigating your results. General details are described below, and additional information can be found in the MultiQC documentation.
Interact with plots
- Hovering over specific data within a plot will provide additional information regarding those results.
- Clicking and dragging over a section of the graph will zoom into that area. Reset the zoom by clicking the top right of the plot.
- Plots that have a gray bar along their base can be clicked to expand.
- Plots can also be downloaded as static images, alongside the data used to generate them.
Sequencing read processing
Sequencing reads from the V4 and ITS targeted regions were processed using the same bioinformatic tools and parameters.
Fastp
Fastp provides comprehensive quality profiling for sequencing read data. In addition, it filters out low-quality reads and trims unwanted sequences. The report includes the total number and relative percentage of sequencing reads that passed Fastp's filtering metrics. In most cases, more than 90% of reads are expected to pass. The report allows you to switch between displaying the total number of reads and percentages.
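The pass rate can also be checked outside the report from the JSON summary that Fastp writes. The sketch below is a minimal example that assumes the `summary.before_filtering.total_reads` and `filtering_result.passed_filter_reads` fields of that JSON; the function name and the read counts in the example are made up for illustration.

```python
def percent_passed(fastp_json: dict) -> float:
    """Percentage of input reads that passed Fastp's filters."""
    total = fastp_json["summary"]["before_filtering"]["total_reads"]
    passed = fastp_json["filtering_result"]["passed_filter_reads"]
    return 100.0 * passed / total if total else 0.0


# Made-up values shaped like the relevant part of a Fastp JSON report.
# For a real sample, load the file with json.load() instead.
example = {
    "summary": {"before_filtering": {"total_reads": 200_000}},
    "filtering_result": {
        "passed_filter_reads": 188_000,
        "low_quality_reads": 11_000,
        "too_many_N_reads": 200,
        "too_short_reads": 800,
    },
}

print(f"{percent_passed(example):.1f}% of reads passed filtering")
```

A sample falling well below 90% may be worth a closer look at the per-category counts in `filtering_result`.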
Kraken2
Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequencing reads. Kraken2 compares the query sequences to a database to classify reads down to the species level. The pipeline uses the Standard Kraken2 database (which includes archaea, bacteria, viral, plasmid, human, and UniVec_Core references) supplemented with protozoa and fungi (PlusPF). This database was last updated on 26-September-2022.
The report shows the top five taxonomic assignments at each taxonomic rank (domain, phylum, class, order, family, genus, species). The ranks can be navigated by clicking the labelled buttons (see blue arrow). Sequence reads that were successfully classified, but not within the top five assignments, are labelled as 'Other (Classified)' (see green arrow). As with the Fastp graph, these results can be displayed as total read numbers or percentages.
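The grouping into a top five plus 'Other (Classified)' can be pictured with the short sketch below. It only illustrates the idea and is not the code the report uses internally; the function name and the genus counts are hypothetical.

```python
def top_n_with_other(read_counts: dict[str, int], n: int = 5) -> dict[str, int]:
    """Keep the n taxa with the most reads and sum the rest into one bucket."""
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    top = dict(ranked[:n])
    other = sum(count for _, count in ranked[n:])
    if other:
        top["Other (Classified)"] = other
    return top


# Hypothetical genus-level read counts for one sample
genus_counts = {
    "Bacteroides": 5200,
    "Prevotella": 3100,
    "Faecalibacterium": 2400,
    "Roseburia": 900,
    "Blautia": 650,
    "Akkermansia": 300,
    "Escherichia": 120,
}

for taxon, reads in top_n_with_other(genus_counts).items():
    print(f"{taxon}\t{reads}")
```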
Bracken
Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomy labels assigned by Kraken2 to estimate species abundance. Bracken probabilistically re-distributes assigned reads through the taxonomic tree. This can include moving reads that were originally assigned to higher taxonomic ranks down to the species level.
Sample Diversity
KrakenTools-DiversityTools is used to calculate alpha-diversity (within-sample) and beta-diversity (between-samples) from the abundance estimates produced by Bracken.
Alpha-diversity (within-sample)
Alpha-diversity is defined as the average diversity within a particular sample. There are several measures for calculating this diversity. The report includes two of them: Shannon's diversity index and Simpson's index.
A greater Shannon's index value indicates higher diversity within the sample. The Simpson's index ranges from 0 (single species) to 1.0 (highest diversity).
Further details on the methodology behind these statistics can be found in the associated publications (found by clicking the previous names).
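For readers who want to see the arithmetic, the sketch below computes both indices from a list of per-species read counts. It uses the textbook forms, Shannon's H = -Σ p·ln(p) and Simpson's index as 1 - Σ p², which matches the 0 to 1.0 range described above. The exact estimator used by the pipeline's tools may differ slightly (for example, a finite-sample correction), and the function names and counts are illustrative only.

```python
import math


def shannon_index(counts: list[int]) -> float:
    """Shannon's diversity index, H = -sum(p * ln p), over species with reads."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts: list[int]) -> float:
    """Simpson's index as 1 - sum(p^2): 0 for a single species, approaching 1 as diversity grows."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical species-level read counts for two samples
single_species = [1000]
even_community = [250, 250, 250, 250]

for name, counts in [("single", single_species), ("even", even_community)]:
    print(name, round(shannon_index(counts), 3), round(simpson_index(counts), 3))
```

The single-species sample scores 0 on both indices, while the evenly split four-species sample gives H = ln(4) and a Simpson's index of 0.75.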
In addition to these values, the report includes the total number of species identified by Kraken2. The 'Species Kept' column (example below) refers to the number of species with a minimum of 10 reads, which are the species used by Bracken for its abundance calculations. The loss of identified species is usually due to a large number of closely related species, each with a very small number of reads.
Beta-diversity (between-samples)
Beta-diversity is useful when trying to determine the changes in species composition between multiple samples. KrakenTools-DiversityTools uses the Bray-Curtis dissimilarity for this metric. A value of 0 means the two samples are identical in composition, while 1.0 represents maximal divergence (no shared species).
The report plots the value for each pair-wise sample comparison as a heatmap. Pairs of samples or replicates with low dissimilarity are shown in blue, and pairs with high dissimilarity in red.
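The Bray-Curtis dissimilarity between two samples can be written as 1 - 2C / (S_a + S_b), where C is the sum of the smaller count for each species and S_a, S_b are the total counts in each sample. The sketch below is a minimal, self-contained version of that formula; the function name, sample names, and species counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two species-count tables."""
    species = set(a) | set(b)
    shared = sum(min(a.get(s, 0), b.get(s, 0)) for s in species)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Hypothetical species-level abundance estimates for three samples
samples = {
    "sample_A": {"species_1": 6, "species_2": 4},
    "sample_B": {"species_1": 2, "species_2": 8},
    "sample_C": {"species_3": 10},
}

# One value per pair of samples, as plotted in the heatmap
for (name_1, counts_1), (name_2, counts_2) in combinations(samples.items(), 2):
    print(name_1, name_2, round(bray_curtis(counts_1, counts_2), 2))
```

Here `sample_A` and `sample_B` share both species in different proportions and score 0.4, while `sample_C` shares no species with either and scores 1.0 against both.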
Combined diversity from V4 and ITS targeted regions
If your project involves sequencing both the V4 and ITS regions, then your report includes the summed results of both analyses. KrakenTools is used to combine V4 and ITS taxonomic assignments into a single file. These results are then re-analyzed for species abundance estimation, alpha-diversity metrics, and beta-diversity comparison. This analytic pipeline is identical to the steps described above.
The side-bar of your report should look similar to the image below. The image has been edited to include colored bars indicating results specific to V4 sequencing (blue), ITS sequencing (yellow), and the combined analysis (green).
Final results
The report includes instructions on how to access your raw sequencing and result files.
General result files
The (sample_name).report.txt contains the taxonomic assignments from Kraken2.
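If you want to work with this file directly, the sketch below parses it into rows. It assumes the standard six-column, tab-separated Kraken2 report layout (percentage of reads in the clade, reads in the clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name); reports generated with extra options may contain additional columns. The class name, function name, and example rows are made up for illustration.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of reads in the clade rooted at this taxon
    clade_reads: int    # reads in the clade rooted at this taxon
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. "G" for genus, "S" for species
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name with indentation removed


def parse_kraken2_report(text: str) -> list[ReportRow]:
    """Parse the text of a standard six-column Kraken2 report."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.split("\t")
        rows.append(ReportRow(float(pct), int(clade), int(direct),
                              rank, int(taxid), name.strip()))
    return rows


# Made-up excerpt. For a real sample, read the (sample_name).report.txt file instead.
example = "\n".join([
    " 10.00\t100\t100\tU\t0\tunclassified",
    " 90.00\t900\t5\tR\t1\troot",
    " 60.00\t600\t600\tS\t562\t          Escherichia coli",
])

species_rows = [row for row in parse_kraken2_report(example) if row.rank == "S"]
for row in species_rows:
    print(row.name, row.clade_reads)
```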
Krona plots
An interactive multi-layered pie chart will be created for each sample by KronaTools. These plots group each domain classification by level within the pie chart, and allow navigation through the various taxonomic classifications within the sample.