Whole Genome Sequencing
What is Whole Genome Sequencing (WGS) and what analyses are performed using WGS data?
Whole Genome Sequencing (WGS) generates DNA sequence data for the entire M. tuberculosis genome. These data support several applications, including cluster investigation, detection of possible drug resistance, and genotyping. CDC began performing retrospective WGS for isolates in select genotype-matched clusters in 2012. In 2018, the National TB Molecular Surveillance Center was established to perform WGS prospectively on all new M. tuberculosis isolates.
CDC uses these WGS data for several types of analyses, each serving a different purpose:
Whole-genome single nucleotide polymorphism (wgSNP) comparison
wgSNP comparison is performed to identify single nucleotide polymorphisms (SNPs) that distinguish isolates in a genotype-matched cluster.
- SNPs result from mutations at a single position in the DNA sequence. Because SNPs gradually accumulate over time, the number of SNPs that differ between isolates (SNP distance) can provide information about whether the TB cases could be the result of recent transmission.
- The identified SNPs can also be mapped onto a phylogenetic tree to diagram the genetic relationships among isolates and the direction of genetic change. This information can be used to further assess the genetic similarity of isolates in a genotype-matched cluster and, when combined with available epidemiologic data, to help identify chains of TB transmission.
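The idea of SNP distance described above can be illustrated with a minimal sketch. It assumes each isolate has already been reduced to a string of base calls at the same set of variable positions, and that positions with a missing call (written here as `N`) are skipped in a comparison. The isolate names, sequences, and function names are hypothetical, and this is not CDC's pipeline, which also involves reference-based assembly and SNP quality filtering.

```python
from itertools import combinations


def snp_distance(a, b, missing="N"):
    """Count positions where two aligned SNP profiles differ.

    Positions where either isolate has a missing call are ignored
    (an assumption made for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same positions")
    return sum(1 for x, y in zip(a, b) if x != y and missing not in (x, y))


def pairwise_distances(profiles):
    """Return {(name1, name2): SNP distance} for every pair of isolates."""
    return {
        (i, j): snp_distance(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


# Hypothetical base calls at six variable genome positions.
profiles = {
    "isolate_A": "ACGTAC",
    "isolate_B": "ACGTAT",
    "isolate_C": "ATGNAT",
}

distances = pairwise_distances(profiles)
for pair, d in distances.items():
    print(pair, d)
```

Smaller distances are consistent with more closely related isolates, but as noted above, they are interpreted together with epidemiologic data rather than on their own.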
The purpose of this training video is to provide state and local tuberculosis (TB) control program staff with information from CDC’s Division of TB Elimination (DTBE) on the methods for performing whole-genome single nucleotide polymorphism (wgSNP) comparison and building phylogenetic trees. Each step is presented in detail, including reference-based assembly, identification of SNPs relative to the reference, filtering of SNPs, mapping the high-quality SNPs onto a phylogenetic tree, and determining placement of the most recent common ancestor. The video also covers how adding or removing isolates from the comparison affects results and presents a case study involving a patient with a mixed TB infection.
Download Complete Slides of Tuberculosis whole-genome single nucleotide polymorphism (SNP) Training [PDF]
Detection of possible drug resistance
CDC evaluates WGS data to detect mutations associated with drug resistance for surveillance purposes. Separately, CDC offers a Clinical Laboratory Improvement Amendments (CLIA)-compliant service for rapid testing, the Molecular Detection of Drug Resistance (MDDR) Service, which provides a laboratory report to aid clinical management.
Whole-genome multilocus sequence typing (wgMLST)
wgMLST is a genotyping scheme that uses WGS data. The wgMLST scheme for TB includes 2,690 different genetic loci, each of which is an individual gene in the genome. Each of these 2,690 loci is analyzed and assigned a number such that isolates with the same sequence at a locus receive the same number for that locus. Isolates that match at ≥99.7% of the loci form a genotype cluster, designated with a wgMLSType name. This new genotyping scheme will replace GENType and PCRType for defining and alerting TB clusters.
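The matching rule described above can be sketched as a pairwise check. This is a minimal illustration, assuming each isolate is represented as a list of 2,690 locus numbers in a fixed locus order and that every locus has a call. How missing loci are handled and how pairwise matches are grouped into named clusters are not specified here, so neither is modeled. The function name and example profiles are hypothetical.

```python
N_LOCI = 2690  # number of loci in the wgMLST scheme for TB


def matches_at_threshold(profile_a, profile_b, per_mille=997):
    """Return True if two profiles match at >= 99.7% of loci.

    Integer arithmetic (997 per 1000) avoids floating-point rounding
    at the threshold boundary.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    matching = sum(1 for x, y in zip(profile_a, profile_b) if x == y)
    return matching * 1000 >= per_mille * len(profile_a)


# Hypothetical profiles: a reference isolate and two variants that
# differ from it at 8 and 9 loci, respectively.
reference = [1] * N_LOCI
differs_at_8 = [2] * 8 + [1] * (N_LOCI - 8)
differs_at_9 = [2] * 9 + [1] * (N_LOCI - 9)

print(matches_at_threshold(reference, differs_at_8))
print(matches_at_threshold(reference, differs_at_9))
```

Under these assumptions, 99.7% of 2,690 loci is 2,681.93, so two isolates need at least 2,682 matching loci, which allows at most 8 differing loci.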
- Transition to whole-genome multilocus sequence typing (wgMLST) for TB cluster detection (Slide Set) [PDF – 4 MB]
- Tuberculosis WGS Training Module [PDF]
- wgSNP Methods Training Module [PDF]
- Cornell University Molecular Epidemiology and Sequencing Approaches in Public Health Modules
- Illumina Introduction to Next-Generation Sequencing