Reported Tuberculosis in the United States, 2021
Appendix C
Genotyping Background Information and Glossary
Tuberculosis (TB) genotyping methods are laboratory-based analyses of the genetic material of the bacteria that cause TB disease, Mycobacterium tuberculosis complex. The total genetic content is referred to as the genome. Specific sections of the genome contain distinct genetic patterns that help distinguish different strains of Mycobacterium tuberculosis. TB genotyping examines the location, number, and presence of different types of spacer or repetitive DNA patterns. The areas of the genome examined in TB genotyping are different from those related to drug resistance. TB genotyping also uses whole-genome sequencing (WGS) to examine 70%–90% of the genetic sequence of the bacterial genome.
Applications of Genotyping
Patients with TB disease who are related by recent transmission should generally have matching genotype results. Conversely, patients with matching TB genotyping results are probably related by transmission in some way, although the connection might not be recent or direct.
Genotyping results, when combined with epidemiologic data, can help identify persons with TB disease involved in the same chain of transmission. This information adds value to conventional TB control activities in different ways. These applications are summarized as follows:
Patient-Level Applications of Genotyping
Complete Contact Investigations
Connections between ≥2 patients with TB (i.e., epidemiologic linkages), whether or not they were identified through routine contact investigations, should be confirmed or refuted using available genotyping results.
Cluster Investigations
Connections suggested by genotyping results that were not established through routine contact investigations should be identified.
Other patient-level applications include detecting and investigating potential false-positive culture results and distinguishing relapse TB disease from new TB infection (i.e., reinfection) among TB patients with recurrent TB disease.
Population-Level Applications of Genotyping
Outbreak detection, assessment, and monitoring
- Identify genotype clusters that could represent an outbreak using geospatial or other analyses.
- Assess the genetic relatedness of isolates among cases believed to be part of an outbreak.
- Define the scope of potential outbreaks by identifying all presumed cases in an area with a matching genotype.
- Prospectively monitor known outbreaks for new cases with the same outbreak-related genotype.
History of TB Genotyping Surveillance in the United States
In 1996, the Centers for Disease Control and Prevention (CDC) started the National Tuberculosis Genotyping Surveillance Network (NTGSN), a 5-year initiative that established the utility of genotyping in TB control efforts.1 In 2004, based on the knowledge gained from NTGSN and associated studies,2 CDC established the National Tuberculosis Genotyping Service (NTGS) and funded a national genotyping laboratory to genotype ≥1 Mycobacterium tuberculosis isolate from each culture-positive TB case reported in the United States.3 All U.S. TB control programs can use NTGS at no cost to patients, health care providers, or health departments. NTGS participation is voluntary, with each program determining how genotyping data will be used for its TB control activities. Since 2004, approximately 155,000 Mycobacterium tuberculosis isolates have been successfully genotyped through NTGS and its partnerships among CDC programs, national genotyping laboratories, and state and local jurisdictions.
In 2010, CDC launched the Tuberculosis Genotyping Information Management System (TB GIMS),4 a secure Internet-based database available to reporting areas in all 50 U.S. states, the District of Columbia, Puerto Rico, the U.S. Virgin Islands, and the U.S.-affiliated Pacific Islands. TB GIMS makes genotyping data easily available to users and facilitates linking of genotyping data to patient surveillance records from the National TB Surveillance System. Key features include database queries of genotypes and clusters, data quality checks, aggregate reports, maps, and outbreak detection tools. As of July 2022, TB GIMS has 567 users among state, tribal, local, territorial, and federal partners.
In 2018, CDC established the National Tuberculosis Molecular Surveillance Center (NTMSC) to perform whole-genome sequencing on ≥1 isolate from every culture-positive TB case in the United States. In July 2022, NTMSC fully transitioned from conventional genotyping, e.g., GENType, to WGS and the resulting whole-genome multilocus sequence type (i.e., wgMLSType). However, existing data from conventional genotyping will be available in TB GIMS for jurisdictions’ reference in their TB prevention and control activities.
Genotyping-Based Cluster Detection
CDC identifies genotype-matched clusters, which can represent TB outbreaks, using geospatial analysis to identify unexpected clustering of TB cases within a defined time period. TB control programs can use this cluster detection information to help allocate and prioritize resources for investigation and intervention for specific cases that might be caused by recent transmission.
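As a rough illustration of genotype-matched clustering, the sketch below groups cases that share a GENType within a single county and keeps groups of ≥2 cases reported within a 3-year look-back window (the county and 3-year defaults follow the glossary definition of a genotyping cluster). The `Case` record, its field names, the `find_clusters` helper, and the sample data are all hypothetical; this is not the TB GIMS implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Case:
    case_id: str    # hypothetical identifier
    county: str
    gentype: str    # e.g., "G00162"
    reported: date


def years_before(d, years):
    """Return the date `years` earlier, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def find_clusters(cases, as_of, window_years=3, min_cases=2):
    """Group cases by (county, GENType) within the look-back window."""
    start = years_before(as_of, window_years)
    groups = defaultdict(list)
    for c in cases:
        if start < c.reported <= as_of:
            groups[(c.county, c.gentype)].append(c.case_id)
    # Keep only groups that meet the minimum cluster size.
    return {k: sorted(v) for k, v in groups.items() if len(v) >= min_cases}


# Illustrative data only.
cases = [
    Case("A1", "County X", "G00162", date(2020, 3, 1)),
    Case("A2", "County X", "G00162", date(2021, 6, 15)),
    Case("A3", "County Y", "G00162", date(2021, 1, 10)),  # different county
    Case("A4", "County X", "G00017", date(2021, 2, 2)),   # different genotype
    Case("A5", "County X", "G00162", date(2016, 5, 5)),   # outside the window
]
clusters = find_clusters(cases, as_of=date(2021, 12, 31))
```

Here only cases A1 and A2 form a cluster; the other cases differ by county or genotype, or fall outside the window.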
CDC’s primary outbreak detection method identifies higher-than-expected geospatial concentrations of a TB genotype in a specific county, compared with the national distribution of that genotype. This method calculates a log-likelihood ratio (LLR) statistic. Clusters with higher LLRs represent greater geospatial concentrations than clusters with lower LLRs, and higher LLRs might indicate recent transmission of TB. TB GIMS classifies LLRs into alert levels based on established cut points: no alert (LLRs 0–<5), medium alert (LLRs 5–<10), or high alert (LLRs ≥10). The alert level and changes in alert level (e.g., from no alert to medium or high alert) can help TB programs identify outbreaks by prioritizing TB genotype clusters for further investigation and possible intervention. CDC assesses characteristics of alerted clusters and communicates findings for priority clusters back to TB programs.
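The cut points above can be expressed as a small classification function. The sketch below is only an illustration of the published thresholds; the function names, the ordering table, and the rejection of negative LLRs are assumptions, not part of TB GIMS.

```python
# Rank of each alert level, used to detect upward changes.
ALERT_RANK = {"no alert": 0, "medium alert": 1, "high alert": 2}


def alert_level(llr):
    """Map an LLR to an alert level using the cut points 5 and 10."""
    if llr < 0:
        # Assumption: the documented ranges start at 0, so negatives are rejected.
        raise ValueError("LLR is expected to be non-negative")
    if llr < 5:
        return "no alert"
    if llr < 10:
        return "medium alert"
    return "high alert"


def is_escalation(previous_llr, current_llr):
    """True when the alert level rises (e.g., no alert to medium or high)."""
    before = ALERT_RANK[alert_level(previous_llr)]
    after = ALERT_RANK[alert_level(current_llr)]
    return after > before
```

For example, a cluster whose LLR moves from 3.2 to 7.5 changes from no alert to medium alert, which `is_escalation` reports as an upward change.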
Genotyping Terminology
In NTGS, a genotype is defined as a unique combination of spacer oligonucleotide typing (spoligotype) and 24-locus mycobacterial interspersed repetitive unit–variable number tandem repeat typing (MIRU–VNTR) results. Each unique combination of results is assigned a GENType designated as G followed by 5 digits, which are assigned sequentially to every genotype identified in the United States (e.g., G00162). This nomenclature is designed for convenience and ease of communication, but the specific numbers assigned have no additional importance outside NTGS. Genotyping data from NTGS should not be used for clinical decision making.
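The naming rule can be sketched as a registry that hands out the next sequential designation the first time a spoligotype and MIRU-VNTR combination is seen. The `GenTypeRegistry` class, the assumption that numbering starts at 1, and the sample result strings are hypothetical and serve only to illustrate the format.

```python
import re

# "G" followed by exactly 5 digits, e.g., "G00162".
GENTYPE_PATTERN = re.compile(r"G\d{5}")


class GenTypeRegistry:
    """Assigns one designation per unique (spoligotype, MIRU-VNTR) pair."""

    def __init__(self):
        self._by_profile = {}

    def designate(self, spoligotype, miru_vntr):
        key = (spoligotype, miru_vntr)
        if key not in self._by_profile:
            # Assumption: designations are numbered sequentially from 1.
            next_number = len(self._by_profile) + 1
            self._by_profile[key] = f"G{next_number:05d}"
        return self._by_profile[key]


# Illustrative result strings only.
registry = GenTypeRegistry()
first = registry.designate("777777777760771", "223325173533445644423328")
again = registry.designate("777777777760771", "223325173533445644423328")
other = registry.designate("000000000003771", "223325173533445644423328")
```

The same combination always returns the same designation, and any difference in either result yields a new one.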
National TB Genotyping Surveillance Coverage in the United States
National TB genotyping surveillance coverage refers to the percentage of culture-positive TB cases with a genotyped Mycobacterium tuberculosis isolate. High levels of coverage in the United States can provide a better understanding of the molecular epidemiology of TB transmission within a specific geographic area and nationally. Additionally, because outbreak detection algorithms are based on identifying unexpected geospatial concentrations of cases whose isolates have the same genotype, high coverage levels help decrease the likelihood of false-negative alerts. The National Tuberculosis Indicators Project genotyping surveillance coverage target is 100% for 2020.5
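Coverage as defined above is a simple percentage. A minimal sketch follows; the function name, argument names, and the example counts are hypothetical.

```python
def genotyping_coverage(culture_positive_cases, genotyped_cases):
    """Percentage of culture-positive TB cases with a genotyped isolate."""
    if culture_positive_cases <= 0:
        raise ValueError("need at least one culture-positive case")
    if not 0 <= genotyped_cases <= culture_positive_cases:
        raise ValueError("genotyped cases must be between 0 and the total")
    return 100.0 * genotyped_cases / culture_positive_cases


# Illustrative counts only: 190 of 200 culture-positive cases genotyped.
example_coverage = genotyping_coverage(200, 190)
```

With these illustrative counts, coverage is 95%, which is below the 100% target.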
Glossary
alert level: the alert level is determined by the log-likelihood ratio (LLR) statistic for a given cluster. This is calculated by the Tuberculosis Genotyping Information Management System (TB GIMS) and is updated whenever a new case is added to a genotype cluster. E-mail notifications are sent to TB GIMS users whenever an alert level for a given GENType cluster changes from a no alert LLR (0–<5) to medium LLR (5–<10) or high LLR (≥10), or from a medium LLR to a high LLR alert level.
cluster investigation: a cluster investigation seeks to identify epidemiologic links among TB patients whose isolates have matching genotypes. It might include reviewing information from public health and medical records or interviewing case managers and outreach workers. It can also involve re-interviewing TB patients.
epidemiologic (epi) link: an epidemiologic link indicates a relationship that two TB patients share that might explain where, when, and how Mycobacterium tuberculosis was transmitted between them. Patients who name each other as contacts have an epidemiologic link. However, an epidemiologic link can also be a location where the two patients spent time together or an activity that brought them together.
genotype: a genotype is the strain discrimination produced by conventional genotyping techniques used for Mycobacterium tuberculosis, including spacer oligonucleotide typing (spoligotyping) and 24-locus mycobacterial interspersed repetitive unit–variable number tandem repeat (MIRU-VNTR) analysis. Genotype designations (e.g., GENType) were developed to facilitate communication of genotyping information among TB programs.
genotyping surveillance coverage: the percentage of culture-positive TB cases with a genotype result.
genotyping cluster: a cluster consisting of ≥2 cases in a jurisdiction during a specified period with Mycobacterium tuberculosis isolates that share matching genotypes. In the United States, all cases with matching GENType are considered to be in a genotype cluster. The jurisdiction and period used to define clusters vary depending on the specific application. Within TB GIMS, a single county and a 3-year period are typically used to define a cluster.
GENType: a U.S. molecular surveillance designation for each unique combination of spoligotyping and 24-locus MIRU-VNTR analysis results. GENType is designated as a G followed by 5 digits, which are assigned sequentially to every genotype identified in the United States (e.g., G00017).
geospatial concentration: a measure of how concentrated a genotype is in time and space. It indicates that recent transmission might have occurred, because patients with isolates having the same genotype and who reside closer to each other are more likely to have come in contact with each other. TB GIMS uses the log-likelihood ratio (LLR) to generate a statistical measure of geospatial concentration of a given TB genotype for purposes of cluster detection and alerting.
linking: the process of connecting genotyping results in TB GIMS with a corresponding TB case reported to NTSS. This process is essential for ensuring that demographic, risk factor, and geographic data can be viewed in TB GIMS for genotype clusters.
log-likelihood ratio (LLR): a measure of the geographic concentration of a specific genotype in a county, compared with the national distribution of that same genotype, throughout a 3-year period. A higher LLR indicates that the genotype has geospatial clustering within the county, which might indicate recent transmission of Mycobacterium tuberculosis.
mycobacterial interspersed repetitive unit–variable number tandem repeat (MIRU-VNTR): a polymerase chain reaction (PCR)-based assay used for genotyping. The National Tuberculosis Genotyping Service (NTGS) performed 24-locus MIRU-VNTR analysis on every isolate submitted for genotyping until 2018; the National Tuberculosis Molecular Surveillance Center succeeded NTGS and continued MIRU-VNTR until July 2022. Before 2009, 12-locus MIRU-VNTR was performed. MIRU-VNTR distinguishes Mycobacterium tuberculosis strains by differences in the number of copies of tandem repeats at specific regions, or loci, of the Mycobacterium tuberculosis genome.
Mycobacterium bovis: a member of the Mycobacterium tuberculosis complex that is commonly associated with cattle and deer. In the United States, human cases of M. bovis TB typically have a foodborne origin (e.g., consumption of unpasteurized milk or dairy products). M. bovis is intrinsically resistant to pyrazinamide. Detection of TB caused by M. bovis can be performed through genotyping; however, this information should not be relied on for clinical decision making.
Mycobacterium tuberculosis complex (MTBC): a group of closely related mycobacterial species (i.e., Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium bovis bacillus Calmette-Guérin, Mycobacterium africanum, Mycobacterium canettii, Mycobacterium caprae, Mycobacterium microti, Mycobacterium pinnipedii, and Mycobacterium mungi) that can cause latent TB infection (LTBI) and TB disease. Among humans, most TB cases are caused by M. tuberculosis.
National Tuberculosis Genotyping Service (NTGS): NTGS provided TB genotyping services to local and state TB control programs until it was succeeded by the National Tuberculosis Molecular Surveillance Center in 2018. Since 2004, genotyping services have been provided at no cost to patients, health care providers, and health departments.
National Tuberculosis Molecular Surveillance Center (NTMSC): NTMSC was established in 2018 to perform whole-genome sequencing on ≥1 isolate from each newly diagnosed culture-positive TB case in the United States. NTMSC also provided MIRU-VNTR data on each isolate until July 2022.
National Tuberculosis Surveillance System (NTSS): NTSS is administered by the Centers for Disease Control and Prevention (CDC) and collects surveillance data through an electronic reporting registry of TB cases. Data collected include demographic, clinical, and social risk factor variables that are reported to CDC by state and local health departments.
PCRType: a U.S. molecular surveillance designation for each unique combination of spoligotyping and 12-locus MIRU-VNTR. PCRType is designated as PCR followed by 5 digits, which are assigned sequentially to every genotype identified in the United States (e.g., PCR01974).
polymerase chain reaction (PCR): a laboratory method that can rapidly amplify limited quantities of specific DNA, thereby enabling certain types of laboratory testing. Until July 2022, CDC’s genotyping services routinely used two PCR-based techniques, spoligotyping and MIRU-VNTR.
reinfection: disease caused by a second infection, often with a strain (genotype) of Mycobacterium tuberculosis different from the strain that caused the initial infection. (See also relapse.)
relapse: a clinical worsening of TB disease after treatment and a presumed period of improvement, caused by the same strain (genotype) of Mycobacterium tuberculosis that caused the initial disease. Genotyping the initial and the subsequent Mycobacterium tuberculosis isolates might help distinguish relapse from reinfection. (See also reinfection.)
Report of a Verified Case of Tuberculosis (RVCT): national case surveillance data regarding patients with TB disease, recorded on the standardized RVCT form and subsequently reported to the Centers for Disease Control and Prevention’s National Tuberculosis Surveillance System.
spacer oligonucleotide typing (spoligotyping): spoligotyping is based on spacer sequences located in the direct repeat region in the chromosomes of the Mycobacterium tuberculosis complex. The spoligotype uses an octal code to report results as a 15-digit number. Spoligotype can be derived from a polymerase chain reaction (PCR)-based assay or WGS. Spoligotyping was performed on every isolate submitted for genotyping until July 2022.
whole-genome sequencing (WGS): a laboratory method that generates DNA sequence data for approximately 90% of the M. tuberculosis genome.
whole-genome multilocus sequence typing (wgMLST): a genotyping scheme that uses WGS data to compare sequences at 2,690 loci (about 70% of the MTBC genome). A locus is a specific location in the genome (i.e., one gene).
wgMLSType: a U.S. molecular surveillance designation for each whole-genome sequence analysis result from wgMLST. Isolates with wgMLST results that are clustered, defined as matching at >99.7% of their loci, are assigned a wgMLSType designated by MTBC followed by 6 digits (e.g., MTBC000123). Otherwise, the isolate is designated as “MTBCunique.”
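The >99.7% matching rule can be illustrated with a pairwise comparison of allele profiles. The sketch below makes several assumptions not stated in this appendix: each isolate is a complete list of allele identifiers in the same locus order, missing loci are not handled, and clustering is judged one pair at a time. The function names are hypothetical.

```python
N_LOCI = 2690             # loci compared by wgMLST
MATCH_THRESHOLD = 0.997   # isolates cluster when matching at >99.7% of loci


def fraction_matching(profile_a, profile_b):
    """Fraction of loci at which two allele profiles agree."""
    if not profile_a or len(profile_a) != len(profile_b):
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


def is_clustered(profile_a, profile_b, threshold=MATCH_THRESHOLD):
    """True when the pair matches at more than the threshold fraction."""
    return fraction_matching(profile_a, profile_b) > threshold


def format_wgmlstype(cluster_number):
    """Format a designation; None stands for an unclustered isolate."""
    if cluster_number is None:
        return "MTBCunique"
    return f"MTBC{cluster_number:06d}"


# Illustrative profiles only: identical except at the first 8 or 9 loci.
reference = [1] * N_LOCI
eight_differences = [2] * 8 + [1] * (N_LOCI - 8)
nine_differences = [2] * 9 + [1] * (N_LOCI - 9)
```

Under these assumptions, two complete 2,690-locus profiles that differ at 8 loci match at about 99.70% and cluster, while profiles that differ at 9 loci match at about 99.67% and do not.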
1Cowan LS, Crawford JT. Genotype analysis of Mycobacterium tuberculosis isolates from a sentinel surveillance population. Emerg Infect Dis. 2002;8(11):1294-1302.
2Haddad MB, Diem MA, Cowan LS, et al. Tuberculosis genotyping in six low-incidence states, 2000–2003. Am J Prev Med. 2007;32(3):239-243.
3Ghosh S, Moonan PK, Cowan L, Grant J, Kammerer S, Navin TR. Tuberculosis Genotyping Information Management System: enhancing tuberculosis surveillance in the United States. Infect Genet Evol. 2012;12(4):782-788.
4Centers for Disease Control and Prevention. Tuberculosis Genotyping Information Management System. Atlanta, GA: U.S. Department of Health and Human Services, CDC; [undated]. https://www.cdc.gov/tb/programs/genotyping/tbgims/default.htm
5Centers for Disease Control and Prevention. Monitoring tuberculosis programs—National Tuberculosis Indicator Project, United States, 2002–2008. MMWR Morb Mortal Wkly Rep. 2010;59(10):295-298. https://www.cdc.gov/mmwr/preview/mmwrhtml/mm5910a3.htm