Whole Genome Sequencing (WGS)

What is whole genome sequencing (WGS)?

The genome, or genetic material, of an organism (bacterium, virus, potato, human) is made up of DNA. Each organism has a unique DNA sequence that is composed of bases (A, T, C, and G). If you know the sequence of the bases in an organism, you have identified its unique DNA fingerprint, or pattern. Determining the order of bases is called sequencing. Whole genome sequencing is a laboratory procedure that determines the order of all the bases in an organism's genome in a single process.

How does whole genome sequencing work?

Scientists conduct whole genome sequencing by following these four main steps:

  1. DNA shearing: Scientists begin by using molecular scissors to cut the DNA, which is composed of millions of bases (A’s, C’s, T’s, and G’s), into pieces that are small enough for the sequencing machine to read.
  2. DNA bar-coding: Scientists add small DNA tags, or bar codes, to identify which piece of sheared DNA belongs to which bacterium. This is similar to how a bar code identifies a product at a grocery store.
  3. Whole genome sequencing: The bar-coded DNA from multiple bacteria is combined and put in the whole genome sequencer. The sequencer identifies the A’s, C’s, T’s, and G’s, or bases, that make up each bacterial sequence. The sequencer uses the bar code to keep track of which bases belong to which bacterium.
  4. Data analysis: Scientists use computer analysis tools to compare bacterial sequences and identify differences. The number of differences can tell the scientists how closely related the bacteria are, and how likely it is that they are part of the same outbreak.
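
To make steps 2 through 4 concrete, here is a minimal Python sketch. It is a toy illustration, not the software PulseNet uses: the bar codes, isolate names, and reads are made up, and real pipelines also handle sequencing errors, read assembly, and alignment. The sketch sorts pooled reads by their bar code, then counts the positions where two sequences differ.

```python
from collections import defaultdict

# Hypothetical bar codes (invented for this sketch) mapped to isolate names.
BARCODES = {"ACGT": "isolate_A", "TGCA": "isolate_B"}
BARCODE_LENGTH = 4


def demultiplex(reads):
    """Group pooled reads by the bar code at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, fragment = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        sample = BARCODES.get(tag)
        if sample is not None:  # skip reads with an unrecognized bar code
            by_sample[sample].append(fragment)
    return dict(by_sample)


def count_differences(seq_a, seq_b):
    """Count positions where two equal-length, aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Pooled reads from the sequencer: bar code first, then the DNA fragment.
pooled_reads = ["ACGTGATTACA", "TGCAGATTTCA", "ACGTCCGGTTA", "GGGGAAAAAAA"]
samples = demultiplex(pooled_reads)

# Compare one fragment from each isolate, base by base.
differences = count_differences(samples["isolate_A"][0], samples["isolate_B"][0])
print(samples)
print(differences)
```

In this toy data the two compared fragments differ at a single position. A small number of differences across whole genomes is the kind of signal that suggests two bacteria are closely related.
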

Photo: Whole genome sequencing in the EDLB laboratory.

How will whole genome sequencing transform disease detection?

Whole genome sequencing provides more detailed and precise data for identifying outbreaks than the current standard technique that PulseNet uses, pulsed-field gel electrophoresis (PFGE). Instead of only having the ability to compare bacterial genomes using 15-30 bands that appear in a PFGE pattern, we now have millions of bases to compare. That is like comparing all of the words in a book (WGS), instead of just the number of chapters (PFGE), to see if the books are the same or different. Using whole genome sequencing, we have found that some bacteria that appeared to be different using PFGE are actually from the same source. This has helped solve some outbreaks sooner.
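
The difference in resolution can be sketched with a toy Python model. The sketch assumes a simplified picture of PFGE, in which DNA is cut at every copy of a short recognition site and only the sizes of the resulting pieces (the bands) are compared. The cut site and the three short "genomes" are hypothetical and chosen only for illustration.

```python
# Hypothetical recognition site for the toy "molecular scissors".
CUT_SITE = "GAATTC"


def band_pattern(genome):
    """Sorted fragment lengths after cutting at every CUT_SITE (toy model;
    the site itself is dropped from the fragments for simplicity)."""
    return sorted(len(fragment) for fragment in genome.split(CUT_SITE))


def count_differences(seq_a, seq_b):
    """Count positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


genome_1 = "AAAAGAATTCCCCCCCGAATTCTT"
genome_2 = "AAAAGAATTCCCTCCCGAATTCTT"  # one base changed between the cut sites
genome_3 = "AAAAGAATTCCCCCCCGAATTATT"  # one base changed inside a cut site

# Same band pattern, yet the base-by-base comparison finds a difference.
print(band_pattern(genome_1) == band_pattern(genome_2))
print(count_differences(genome_1, genome_2))

# Different band pattern, although only a single base differs.
print(band_pattern(genome_1) == band_pattern(genome_3))
print(count_differences(genome_1, genome_3))
```

In this model, a single changed base can either go unnoticed in the band pattern or change it substantially, while a base-by-base comparison reports exactly one difference in both cases.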

Whole genome sequencing is a fast and affordable way to obtain high-level information about the bacteria using just one test. Currently, the process to fully characterize bacteria requires two or more scientists to perform four or more separate tests including PFGE. WGS will greatly improve the efficiency of how PulseNet conducts surveillance.

PulseNet is actively validating next-generation sequencing (NGS) technology as well as developing, evaluating, and implementing the tools needed to analyze the data.

Meeting the challenges of whole genome sequencing

In 2013, CDC began using whole genome sequencing to detect outbreaks caused by the deadly bacterium Listeria. Since then, this method has allowed scientists to:

  • Detect more clusters of Listeria illnesses
  • Solve more Listeria outbreaks while they are still small
  • Link ill patients to likely food sources
  • Identify new food sources of Listeria, such as caramel apples and ice cream

Learn more about how the Listeria Whole Genome Sequencing Project has improved the detection and investigation of foodborne outbreaks. 

CDC is quickly expanding the use of whole genome sequencing in state laboratories, and scientists will soon begin using whole genome sequencing for outbreak investigations of other foodborne pathogens, such as Campylobacter, Shiga toxin-producing E. coli (STEC), and Salmonella. CDC’s Advanced Molecular Detection (AMD) initiative partially funded the expansion of real-time WGS for food safety.

Through collaboration with CDC’s AMD initiative and the food safety program, PulseNet is establishing the structure to support a switch to whole genome sequencing, including training of public health microbiologists to perform sequencing, purchasing sequencing supplies, and updating systems for data analysis. These activities are critical to launching whole genome sequencing in public health laboratories and improving surveillance for foodborne disease outbreaks and trends in foodborne infections and antibiotic resistance.

As the use of whole genome sequencing expands, CDC’s national surveillance systems and laboratory infrastructure must keep pace with the changing technology. With modernization, CDC and its public health partners can continue to successfully detect, respond to, and stop infectious diseases. Together, we can ensure rapid and less costly diagnoses for individuals and the evidence needed to quickly solve and prevent foodborne outbreaks.