Introduction to emm typing: M protein gene (emm) typing Streptococcus pyogenes


Group A streptococcal (GAS) disease continues to be a major problem worldwide. There are millions of cases of GAS pharyngitis, costing billions of dollars in medical expenses and lost work, in the United States alone. Approximately 10,000-15,000 cases of invasive GAS disease occur annually in the United States, with a 10-13% mortality rate. Although the Lancefield M protein serotyping system has been very valuable over the past 60 years, the inherent difficulties of expanding it through conventional serologic procedures have become increasingly evident in recent years. Using a less demanding sequence-based system that is predictive of Lancefield M serotypes, we have extended the system established decades ago by Dr. Rebecca Lancefield.

M protein gene (emm) typing

Accumulated evidence indicates that, in many strains, the emm gene (defined here as the Streptococcus pyogenes gene amplified with primers 1 and 2) encodes the cell surface M virulence protein putatively responsible for at least 100 known M serospecificities of S. pyogenes. A system based on sequence analysis of the portion of the emm gene that encodes M serospecificity avoids the problems associated with M serotyping (limited availability of M typing sera, newly encountered M types, and difficulty in interpretation). This system, called emm typing, relies on two highly conserved primers to amplify a large portion of the emm gene. The hypervariable sequence encoding M serospecificity lies adjacent to one of the amplifying primer sequences, allowing for direct sequencing.

The S. pyogenes emm gene is generally located between the mga and scpA genes in 3 different arrangements (listed 1-3 above). Primers 1 and 2 used in the CDC emm typing protocol generally generate the "true" emm types shown in black font above; however, in a minority of isolates, primers 1 and 2 generate the emm-like gene sequences shown in red. The instances where multiple emm types have been deduced from whole genome sequence data in the Fittipaldi Laboratory are shown above (data taken from Athey TB, Teatero S, Li A, Marchand-Austin A, Beall BW, Fittipaldi N. Deriving Group A Streptococcus Typing Information from Short-Read Whole Genome Sequencing Data. J Clin Microbiol. 2014 Mar 19. [Epub ahead of print]). We will add corresponding data documenting further instances such as the ones listed here.

Download emm sequence databases via the Streptococci Group A Subtyping Request Form (Blast 2.0 Server)

  • Untrimmed emm sequences
  • Translations of untrimmed emm sequences
  • Subtype-determining region (bases 1-150, encoding residues 1-50 of the processed M protein; being supplemented with sequences that also include the signal sequence region [bases 1-30, plus bases 1-150 encoding processed M protein residues 1-50])
  • Amino acids 1-50 of processed M protein
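
The relationship between the last two downloads (150 bases encoding 50 residues) can be illustrated with a minimal Python sketch. It is not the CDC procedure: the function names subtype_region and translate are hypothetical, the input is assumed to begin already at codon 1 of the processed M protein (locating that position in an untrimmed sequence is not shown), and the toy sequence is not a real emm sequence.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def subtype_region(mature_cds: str, n_bases: int = 150) -> str:
    """Return the first n_bases of a sequence.

    Assumption: mature_cds starts at codon 1 of the processed M protein.
    """
    seq = mature_cds.upper()
    if len(seq) < n_bases:
        raise ValueError(f"need at least {n_bases} bases, got {len(seq)}")
    return seq[:n_bases]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Toy input only (60 alanine codons), not a real emm sequence.
toy = "GCT" * 60
region = subtype_region(toy)
peptide = translate(region)
print(len(region), len(peptide))
```

For the toy input, the region is 150 bases long and translates to 50 residues, matching the "bases 1-150" and "amino acids 1-50" files listed above.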


The emm sequence database

This database includes only sequences that we have checked for accuracy. We have based our system on the strains originally used for characterization of the Lancefield serotypes. Some of the emm gene sequences in our database are also found in GenBank. Due to time constraints, we do not routinely add these sequences to GenBank ourselves, so the majority of the entries in our database are not in GenBank. We do check GenBank before adding each new sequence in order to avoid designation redundancy when possible (or at least to point it out). New emm gene sequence types and subtypes that we encounter in clinical isolates, and new types/subtypes submitted to us by outside researchers, are continually added to the CDC database. Many of these emm sequences are found in clinical isolates throughout the world. Each emm type sequence includes a 5' portion encoding about 15-23 residues of the membrane export signal sequence and 60 to 250 amino acids of the mature M protein N terminus.

We would appreciate receiving (at the email or regular address below) traces of new emm types and subtypes not found in our database. We will then verify the sequences and include them in this database, crediting the sender and institution with the sequence and information. We would also appreciate any information concerning these isolates (clinical specimen, geographical site, etc.). We are in the process of including this information with each individual downloadable emm subtype file.

Please address to:

Velusamy Srinivasan, Ph.D.
Streptococcus Laboratory
Centers for Disease Control and Prevention
1600 Clifton Rd., NE, MS-C02
Atlanta, GA 30333, USA
Phone: 404-639-0917
Fax: 404-639-2070


emm subtypes

Isolates with small alterations in the emm 5' terminus sometimes have altered serotypes relative to the emm/M serotype reference strain. Similarly, small sequence changes can possibly alter streptococcal susceptibility to type-specific opsonic antibodies elicited against the M protein. We now add to our database emm sequences with any change in the 180 bp sequence encoding the C-terminal 10 signal sequence residues and the N-terminal 50 residues of the predicted mature protein.
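
The 180 bp figure follows from (10 + 50) residues x 3 bases per codon. The Python sketch below shows that arithmetic and a simple position-by-position comparison of two such windows. It is an illustration under stated assumptions, not the database's actual assignment procedure: the names subtype_window and differing_positions are hypothetical, the index of the first base of the mature protein (mature_start) is assumed to be known, both sequences are assumed to be in the same frame without insertions or deletions, and the sequences are synthetic.

```python
SIGNAL_RESIDUES = 10   # C-terminal signal sequence residues (from the text)
MATURE_RESIDUES = 50   # N-terminal residues of the predicted mature protein
WINDOW_BP = 3 * (SIGNAL_RESIDUES + MATURE_RESIDUES)  # 3 bases per codon


def subtype_window(seq: str, mature_start: int) -> str:
    """Cut the window spanning 10 signal codons plus 50 mature codons.

    mature_start is the 0-based index of the first base of mature codon 1
    (assumed to be known; finding it is outside this sketch).
    """
    start = mature_start - 3 * SIGNAL_RESIDUES
    end = mature_start + 3 * MATURE_RESIDUES
    if start < 0 or end > len(seq):
        raise ValueError("sequence does not cover the full window")
    return seq[start:end].upper()


def differing_positions(a: str, b: str) -> list[int]:
    """Return 1-based positions where two equal-length windows differ."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Synthetic sequences only: 45 flanking bases followed by 180 bases.
reference = "G" * 45 + "ACG" * 60
variant = reference[:50] + "T" + reference[51:]  # one substitution at index 50

ref_win = subtype_window(reference, mature_start=45)
var_win = subtype_window(variant, mature_start=45)
diffs = differing_positions(ref_win, var_win)
print(WINDOW_BP, diffs)
```

Under the rule stated above, any non-empty list of differences within the window would mark the sequence as a candidate for a separate database entry. Real comparisons would normally rely on an alignment tool rather than direct indexing, since insertions and deletions shift positions.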

emm sequences from beta hemolytic groups C, G, and L streptococci

Because we observe significant numbers of serious infections caused by group C and G streptococci, this database also includes emm gene sequences from group C and G streptococcal isolates that we have encountered. We have also included the emm gene sequences from a few group L isolates.