
Introduction to emm typing:
M protein gene (emm) typing of Streptococcus pyogenes


Group A streptococcal (GAS) disease continues to be a major problem worldwide. In the United States alone, there are millions of cases of GAS pharyngitis, costing billions of dollars in medical expenses and lost work time. Approximately 10,000-15,000 cases of invasive GAS disease occur annually in the United States, with a mortality rate of 10-13%. The Lancefield M protein serotyping system has been very valuable over the past 60 years. In recent years, however, the inherent difficulties of expanding this system through conventional serologic procedures have become increasingly evident. We have used a less demanding, sequence-based system that is predictive of Lancefield M serotypes to extend the system established decades ago by Dr. Rebecca Lancefield.

M protein gene (emm) typing

Accumulated evidence indicates that the emm gene encodes the cell-surface M protein. Here, the emm gene is defined as the Streptococcus pyogenes gene amplified with primers 1 and 2. The M protein is a virulence factor putatively responsible for at least 100 known M serospecificities of S. pyogenes. Typing based on sequence analysis of the portion of the emm gene that encodes M serospecificity avoids the problems associated with M serotyping: limited availability of M typing sera, newly encountered M types, and difficulty in interpretation. This system, called emm typing, uses the two highly conserved primers to amplify a large portion of the emm gene. The hypervariable sequence encoding M serospecificity lies adjacent to one of the amplifying primer sequences, so it can be sequenced directly.
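
The following Python sketch illustrates the comparison step in a simplified form. It assigns a type by finding the reference whose 5' region is most similar to a query. It is not the CDC procedure, which relies on BLAST searches against the curated database. The sketch makes several assumptions:

- the query and reference sequences are already trimmed to start at the same position next to the primer;
- identity is measured with a simple ungapped comparison.

The 92% cutoff, the function names, and the toy sequences are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a[:n], b[:n]) if x == y)
    return 100.0 * matches / n


def assign_emm_type(
    query: str,
    references: Dict[str, str],
    window: int = 150,           # bases compared from the 5' end (assumed)
    min_identity: float = 92.0,  # hypothetical cutoff, not an official criterion
) -> Tuple[Optional[str], float]:
    """Return (best type, identity), or (None, identity) if below the cutoff."""
    q = query.upper()[:window]
    best_type: Optional[str] = None
    best_id = 0.0
    for emm_type, ref in references.items():
        ident = percent_identity(q, ref.upper()[:window])
        if ident > best_id:
            best_type, best_id = emm_type, ident
    if best_id < min_identity:
        return None, best_id
    return best_type, best_id


# Toy reference set; real sequences would come from the curated database.
refs = {"emmA": "ATGC" * 10, "emmB": "GGCC" * 10}
print(assign_emm_type("ATGC" * 10, refs))  # ('emmA', 100.0)
print(assign_emm_type("TTTT" * 10, refs))  # (None, 25.0)
```

The ungapped comparison keeps the sketch short. Sequences with insertions or deletions would need a real alignment, such as the one BLAST performs.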

Download emm sequence databases via the Streptococci Group A Subtyping Request Form (Blast 2.0 Server)

  • Untrimmed emm sequences
  • Translations of untrimmed emm sequences
  • Subtype-determining region (bases 1-150, encoding residues 1-50 of the processed M protein; being supplemented with sequences that also include the signal sequence region [bases 1-30 plus bases 1-150 encoding processed M protein residues 1-50])
  • Amino acids 1-50 of processed M protein
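
Each codon is three bases, so bases 1-150 of the mature-protein coding region correspond to residues 1-50 of the processed M protein. The sketch below extracts that region and translates it with the standard genetic code. The function names are hypothetical. It assumes two things:

- the input is an in-frame emm sequence;
- the number of signal-sequence codons before the cleavage site is already known, for example from a signal-peptide prediction.

```python
# Standard genetic code; codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def subtype_region(seq: str, signal_codons: int, n_bases: int = 150) -> str:
    """Bases encoding mature M protein residues 1..n_bases // 3.

    seq: in-frame emm sequence whose first codon is part of the signal sequence.
    signal_codons: number of signal-sequence codons before the cleavage site.
    """
    start = 3 * signal_codons
    region = seq.upper()[start:start + n_bases]
    if len(region) < n_bases:
        raise ValueError("sequence too short for the requested region")
    return region


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


# Toy example: 3 signal codons, then 50 lysine codons, then a stop codon.
seq = "ATG" + "GCT" * 2 + "AAA" * 50 + "TAA"
region = subtype_region(seq, signal_codons=3)
print(len(region), translate(region) == "K" * 50)  # 150 True
```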

We are starting to add strain and other relevant information to each downloadable sequence. Please bear with us until every entry is updated. Additional information for many types is available on the Browse Types page.


The emm sequence database

This database includes only sequences that we have checked for accuracy. We have based our system on the strains originally used to characterize the Lancefield serotypes. Some of the emm gene sequences in our database are also found in GenBank. Because of time constraints, however, we do not routinely add sequences to GenBank ourselves, so most entries in our database are not in GenBank. We do check GenBank before adding each new sequence to avoid redundant designations when possible, or at least to point them out.

New emm gene sequence types and subtypes are continually added to the CDC database. These include types and subtypes that we encounter in clinical isolates and those submitted to us by outside researchers. Many of these emm sequences are found in clinical isolates throughout the world. Each emm type sequence includes a 5' portion encoding about 15-23 residues of the membrane export signal sequence and 60-250 amino acids of the N terminus of the mature M protein.

We would appreciate receiving sequence traces of new emm types and subtypes not found in our database, by email or at the postal address below. We will verify the sequences and include them in this database, crediting the sender and institution. We would also appreciate any information about these isolates (clinical specimen, geographic site, etc.). We are in the process of including information with each downloadable emm subtype file.
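
An entry can be modeled as a simple record containing three parts:

- a short signal-sequence portion;
- the N-terminal portion of the mature protein;
- the isolate information the database is starting to collect.

This is a hypothetical sketch, not the database's actual file format. The field names are invented. The text gives the 15-23 and 60-250 residue ranges as approximate, so the sketch treats them as soft warnings rather than hard errors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmmEntry:
    """Hypothetical record for one emm type/subtype sequence."""
    name: str                      # e.g. "emm3.1"
    signal_aa: str                 # translated 5' signal-sequence portion
    mature_aa: str                 # translated N terminus of the mature M protein
    submitter: Optional[str] = None
    institution: Optional[str] = None
    clinical_specimen: Optional[str] = None
    geographic_site: Optional[str] = None

    def layout_warnings(self) -> List[str]:
        """Flag portions outside the approximate ranges given in the text."""
        warnings = []
        if not 15 <= len(self.signal_aa) <= 23:
            warnings.append(
                f"signal portion has {len(self.signal_aa)} residues (expected about 15-23)")
        if not 60 <= len(self.mature_aa) <= 250:
            warnings.append(
                f"mature portion has {len(self.mature_aa)} residues (expected 60-250)")
        return warnings


ok = EmmEntry("emmX.0", signal_aa="M" * 20, mature_aa="A" * 100)
odd = EmmEntry("emmY.0", signal_aa="M" * 10, mature_aa="A" * 300,
               submitter="example lab")
print(ok.layout_warnings())        # []
print(len(odd.layout_warnings()))  # 2
```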

Please send to:

Bernard Beall, Ph.D.
Centers for Disease Control and Prevention
1600 Clifton Rd. NE, Mailstop C02,
Atlanta, GA 30333


emm subtypes

Isolates with small alterations in the emm 5' terminus sometimes have serotypes that differ from that of the emm/M serotype reference strain. Similarly, small sequence changes may alter the susceptibility of streptococci to type-specific opsonic antibodies elicited against the M protein. We now add to our database emm sequences with any change in the region encoding the first 50 residues of the mature M protein, relative to the M protein of the M type reference strain. For example, emm68.1 contains a 7-codon deletion within the 5'-terminal 150 bases encoding the mature M protein, relative to emm68. The M68.1 protein specifically cross-reacts with M68 antisera. It is nonetheless no longer serologically identical to M68 from the serotype M68 reference strain in gel diffusion tests.

Small emm sequence alterations can also be valuable in tracking specific strains. Common emm types are subdivided into stable subtypes on the basis of this 150-base type-specific region. For example, subtype emm3.1 appears to account for the majority of emm3 isolates in the United States (about 75-80%), while subtype emm3.4 accounts for about 20% of emm3 isolates.

The nomenclature of this subtyping scheme is simple. Any variation in the 150 bases encoding the predicted 50 N-terminal M protein residues, relative to the reference strain, is assigned a subtype number (e.g., emm3.1, emm3.2, emm6.1, emm6.2, emm12.1, emm12.2). Reference strain subtypes are always designated .0 (e.g., emm3.0, emm6.0, emm12.0). The SignalP web server predicts the presence and location of signal peptide cleavage sites, which makes it possible to determine the first amino acid of the mature M protein.
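
The nomenclature rule can be expressed as a small lookup. An exact match to a known 150-base region returns that subtype, and any difference at all yields a new subtype number. The sketch below uses a hypothetical in-memory dictionary that maps subtype names to regions. The next-free-number result is only a placeholder proposal, because actual designations are assigned when the curators verify a new sequence. The type names and sequences are toy data.

```python
from typing import Dict, Tuple


def assign_subtype(emm_type: str, region: str,
                   known: Dict[str, str]) -> Tuple[str, bool]:
    """Return (subtype name, is_new) for a 150-base subtype-determining region.

    known maps subtype names such as "emmX.0" to their regions; the reference
    strain's region must be present as "<type>.0".
    """
    prefix = emm_type + "."
    if prefix + "0" not in known:
        raise ValueError(f"no reference subtype {prefix}0 in the database")
    region = region.upper()
    used = []
    for name, seq in known.items():
        if not name.startswith(prefix):
            continue
        if seq.upper() == region:          # any difference at all means a new subtype
            return name, False
        used.append(int(name[len(prefix):]))
    return f"{prefix}{max(used) + 1}", True  # placeholder proposal only


# Toy data for a hypothetical type "emmX" (not real sequences).
known = {
    "emmX.0": "A" * 150,
    "emmX.1": "A" * 149 + "G",
    "emmY.0": "C" * 150,
}
print(assign_subtype("emmX", "A" * 150, known))         # ('emmX.0', False)
print(assign_subtype("emmX", "A" * 148 + "GG", known))  # ('emmX.2', True)
```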

emm sequences from beta-hemolytic group C, G, and L streptococci

We observe significant numbers of serious infections caused by group C and G streptococci. This database therefore also includes emm gene sequences from the group C and G streptococcal isolates that we have encountered. We have also included the emm gene sequences from a few group L isolates.

sof genes from group A streptococci

Historically, the anti-opacity factor (AOF) type, conferred by the sof (serum opacity factor) gene, was equated with M serotype. We also try to provide information about the sof gene sequences associated with many emm types. See Beall et al. 2000, emm and sof gene sequence variation in relation to serological typing of opacity factor positive group A streptococci, Microbiology 146:1195-1209.

emm amplicon restriction analysis

We have found emm amplicon restriction digest analysis to be a valuable tool for rapid analysis of outbreaks (see the protocol for emm typing for the procedure). This analysis serves two purposes:

- identifying isolates containing identical or nearly identical emm genes;
- avoiding the need to sequence every isolate in a set that shares the same emm type.

For this analysis, the emm amplicons of isolates related by T type and opacity factor (OF) reactions are compared by restriction fragment profile analysis. The starting point is a geographically and temporally related group of isolates, ideally with identical T agglutination patterns and OF reactions. Isolates in the group are first checked for identical emm amplicon restriction profiles with DdeI and with HincII + HaeIII double digests. One representative amplicon from the isolates sharing these profiles is then subjected to variable-region sequence analysis. This sequence is almost invariably highly conserved across the entire group of isolates.

Amplicon restriction profiles are very useful because, within many emm types, alleles share a highly conserved profile. For example, we have typed emm1 and emm12 alleles from isolates collected at varied locations worldwide. More than 99% of these alleles show a single characteristic combination of DdeI and HincII + HaeIII restriction maps for each type. However, not all emm types have a predominant restriction profile. This is due at least in part to variation in the location and number of direct repeats among emm genes. For example, we have found several different restriction profiles among emm5, emm6, and emm92 alleles, to name a few. Nonetheless, even for these emm types, emm amplicon restriction profiling is a quick way to detect sets of isolates that share highly conserved emm genes.
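
When amplicon sequences are available, the restriction comparison can be approximated in silico. The sketch below uses the standard recognition sites of DdeI (C^TNAG), HincII (GTY^RAC), and HaeIII (GG^CC). It computes top-strand fragment lengths for a linear amplicon, ignoring overhangs. It then groups isolates whose DdeI profile and HincII + HaeIII profile both match. This is an illustration only, since the laboratory method compares fragments on a gel. The function names and toy sequences are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Recognition site (regex) and cut offset from the start of the site, top strand.
ENZYMES = {
    "DdeI": ("CT[ACGT]AG", 1),      # C^TNAG
    "HincII": ("GT[CT][AG]AC", 3),  # GTY^RAC
    "HaeIII": ("GGCC", 2),          # GG^CC
}


def fragment_profile(seq: str, enzymes: List[str]) -> Tuple[int, ...]:
    """Fragment lengths (largest first) after digesting a linear amplicon."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        pattern, offset = ENZYMES[name]
        # Zero-width lookahead so overlapping sites are all found.
        for m in re.finditer(f"(?={pattern})", seq):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return tuple(sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True))


def group_by_profile(amplicons: Dict[str, str]) -> List[List[str]]:
    """Group isolates whose DdeI and HincII + HaeIII profiles both match."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for isolate, seq in amplicons.items():
        key = (fragment_profile(seq, ["DdeI"]),
               fragment_profile(seq, ["HincII", "HaeIII"]))
        groups[key].append(isolate)
    return list(groups.values())


amplicons = {"iso1": "AAAACTGAGAAAA", "iso2": "AAAACTGAGAAAA", "iso3": "A" * 13}
print(fragment_profile(amplicons["iso1"], ["DdeI"]))  # (8, 5)
print(group_by_profile(amplicons))  # [['iso1', 'iso2'], ['iso3']]
```

Following the approach in the text, one representative per group would then be sequenced rather than every isolate.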


A Few Relevant References

  1. Lancefield, R.C. 1962. Current knowledge of the type specific M antigens of group A streptococci. J. Immunol. 89:307-313.
  2. Beachey, E.H., Seyer, E.M., Dale, J.B., Simpson, W.A., Kang, A.H. 1981. Type-specific protective immunity evoked by synthetic peptide of Streptococcus pyogenes. Nature 292:457-459.
  3. Jones, K.F., Fischetti, V.A. 1988. The importance of the location of the antibody binding on the M6 protein for opsonization and phagocytosis of group A M6 streptococci. J. Exp. Med. 167:1114-1123.
  4. Whatmore, A.M., Kapur, V., Sullivan, D.J., Musser, J.M., Kehoe, M.A. 1994. Non-congruent relationships between variation in emm gene sequences and the population genetic structure of group A streptococci. Mol. Microbiol. 14:619-631.
  5. Beall, B., Gherardi, G., Lovgren, M., Forwick, B., Facklam, R., Tyrrell, G. 2000. emm and sof gene sequence variation in relation to serological typing of opacity factor positive group A streptococci. Microbiology 146:1195-1209.




Contact Us:
  • Centers for Disease Control and Prevention
    1600 Clifton Rd
    Atlanta, GA 30333
  • 800-CDC-INFO
    TTY: (888) 232-6348