Chapter 1 – Human genome epidemiology: the road map revisited

Human Genome Epidemiology (2nd ed.): Building the evidence for using genetic information to improve health and prevent disease

“The findings and conclusions in this book are those of the author(s) and do not
necessarily represent the views of the funding agency.”

These chapters were published with modifications by Oxford University Press (2010)


Muin J. Khoury, Sara R. Bedrosian, Marta Gwinn, Julian Little, Julian P. T. Higgins, and John P. A. Ioannidis

In 2004, we published the book Human Genome Epidemiology: A Scientific Foundation for Using Genetic Information to Improve Health and Prevent Disease (1). In it, we discussed how the epidemiologic approach provides an important scientific foundation for studying the continuum from gene discovery to the development, application, and evaluation of human genome information in improving health and preventing disease. We called this continuum human genome epidemiology (or HuGE) to denote an evolving field of inquiry that uses epidemiologic applications to assess the population impact of human genetic variation on health and disease, and how the resulting information can be used to improve population health. We gave examples illustrating that, after the discovery of genetic variants associated with diseases, additional well-conducted epidemiologic studies are needed to characterize the population impact of gene variants on the risk for adverse health outcomes and to identify and measure the impact of modifiable risk factors that interact with gene variants. Epidemiologic studies are also required to evaluate the clinical validity and utility of new genetic tests, to monitor population use of genetic tests, and to determine the impact of genetic information on the health and well-being of different populations. The results of such studies will help medical and public health professionals integrate human genomics into practice.

The Rationale for a Second Edition of Human Genome Epidemiology

Since 2004, advances in human genomics have continued to occur at a breathtaking pace. Although the personalized healthcare and disease prevention often promised by enthusiastic scientists and the media have yet to be realized, we are now seeing rapid progress and accumulation of data in many “omics” research fields such as transcriptomics, proteomics, and metabolomics (2). Results of the International HapMap project were published in 2005 (3), paving the way to more efficient methods for discovering human genetic variations associated with a variety of common diseases of public health significance. New methods to measure genome variation on an unprecedentedly large scale (hundreds of thousands of genetic variants) have propelled a new generation of genome association studies (4). Evaluation of rare variants and large-scale full sequencing are rapidly becoming a reality. We have also seen the emergence of population-based biobanks in many countries, with the objective of quantifying longitudinally the joint influences of genetic and environmental factors on the occurrence of common diseases (5).

Perhaps the single most important development in human genome epidemiology has been the emergence of genome-wide association studies (GWAS; 6). The continuous improvements in genome-wide analysis technologies, coupled with drastic reductions in price, have led to widespread applications of these technologies in large collaborative case-control, cross-sectional, and cohort studies. These studies have agnostically interrogated variation across the whole genome, without a priori hypotheses, looking for differences in the distribution of genetic polymorphisms between individuals with and without disease. As of August 2009, more than 400 gene variants have been discovered and replicated as risk markers (but not necessarily true culprits) for a variety of common diseases of public health significance (7). As a result, we are seeing an unprecedented expansion in the number of publications of GWAS, as well as studies of candidate genes, of varying methodological quality. While the deposition of GWAS data in potentially accessible databases (8,9) could help avoid selective publication, protection from other biases (e.g., selection, confounding, misclassification) is still a real concern, even in large GWAS that are based on selected or noncomparable samples of cases and controls. In addition, new technology such as full genomic sequencing is likely to replace the current genome-wide SNP analysis platforms. Furthermore, we are seeing the emergence of the novel approaches of systems biology, as well as the development of biomarkers based on gene expression profiles, epigenetic patterns, proteomic profiles, and so on. Each new development taxes our ability to make sense of the ever-increasing amount of data. We must continue to develop, apply, and sharpen our epidemiological approaches to study design, analysis, interpretation, and knowledge synthesis.
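To make the basic case-control comparison concrete, the following is a minimal sketch of how the difference in allele distribution between individuals with and without disease might be summarized for a single variant as an allelic odds ratio with an approximate 95% confidence interval. The function name, the use of the Woolf (logit) interval, and the allele counts are illustrative assumptions and do not come from any study cited in this chapter.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, lower 95% limit, upper 95% limit) for a 2x2 table
    of allele counts, using the Woolf (logit) approximation for the interval.

    All four counts must be positive; this sketch applies no continuity
    correction and no adjustment for multiple testing.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if any(c <= 0 for c in counts):
        raise ValueError("all allele counts must be positive")

    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not real data): 1,000 cases and 1,000 controls,
# two alleles per person.
or_estimate, ci_low, ci_high = allelic_odds_ratio(600, 1400, 500, 1500)
print(f"OR = {or_estimate:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

In a genome-wide setting, the same comparison is repeated for hundreds of thousands of variants, so far more stringent significance thresholds and independent replication are needed than this single-variant sketch suggests.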

From Gene Discovery to Clinical and Public Health Applications

The ongoing success of GWAS in uncovering genetic risk markers for many common diseases has renewed expectations of a new era of health care and public health practice (6,10,11). Already, we have a few examples of applications in clinical medicine and population health (see Table 1.1 for emerging examples). By and large, emerging applications are relatively rare in spite of the rapid advances in gene discovery, and for many of them, their benefits and cost-effectiveness are not well known. Therefore, there is an urgent need to understand the benefits and harms and to ensure high-quality implementation of new technologies (12). This includes improving the evidence base of outcomes of these technologies; the development of evidence-based guidelines for the use of genomic applications (13); the use of policy and legislation to prevent discrimination on the basis of genetic information (14); and the effective engagement of providers, researchers, and the general public. More recently, “direct to consumer” (DTC) offerings of genome-wide profiles have been developed and marketed by several companies, with the implicit, if not explicit, goal of providing information for improving individual health and preventing common diseases (15). The ready availability and complexity of these new DTC tests could strain the ability of consumers and the health care delivery system to determine the true value of applying extensive quantities of genomic data to health management. Proponents of DTC genome-wide profiles feel strongly that this approach can empower and educate individuals about disease prevention and health promotion. Others are concerned that the use of genome-wide profiles is based on an incomplete knowledge about the relationship between genetic variations and human diseases, and the lack of a full understanding of the optimal specific medical or lifestyle interventions that should be offered based on these test results (16). 
Questions also remain regarding the scope of individual genetic tests that should be included in genomic profiles, whether the underlying technologies are robust, and where the balance lies between potential benefits and harms (clinical utility) of these tests to individuals and populations (16,17). A 2007 report found several limitations in the existing US-based research and healthcare delivery infrastructure to create an evidence base of utilization and outcomes of gene-based applications (18). In addition, providers and the public have little understanding of genomics and genomics services (10). Overcoming these limitations would require coordinating efforts that span multiple disciplines of laboratory sciences, medicine and public health, including health services research, and outcomes research. The epidemiologic approach is at the intersection of all these disciplines.

The Emergence of Public Health Genomics

In the face of evolving technologies, we have witnessed in the past few years the emergence of “public health genomics,” a multidisciplinary field concerned with the effective and responsible translation of genome-based knowledge and technologies to improve population health. This field is thriving in many countries and uses epidemiologic methods as a foundation for knowledge integration of genetic information in medicine and public health (19–21). Public health genomics uses population-based data on genetic variation and gene-environment interactions to develop, implement, and evaluate evidence-based tools for improving health and preventing disease. Public health genomics also applies systematic, evidence-based assessments of genomic applications in health practice and works to ensure the delivery of validated, useful genomic tools in practice. Even with impressive advances in the basic sciences of gene discovery and characterization, reservations have been voiced about the potential benefits of medical applications of genomics; these reservations are based in part on the complex relationships among genetic variation, the environment, and disease occurrence, as reflected in the modest associations between individual gene variants and disease outcomes, and on the limited clinical validity and utility of using genetic information in the prediction of disease. Moreover, prematurely optimistic claims by researchers, the media, test developers, and commercial genomic enterprises may lead to unrealistic expectations among consumers and inappropriate use of genetic information. Also, an overemphasis on the genetics of human disease may divert attention from the importance of environmental exposures, social structure, and lifestyle factors (22).
In public health practice, skepticism about genomics runs high among some practitioners whose traditional domains are the control of infectious diseases, environmental exposures, and health promotion for chronic disease prevention. To some, genomics research is perceived as a low-yield investment, as well as an opportunity cost, undercutting social efforts to address environmental causes of ill health. To others, public health applications of genomics are viewed only in terms of population screening, remaining limited to newborn screening programs (23). Still others reject genomics research as an unwarranted extension of the individual risk paradigm (24), citing the distinction between prevention in populations and in high-risk persons set out by Geoffrey Rose in 1985 (25). However, Rose was careful to present these approaches as complementary rather than mutually exclusive (25).

It can be argued that the integration of genomics into healthcare and disease prevention requires a strong medicine–public health partnership (26). Public health and health care often operate in different spheres, although medicine is part of the “public health system” (27). This “schism” can be overcome in genomics using a population approach to a joint translational agenda that includes (a) a focus on prevention, a traditional public health concern that is now a promise of genomics in the realm of personalized medicine; (b) a population perspective that requires a large amount of population level data to validate gene discoveries for clinical and population-level applications, especially given the modest associations between genetic factors and disease burden; (c) commitments to evidence-based knowledge synthesis and guideline development, especially with thousands of potential genomic applications emerging into practice; and (d) emphasis on health services research and the surveillance of population health to evaluate health outcomes, costs, and benefits in the “real world” (27).

Epidemiology and the Phases of Genomics Translation

As shown in Table 1.2, there are four phases of translation research in genomics, from gene discovery to population health impact (28). In addition to traditional genetic epidemiology, which has focused by and large on gene discovery, epidemiologic methods and approaches play a role in all four phases (see Table 1.2). Phase 1 (T1) research seeks to move a basic genome-based discovery into a candidate health application (e.g., genetic test/intervention). Phase 2 (T2) research assesses the value of a genomic application for health practice leading to the development of evidence-based guidelines. Phase 3 (T3) research attempts to move evidence-based guidelines into health practice, through delivery, dissemination, and diffusion research. Phase 4 (T4) research seeks to evaluate the “real world” health outcomes of a genomic application in practice. Because the development of evidence-based guidelines is a moving target, the types of translation research can overlap and provide feedback loops to allow integration of new knowledge. Although it is difficult to quantify how much of human genomics research is T1, we have estimated that no more than 3% of published research focuses on T2 and beyond (28). Indeed, evidence-based guidelines and T3 and T4 research currently are rare (except in newborn screening, and selected testing for genetic disorders such as hereditary breast and ovarian cancer).

The Continued Need for Methodological Standards in Human Genome Epidemiology

Thus, the need to make sense of the avalanche of genetic and genomic data is more urgent than ever. This urgency is behind the continued growth of the Human Genome Epidemiology Network (HuGENet), a global collaboration of individuals and organizations who are interested in accelerating the development of the knowledge base on human genetic variation and population health and the use of this information in improving health and preventing disease (29). HuGENet has focused on developing methods and guidance to integrate and disseminate a global knowledge base on assessing the prevalence of genetic variants in different populations, genotype-disease associations, and gene-gene and gene-environment interactions, and on evaluating genetic tests for screening and prevention. During the past three years, HuGENet has made many methodological and substantive contributions to the field. HuGENet has developed a Web-based searchable knowledge base (the HuGE Navigator) that captures ongoing publications in human genome epidemiology (30). The HuGE Navigator is searchable by disease, gene, and disease risk factors. Furthermore, in collaboration with several journals, HuGENet has sponsored systematic reviews of the evidence on genotype-disease associations, using specific published guidelines and recommendations (the HuGENet handbook; 31) for carrying out this work, as well as for applying quantitative methods of synthesis. Since 2000, HuGENet collaborators have carried out more than 80 reviews on various diseases ranging from single gene conditions to common complex diseases. In 2005, HuGENet formed a network of investigator networks (32), which currently has 35 consortia, mostly disease-specific networks that are represented by hundreds of collaborators interested in sharing knowledge, experience, and resources in the conduct, analysis, and dissemination of results of human genome epidemiology investigations.
In 2006, HuGENet conducted a workshop in collaboration with the global movement STROBE (STrengthening the Reporting of OBservational Epidemiology) to extend the now well-studied “STROBE reporting checklist” to include genetic associations, under the rubric of STREGA (STrengthening the REporting of Genetic Associations; 33). In addition, the HuGENet “network of networks” published a “road map” for using consortia-driven pooled meta-analyses to accelerate the knowledge base on gene-disease associations (34). With the publication of the HuGENet road map, the editors of Nature Genetics called for the development and online publication of peer-reviewed, curated expert knowledge bases called “field synopses” that are regularly updated and freely accessible (35). HuGENet implemented the field synopsis concept in a meeting held in 2006 in Venice (36). The workshop participants generated interim guidelines for grading the cumulative evidence in genetic associations based on three criteria: (1) the amount of evidence; (2) the extent of replication; and (3) protection from bias. The proposed scheme allows for three categories of descending credibility for each of these criteria and also for a composite assessment of “strong,” “moderate,” or “weak” credibility (36). In 2008, HuGENet collaborators conducted a workshop to discuss insights and experiences from several field synopses that represented the first efforts by multiple authors at grading the credibility of these associations on a massive scale. HuGENet participants emerged with a vision for collaboration that builds reliable cumulative evidence for genetic associations and a transparent, distributed, and authoritative knowledge base on genetic variation and human health (37).
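The grading scheme described above can be sketched as a small function. This is an illustration only: the letter grades A, B, and C for the three categories, and the composite rule used here (all three criteria graded A gives “strong,” any C gives “weak,” and anything else gives “moderate”), reflect our reading of the interim guidelines (36) and are not spelled out in this chapter, so they should be treated as assumptions. The function and parameter names are hypothetical.

```python
VALID_GRADES = ("A", "B", "C")  # assumed labels, in descending credibility


def composite_credibility(amount, replication, bias_protection):
    """Combine three per-criterion grades into a composite assessment.

    Assumed rule (see the lead-in): all A -> "strong"; any C -> "weak";
    every other combination -> "moderate".
    """
    grades = (amount, replication, bias_protection)
    for grade in grades:
        if grade not in VALID_GRADES:
            raise ValueError(f"unknown grade: {grade!r}")

    if "C" in grades:
        return "weak"
    if all(grade == "A" for grade in grades):
        return "strong"
    return "moderate"


# Hypothetical association: ample evidence, well replicated, but only
# partial protection from bias.
print(composite_credibility("A", "A", "B"))
```

Under this rule a single weak link caps the overall assessment, so an association cannot be graded “strong” on the basis of sample size alone.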

The HuGE Roadmap Revisited

With all these ongoing developments, we have invited many authors who are leaders in the field to produce the second edition of Human Genome Epidemiology. Our aim is to inform readers of new developments in the genomics field and how epidemiologic methods are being used to make sense of this information. We do realize that the material presented in this book will be outdated even before it is published. However, the methodological challenges and possible solutions to them will remain with us for quite some time. There is very little material remaining from the first edition of Human Genome Epidemiology.

This new edition is divided into five parts. In Part I, we give an overview of the development and progress in applications of genomic technologies, with a focus on genomic sequence variation (Chapter 2). We then give an overview of the multidisciplinary field of public health genomics that includes a fundamental role of epidemiologic methods and approaches (Chapter 3). We also present a brief overview of evolving methods for tracking and compiling information on genetic factors in disease (Chapter 4).

In Part II, we discuss methodological developments in collection, analysis, and synthesis of data from human genome epidemiologic studies. We discuss the emergence of biobanks around the world (Chapter 5), the evolution of case-control studies and cohort studies in the era of GWAS (Chapter 6), and the emerging role of consortia and networks (Chapter 7). Next, we discuss methodological analytic issues in GWAS (Chapter 8) and the analytic challenges of gene-gene and gene-environment interaction (Chapter 9). We then address issues of reporting of genetic associations (Chapter 10), evolving methods for integrating the evidence (Chapter 11), and assessment of cumulative evidence and field synopses (Chapter 12).

In Part III, we provide several case studies related to various diseases that attempt to present an evolving knowledge base of the cumulative evidence on genetic variation in a variety of human diseases. As the information undoubtedly will change (even before the publication of the book), we stress here the importance of strong methodological foundation for analysis and synthesis of information from various studies. The diseases shown in this section include three cancers: colorectal cancer (Chapter 13), childhood leukemia (Chapter 14), and bladder cancer (Chapter 15). We also present data from type 2 diabetes (Chapter 16), osteoporosis (Chapter 17), preterm birth (Chapter 18), coronary heart disease (Chapter 19), and schizophrenia (Chapter 20). Collectively, these chapters cover an impressive array of common complex human diseases and provide an epidemiologic approach to rapidly emerging data on gene-disease and gene-environment interactions.

In Part IV, we discuss methodological issues surrounding specific applications of human genomic information for medicine and public health. We start in Chapter 21 with a review of the concept of Mendelian Randomization, an approach that allows us to assess the role of environmental factors and other biomarkers in the occurrence of human diseases using data on the association of genetic variation and disease endpoints. In Chapter 22, we discuss how clinical epidemiologic concepts and methods can be used to assess whether one or more genetic variants (e.g., genome profiles) can be used to predict risk for human diseases. Chapter 23 presents a major milestone for public health genomics, namely the publication of methods of systematic review and assessment of the clinical validity and utility of genomic applications in clinical practice. This chapter is a reprint of the published paper from the independent multidisciplinary panel, the EGAPP™ working group, sponsored by CDC and many partners. Chapter 24 briefly summarizes how reviews of the evidence on validity and utility of genomic information can be done systematically and rapidly, even in the face of incomplete information. Chapter 25 focuses on the crucial role of the behavioral and social sciences in assessing the impact and value of epidemiologic information on gene-disease associations. Chapter 26 addresses issues in evaluating developments in newborn screening. Chapter 27 provides an epidemiologic framework for the evaluation of pharmacogenomic applications in clinical and public health practice. Chapter 28 presents an overview of the relevance and impact of epigenomics in clinical practice and disease prevention. Finally, Chapter 29 presents an epidemiologic framework for evaluating family health history as a tool for disease prevention and health promotion. 
Even in this genomics era, family history remains a strong foundation, not only for identifying single gene disorders, but also for stratifying individuals and populations by different levels of disease risk and implementing personalized interventions.

Finally, in Part V of the book, we present a few case studies of the application of epidemiologic methods of assessment of clinical validity and utility for several disease examples. These include two pharmacogenomic testing examples—initial treatment of depression with SSRIs (Chapter 30) and warfarin therapy (Chapter 31). We also present information on population screening for hereditary hemochromatosis (Chapter 32), a genetic disorder with incomplete penetrance that has attracted some attention over the past decade as a possible example of population screening in the genomics era.

The second edition of Human Genome Epidemiology is primarily targeted at basic, clinical, and population scientists involved in studying genetic factors in common diseases. In addition, the book focuses on practical applications of human genome variation in clinical practice and disease prevention. We hope that students, clinicians, public health professionals, and policy makers will find the book useful in learning about evolving methods for approaching the discovery and the use of genetic information in medicine and public health in the twenty-first century.


References


  1. Khoury MJ, Little J, Burke W, eds. Human Genome Epidemiology: A Scientific Foundation for Using Genetic Information to Improve Health and Prevent Disease. New York: Oxford University Press; 2004.
  2. Nature Omics Gateway.
  3. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320.
  4. Thomas DC. Are we ready for whole genome association studies? CEBP. 2006; 4:595–598.
  5. Knoppers BM. Biobanking: international norms. J Law Med Ethics. 2005;33:7–14.
  6. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest. 2008;118:1590–1605.
  7. National Human Genome Research Institute-Office of Population Genomics. A catalog of published genomewide association studies.
  8. National Center for Biotechnology Information. Database on genotypes and phenotypes (dbGAP).
  9. National Cancer Institute. Cancer genetic markers of susceptibility (CGEMS).
  10. Feero WG, Guttmacher AE, Collins FS. The genome gets personal-almost. JAMA. 2008;299:1351–1352.
  11. Department of Health and Human Services: personalized healthcare initiative.
  12. Secretary’s Advisory Committee on Genetics, Health and Society. US system of oversight of genetic testing.
  13. Khoury MJ, Bradley L, Berg A, et al. The evidence dilemma in genomic medicine: the need for a roadmap for translating genomic discoveries into clinical practice. Health Affairs. 2008;27(6):1600–1611.
  14. Hudson KL, Holohan MK, Collins FS. Keeping pace with the times—the Genetic Information Nondiscrimination Act of 2008. N Engl J Med. 2008;358:2661–2663.
  15. Hogarth S, Javitt G, Melzer D. The Current Landscape for Direct-to-Consumer Genetic Testing: Legal, Ethical, and Policy Issues. Ann Rev Genom Hum Genet. 2008;9:161–182.
  16. McGuire AL, Cho MK, McGuire SE, et al. The future of personal genomics. Science. 2007;317:1687.
  17. Hunter DJ, Khoury MJ, Drazen JM. Letting the genome out of the bottle-will we get our wish. N Engl J Med. 2008;358:105–107.
  18. Agency for Healthcare Research and Quality. Infrastructure to monitor utilization and outcomes of gene-based applications: an assessment.
  19. Burke W, Khoury MJ, Stewart A, et al. The path from genome-based research to population health: development of an international public health genomics network. Genet Med. 2006;8:451–458.
  20. Khoury MJ, Bowen S, Bradley LK, et al. A decade of public health genomics in the United States, Centers for Disease Control and Prevention. Public Health Genomics. 2009;12:20–29.
  21. Knoppers BM, Brand AM. From community genetics to public health genomics: what’s in a name. Public Health Genomics. 2009;12:1–3.
  22. Buchanan AV, Weiss KM, Fullerton SM. Dissecting complex disease: the quest for the philosopher’s stone? Int J Epidemiol. 2006;35:562–571.
  23. Rockhill B. Theorizing about causes at the individual level while estimating effects at the population level: implications for prevention. Epidemiology. 2005;16:124–129.
  24. Holtzman NA. What role for public health in genetics and vice versa? Community Genet. 2006;9:8–20.
  25. Rose G. Sick individuals and sick populations. Int J Epidemiol. 1985;14:32–38.
  26. Khoury MJ, Gwinn M, Burke W, et al. Will genomics widen or help heal the schism between medicine and public health? Am J Prev Med. 2007;33:310–317.
  27. Institute of Medicine. Who Will Keep the Public Healthy? Educating Public Health Professionals for the 21st Century. Washington, DC: National Academies Press; 2003.
  28. Khoury MJ, Gwinn M, Yoon PW, et al. The continuum of translation research in genomic medicine: how can we accelerate the appropriate integration of human genome discoveries into healthcare and disease prevention. Genet Med. 2007;9:665–674.
  29. Centers for Disease Control and Prevention. The Human Genome Epidemiology Network (HuGENet).
  30. Yu W, Gwinn M, Clyne M, et al. A navigator for human genome epidemiology. Nat Genet. 2008;40:124–125. Also available online at
  31. HuGENet handbook of HuGE reviews, edition 1.0.
  32. Ioannidis JPA, Bernstein J, Boffetta P, et al. A network of investigator networks in human genome epidemiology. Am J Epidemiol. 2005;162:302–304.
  33. STrengthening the REporting of Genetic Associations (STREGA). Ann Intern Med. February 3, 2009;150(3):206–215.
  34. Ioannidis JPA, Gwinn M, Little J, et al. The Human Genome Epidemiology Network. A road map for efficient and reliable human genome epidemiology. Nat Genet. 2006;38:3–5.
  35. Editorial. Embracing risk. Nat Genet. 2006;38:1.
  36. Ioannidis JPA, Boffetta P, Little J, et al. Cumulative assessment of genetic associations: interim guidelines. Int J Epidemiol. 2008;37:120–132.
  37. HuGENet workshop 2008: Networks, genomewide association studies and the knowledge base on genetic variation and human health. 