Chapter 23 - The Evaluation of Genomic Applications in Practice and Prevention (EGAPP™) initiative: methods of the EGAPP™ Working Group Tables

Human Genome Epidemiology (2nd ed.): Building the evidence for using genetic information to improve health and prevent disease

“The findings and conclusions in this book are those of the author(s) and do not
necessarily represent the views of the funding agency.”
These chapters were published with modifications by Oxford University Press (2010)

Steven M. Teutsch, Linda A. Bradley, Glenn E. Palomaki, James E. Haddow, Margaret Piper, Ned Calonge, W. David Dotson, Michael P. Douglas, and Alfred O. Berg

Table 23-1
Categories of genetic test applications and some characteristics of how clinical validity and utility are assessed

Application of test: Diagnosis (symptomatic patient)
Clinical validity: Association of marker with disorder
Clinical utility: Improved clinical outcomes*: health outcomes based on diagnosis and subsequent intervention or treatment; availability of information useful for personal or clinical decision making; end of diagnostic odyssey

Application of test: Disease screening (asymptomatic patient)
Clinical validity: Association of marker with disorder
Clinical utility: Improved health outcome based on early intervention for screen-positive individuals to identify a disorder for which there is intervention or treatment, or provision of information useful for personal or clinical decision making

Application of test: Risk assessment/susceptibility
Clinical validity: Association of marker with future disorder (consider possible effect of penetrance)
Clinical utility: Improved health outcomes based on prevention or early detection strategies

Application of test: Prognosis of diagnosed disease
Clinical validity: Association of marker with natural history benchmarks of the disorder
Clinical utility: Improved health outcomes, or outcomes of value to patients, based on changes in patient management

Application of test: Predicting treatment response or adverse events (pharmacogenomics)
Clinical validity: Association of marker with a phenotype/metabolic state that relates to drug efficacy or adverse drug reactions
Clinical utility: Improved health outcomes or adherence based on drug selection or dosage

*Clinical outcomes are the net health benefit (benefits and harms) for the patients and/or population in which the test is used.



Table 23-2
Criteria for preliminary ranking of topics

Criteria related to health burden

What is the potential public health impact based on the prevalence/incidence of the disorder, the prevalence of gene variants, or the number of individuals likely to be tested?

What is the severity of the disease?

How strong is the reported relationship between a test result and a disease/drug response?

Is there an effective intervention for those with a positive test or their family members?

Who will use the information in clinical practice (e.g., health care providers, payers) and how relevant might this review be to their decision making?

Criteria related to practice issues

What is the availability of the test in clinical practice?

Is an inappropriate test use possible or likely?

What is the potential impact of an evidence review or recommendations on clinical practice? On consumers?

Other considerations

How does the test add to the portfolio of EGAPP™ evidence-based reviews? As a pilot project, EGAPP™ aims to develop a portfolio of evidence reviews that adequately tests the process and methodologies.

Will it be possible to make a recommendation, given the body of data available? EGAPP™ is attempting to balance selection of somewhat established tests versus emerging tests for which insufficient evidence or unpublished data are more likely.

Are there other practical considerations? For example, avoiding duplication of evidence reviews already underway by other groups.

How does this test contribute to diversity in reviews? In what category is this test? As a pilot project, EGAPP™ aims to consider different categories of tests (e.g., pharmacogenomics or cancer), mutation types (e.g., inherited or somatic) or test types (e.g., predictive or diagnostic).



Table 23-3
Hierarchies of data sources and study designs for the components of evaluation

Level* 1
Analytic validity: Collaborative study using a large panel of well-characterized samples; Summary data from well-designed external proficiency testing schemes or interlaboratory comparison programs
Clinical validity: Well-designed longitudinal cohort studies; Validated clinical decision rule
Clinical utility: Meta-analysis of randomized controlled trials (RCT)

Level 2
Analytic validity: Other data from proficiency testing schemes; Well-designed peer-reviewed studies (e.g., method comparisons, validation studies); Expert panel reviewed FDA summaries
Clinical validity: Well-designed case-control studies
Clinical utility: A single randomized controlled trial

Level 3
Analytic validity: Less well-designed peer-reviewed studies
Clinical validity: Lower quality case-control and cross-sectional studies; Unvalidated clinical decision rule
Clinical utility: Controlled trial without randomization; Cohort or case-control study

Level 4
Analytic validity: Unpublished and/or nonpeer-reviewed research, clinical laboratory, or manufacturer data; Studies on performance of the same basic methodology, but used to test for a different target
Clinical validity: Case series; Unpublished and/or nonpeer-reviewed research, clinical laboratory or manufacturer data; Consensus guidelines; Expert opinion
Clinical utility: Case series; Unpublished and/or nonpeer-reviewed studies; Clinical laboratory or manufacturer data; Consensus guidelines; Expert opinion

*Highest level is 1.
A clinical decision rule is an algorithm leading to result categorization. It can also be defined as a clinical tool that quantifies the contributions made by different variables (e.g., test result, family history) in order to determine classification/interpretation of a test result (e.g., for diagnosis, prognosis, therapeutic response) in situations requiring complex decision making (55).
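
As an illustration only, the clinical utility column of Table 23-3 can be read as a lookup from study design to hierarchy level. The sketch below is a hypothetical encoding, not part of the EGAPP™ methods: the dictionary keys are shortened labels chosen here, and the helper name `best_available_level` is an assumption.

```python
# Hypothetical encoding of the clinical utility column of Table 23-3.
# Keys are shortened labels chosen for this sketch; level 1 is the highest.
CLINICAL_UTILITY_LEVELS = {
    "meta-analysis of randomized controlled trials": 1,
    "single randomized controlled trial": 2,
    "controlled trial without randomization": 3,
    "cohort or case-control study": 3,
    "case series": 4,
    "unpublished and/or nonpeer-reviewed studies": 4,
    "clinical laboratory or manufacturer data": 4,
    "consensus guidelines": 4,
    "expert opinion": 4,
}


def best_available_level(designs):
    """Return the highest-ranked (numerically lowest) level among the designs."""
    if not designs:
        raise ValueError("at least one study design is required")
    return min(CLINICAL_UTILITY_LEVELS[d.strip().lower()] for d in designs)


print(best_available_level(["Case series", "Single randomized controlled trial"]))
```

The lookup covers only the level of the study design; judging the quality of an individual study is a separate step.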



Table 23-4
Criteria for assessing quality of individual studies (internal validity) (55)
Columns: Analytic validity; Clinical validity; Clinical utility

Analytic validity

Adequate descriptions of the index test (test under evaluation)

Source and inclusion of positive and negative control materials

Reproducibility of test results

Quality control/assurance

Adequate descriptions of the test under evaluation

Specific methods/platforms evaluated

Number of positive samples and negative controls tested

Adequate descriptions of the basis for the “right answer”

Comparison to a “gold standard” referent test

Consensus (e.g., external proficiency testing)

Characterized control materials (e.g., NIST, sequenced)

Avoidance of biases

Blinded testing and interpretation

Specimens represent routinely analyzed clinical specimens in all aspects (e.g., collection, transport, processing)

Reporting of test failures and uninterpretable or indeterminate results

Analysis of data

Point estimates of analytic sensitivity and specificity with 95% confidence intervals

Sample size/power calculations addressed

Clinical validity

Clear description of the disorder/phenotype and outcomes of interest

Status verified for all cases

Appropriate verification of controls

Verification does not rely on index test result

Prevalence estimates are provided

Adequate description of study design and test/methodology

Adequate description of the study population

Inclusion/exclusion criteria

Sample size, demographics

Study population defined and representative of the clinical population to be tested

Allele/genotype frequencies or analyte distributions known in general and subpopulations

Independent blind comparison with appropriate, credible reference standard(s)

Independent of the test

Used regardless of test results

Description of handling of indeterminate results and outliers

Blinded testing and interpretation of results

Analysis of data

Possible biases are identified and potential impact discussed

Point estimates of clinical sensitivity and specificity with 95% confidence intervals

Estimates of positive and negative predictive values

Clinical utility

Clear description of the outcomes of interest

What was the relative importance of outcomes measured; which were prespecified primary outcomes and which were secondary?

Clear presentation of the study design

Was there clear definition of the specific outcomes or decision options to be studied (clinical and other endpoints)?

Was interpretation of outcomes/endpoints blinded?

Were negative results verified?

Was data collection prospective or retrospective?

If an experimental study design was used, were subjects randomized? Were intervention and evaluation of outcomes blinded?

Did the study include comparison with current practice/empirical treatment (value added)?


What interventions were used?

What were the criteria for the use of the interventions?

Analysis of data

Is the information provided sufficient to rate the quality of the studies?

Are the data relevant to each outcome identified?

Is the analysis or modeling explicit and understandable?

Are analytic methods prespecified, adequately described, and appropriate for the study design?

Were losses to follow-up and resulting potential for bias accounted for?

Is there assessment of other sources of bias and confounding?

Are there point estimates of impact with 95% CI?

Is the analysis adequate for the proposed use?

NIST = National Institute of Standards and Technology.
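
Table 23-4 asks for point estimates of sensitivity and specificity with 95% confidence intervals, and for estimates of positive and negative predictive values. The following is a minimal sketch of that arithmetic, under assumptions that are not in the table: a simple 2x2 table of hypothetical counts, the Wilson score method for the interval (the table does not prescribe an interval method), and illustrative function names.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_summary(tp, fp, fn, tn):
    """Point estimates (and Wilson intervals) from a 2x2 table of counts.

    tp/fn are reference-positive samples, tn/fp are reference-negative samples.
    All four marginal totals are assumed to be non-zero.
    """
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
        # Predictive values taken directly from the counts are meaningful only
        # if the study prevalence reflects the population to be tested.
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=90, fp=20, fn=10, tn=180)
for name, value in summary.items():
    print(name, value)
```

Predictive values depend on prevalence, which is one reason the clinical validity criteria ask that prevalence estimates be provided and that the study population be representative of the clinical population to be tested.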



Table 23-5
Grading the quality of evidence for the individual components of the chain of evidence (key questions) (57)

Adequacy of information to answer key questions: Convincing

Analytic validity:
Studies that provide confident estimates of analytic sensitivity and specificity using intended sample types from representative populations
Two or more Level 1 or 2 studies that are generalizable, have a sufficient number and distribution of challenges, and report consistent results
One Level 1 or 2 study that is generalizable and has an appropriate number and distribution of challenges

Clinical validity:
Well-designed and conducted studies in representative population(s) that measure the strength of association between a genotype or biomarker and a specific and well-defined disease or phenotype
Systematic review/meta-analysis of Level 1 studies with homogeneity
Validated Clinical Decision Rule
High quality Level 1 cohort study

Clinical utility:
Well-designed and conducted studies in representative population(s) that assess specified health outcomes
Systematic review/meta-analysis of randomized controlled trials showing consistency in results
At least one large randomized controlled trial (Level 2)

Adequacy of information to answer key questions: Adequate

Analytic validity:
Two or more Level 1 or 2 studies that
Lack the appropriate number and/or distribution of challenges
Are consistent, but not generalizable
Modeling showing that lower quality (Level 3, 4) studies may be acceptable for a specific well-defined clinical scenario

Clinical validity:
Systematic review of lower quality studies
Review of Level 1 or 2 studies with heterogeneity
Case-control study with good reference standards
Unvalidated Clinical Decision Rule (Level 2)

Clinical utility:
Systematic review with heterogeneity
One or more controlled trials without randomization (Level 3)
Systematic review of Level 3 cohort studies with consistent results

Adequacy of information to answer key questions: Inadequate

Analytic validity:
Combinations of higher quality studies that show important unexplained inconsistencies
One or more lower quality studies (Level 3 or 4)
Expert opinion

Clinical validity:
Single case-control study
Nonconsecutive cases
Lacks consistently applied reference standards
Single Level 2 or 3 cohort/case-control study
Reference standard defined by the test or not used systematically
Study not blinded
Level 4 data

Clinical utility:
Systematic review of Level 3 quality studies or studies with heterogeneity
Single Level 3 cohort or case-control study
Level 4 data



Table 23-6
Recommendations based on certainty of evidence, magnitude of net benefit, and contextual issues

Level of certainty: High or moderate

Recommendation: Recommend for ... if the magnitude of net benefit is Substantial, Moderate, or Small*, unless additional considerations warrant caution.

Consider the importance of each relevant contextual factor and its magnitude or finding.

Recommendation: Recommend against ... if the magnitude of net benefit is zero or there are net harms.

Consider the importance of each relevant contextual factor and its magnitude or finding.

Level of certainty: Low

Recommendation: Insufficient evidence ... if the evidence for clinical utility or clinical validity is insufficient in quantity or quality to support conclusions or make a recommendation.

Consider the importance of each contextual factor and its magnitude or finding.

Determine whether the recommendation should be Insufficient (neutral), Insufficient (encouraging), or Insufficient (discouraging).

Provide information on key information gaps to drive a research agenda.

*Categories for the “magnitude of effect” or “magnitude of net benefit” used are Substantial, Moderate, Small, and Zero (57).
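
The mapping in Table 23-6 from level of certainty and magnitude of net benefit to a recommendation category can be sketched as a small function. This is a simplified illustration, not the Working Group's procedure: the function name and string labels are assumptions, and the contextual factors, the caution attached to a Small benefit, and the choice among the three Insufficient sub-categories all call for judgment that the code does not model.

```python
def egapp_recommendation(certainty, net_benefit=None):
    """Return the recommendation category suggested by Table 23-6.

    certainty: "high", "moderate", or "low"
    net_benefit: "substantial", "moderate", "small", "zero", or "net harms"
                 (not needed when certainty is "low")
    """
    level = certainty.strip().lower()
    if level == "low":
        # Evidence is insufficient in quantity or quality.
        return "Insufficient evidence"
    if level not in ("high", "moderate"):
        raise ValueError(f"unknown level of certainty: {certainty!r}")
    if net_benefit is None:
        raise ValueError("net_benefit is required for high or moderate certainty")

    magnitude = net_benefit.strip().lower()
    if magnitude in ("substantial", "moderate", "small"):
        return "Recommend for"
    if magnitude in ("zero", "net harms"):
        return "Recommend against"
    raise ValueError(f"unknown magnitude of net benefit: {net_benefit!r}")


print(egapp_recommendation("High", "Small"))
print(egapp_recommendation("Low"))
```

The returned category is only a starting point, since the table directs reviewers to weigh each relevant contextual factor before settling on a recommendation.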


Page last reviewed: January 6, 2010 (archived document)