necessarily represent the views of the funding agency.”
Human Genome Epidemiology: A Scientific Foundation for Using Genetic Information to Improve Health and Prevent Disease
Methods and Approaches 1: Assessing Disease Associations and Interactions
Facing the Challenge of Complex Genotypes and Gene-environment Interaction: the basic epidemiologic units in case-control and case-only designs
Lorenzo D. Botto and Muin J. Khoury
In this chapter, we focus on fundamental units of epidemiologic analysis of studies that relate health outcomes with complex genotypes and gene-environment interaction. The goal is to offer a practical perspective that researchers might find useful as they design, analyze, and present their studies, with emphasis on case-control and case-only designs. In the first part of the chapter, we focus on case-control studies and their core information (1). In particular, we illustrate ways in which such core information can be clearly presented to provide the fundamental measures of effect and impact, including the relative risks for the multiple factors under study (alone and jointly); the interaction effects; the exposure frequencies; and the attributable fractions.
In the second part of the chapter we discuss the potential role of well-designed disease registries as adjuncts or antecedents of case-control studies, and suggest that they might be particularly useful in the study of complex genotypes and interaction. In particular, we discuss the notion that a disease registry, approached through a case-only perspective, might be scanned for complex genotypes ranked by potential attributable fraction for the disease. Finally, we discuss the advantages and challenges of these approaches and their possible integration in studying the causation of common multi-factorial conditions.
We view the perspective presented in this chapter as complementary to the discussion in other sections of the book, in which methodologic aspects of the detection of joint effects and interaction are systematically presented. The approaches discussed here, particularly those related to the case-only analysis of disease registries, could enhance but not replace other strategies for the study of complex genotypes and gene-environment interaction.
Investigating Interaction in Epidemiology
Investigating genetic and gene-environment interaction in epidemiology raises definitional, methodologic, and practical questions. The meaning, measurement, and modeling of the effect of multiple factors, the biologic significance of epidemiologic assessment of interaction, and the appropriateness of specific study designs are but a few topics that continue to engender considerable debate (2-4). We will note briefly only two such issues for their relevance in this discussion of genetic factors and interaction.
First, bias and confounding in case-control studies (the type of study discussed here in some detail), though always a concern, can likely be reduced more easily when assessing genetic factors than when assessing environmental factors such as diet or lifestyle (4). Genotype can in principle be measured more precisely and objectively than, say, smoking or folic acid intake, which are commonly assessed from a subject's recall that may be imprecise or biased by disease status. Thus, exposure misclassification, both differential and non-differential, should decrease, with a corresponding improvement in the precision and validity of risk estimates. Also, the stability of genotype over time is particularly valuable in case-control studies in which the factors under study are measured months or years after disease onset. Finally, genotypes for a given set of alleles are likely to distribute randomly in the population (Mendelian randomization), reducing the likelihood of spurious gene-environment or gene-gene associations (at unlinked loci) (4). Genetic substructure in the population remains a concern, but researchers have suggested strategies that take such substructure into account, using for example a panel of unrelated markers (5,6). These considerations, combined with the known statistical efficiency of case-control studies, have revived the interest in case-control studies as powerful tools for the study of the effect of genotype on disease risk (4) and in part prompted our emphasis on such studies.
The second aspect of interaction that has been discussed extensively relates to which measures of effect are most informative or useful. For example, in the case of two dichotomous factors one could estimate the effect of each factor alone as well as the joint effect. One could also estimate the departure of the joint effect from specific models of interaction (e.g., additive or multiplicative). It can be useful to note that the relation between individual and joint effects can take different forms (7), which can depend on the biologic mechanism underlying the interaction. However, it has been noted that predicting the biologic mechanism from such epidemiologic data is difficult and perhaps not productive (2).
With more than two factors under study, summary measures of interaction and statistical models become more complicated, and the ability to present the data and the primary measures of effect acquires renewed value. The explosive growth of genetic technology and the ever expanding catalogue of human genes (8,9) are already leading to studies of increasing complexity. For example, the risk for venous thrombosis is already being studied in relation to variants of the Factor V, prothrombin, and 5,10-methylenetetrahydrofolate reductase (MTHFR) genes, as well as to blood homocysteine levels and oral contraceptive use (10-13). Similarly, the risk for spina bifida is being studied in relation to variants of folate-related genes (e.g., MTHFR, cystathionine-beta-synthase, methionine synthase, and methionine synthase reductase) and blood levels of selected vitamins (folate, B12) (14-17). Even (and perhaps particularly) in such complex settings, an appreciation of the basic analytic unit of epidemiologic analysis should help researchers develop a consistent starting point for data presentation and assessment.
Population-Based Case-Control Studies and the 2 X 4 table
The simplest case of interaction is perhaps that of two dichotomous factors (e.g., presence or absence of a genotype, use or non-use of a pill). For illustration, we present data from case-control settings in which we assume the ideal conditions of an unbiased, unconfounded, population-based, incident-case study. We will further assume that the study's odds ratios are valid estimations of relative risks.
Data from such a case-control study can be presented in a two-by-four table (Table 7-1). The same reference group is used to compute three odds ratios (each factor alone and jointly). Such odds ratios are the basic, direct measures of association.
Such presentation has several advantages (Table 7-2). The role of each factor is independently assessed both in terms of association and of potential attributable fraction. In addition, the odds ratios can be examined to assess their general relation (7) and formally evaluated in terms of departure from specified models of interaction (most commonly multiplicative or additive). The table also provides the distribution of the exposures among controls, and helps evaluate the dependence of factors in the underlying population (provided the controls are representative of such population). Finally, a case-only odds ratio can be easily derived and used as a comparison with findings from case-only studies in the literature.
The two-by-four table approach to presenting genetic and gene-environment interactions is appealing for several reasons.
- It is efficient: it summarizes, without loss of detail, seven two-by-two tables, and generates a comprehensive set of effect estimates that none of the latter, individually, can match.
- It highlights potential sample size issues: cell sizes are directly presented and confidence intervals show their effect on statistical power.
- It emphasizes effect estimation over model testing: the relative risk estimates associated with the joint and individual exposures are the primary elements of an interaction, whereas departures from specific models of interactions are derived parameters and explicitly labeled as such.
In summary, the table provides the simplest epidemiologic equivalent of the general statement that all effects on human health are attributable to the joint effect of genes and the environment. Indeed, it can be argued that the two-by-four table (and not the two-by-two table) is the fundamental unit of epidemiologic analysis.
A simple application of the two-by-four table
We illustrate the two-by-four table approach using data from a case-control study of venous thromboembolism in relation to factor V Leiden and oral contraceptive use (18). When the original data are so rearranged (Table 7-3), one can clearly appreciate certain key aspects of the interaction:
- The marginal and joint effects. For example, the odds ratios associated with Factor V Leiden alone and with oral contraceptive use alone (6.9 and 3.7, respectively) can be contrasted with that associated with the combined exposure (34.7).
- The potential attributable fractions. Provided the associations are causal, one can note the potential public health relevance of the findings (the computation of attributable fractions for two or more factors was developed by several authors and has been summarized (19)). The relatively high frequency in the population of the gene variant (2.4 percent among controls) and of the joint exposure (1.2 percent) translates into considerable population attributable fractions for thromboembolic disease (5.5 and 15.7 percent, respectively).
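The measures listed above can be recomputed from the four exposure rows of a two-by-four table. The following is a minimal sketch only: Table 7-3 is not reproduced in this text, so the cell counts below are an assumption, chosen because they reproduce the odds ratios, control frequencies, and attributable fractions quoted above. The function and variable names are likewise hypothetical.

```python
# Two-by-four table: each row is an exposure combination with (cases, controls).
# ASSUMPTION: the counts are not given in the text; they were chosen to
# reproduce the quoted figures (OR 34.7, 6.9, 3.7; AF 15.7% and 5.5%).
TABLE = {
    ("FVL+", "OC+"): (25, 2),
    ("FVL+", "OC-"): (10, 4),
    ("FVL-", "OC+"): (84, 63),
    ("FVL-", "OC-"): (36, 100),   # common reference group
}
REFERENCE = ("FVL-", "OC-")


def two_by_four(table, reference):
    """Return per-row odds ratio, control frequency, and attributable fraction."""
    total_cases = sum(cases for cases, _ in table.values())
    total_controls = sum(controls for _, controls in table.values())
    ref_cases, ref_controls = table[reference]
    results = {}
    for row, (cases, controls) in table.items():
        if row == reference:
            continue
        odds_ratio = (cases * ref_controls) / (controls * ref_cases)
        case_fraction = cases / total_cases
        results[row] = {
            "odds_ratio": odds_ratio,
            "control_frequency": controls / total_controls,
            # Miettinen's formula: AF = Fc * (OR - 1) / OR
            "attributable_fraction": case_fraction * (odds_ratio - 1) / odds_ratio,
        }
    return results


RESULTS = two_by_four(TABLE, REFERENCE)

if __name__ == "__main__":
    for row, m in RESULTS.items():
        print(row,
              f"OR={m['odds_ratio']:.1f}",
              f"freq={100 * m['control_frequency']:.1f}%",
              f"AF={100 * m['attributable_fraction']:.1f}%")
```

Because every odds ratio uses the same doubly unexposed reference row, the three estimates can be compared directly, which is the point of the presentation.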
One can contrast such a presentation with a stratified analysis in which the association between oral contraceptive use and venous thrombosis is assessed separately among those with and without the Factor V Leiden polymorphism (Table 7-4). The latter approach does not immediately provide information on individual and joint effects, and tends to emphasize departure from a specific (multiplicative) model of interaction. The two-by-four table does not have such a limitation and provides the data to test for other, non-multiplicative models as well.
A further assessment of the data from the two-by-four table involves the relation of the factors separately among cases and controls (Table 7-4). Conceptually, one can split vertically the case-control study into a case-only study and a control-only study and examine the respective odds ratios. The case-only design in itself is an efficient and valid approach to screening for interaction, provided that the fundamental assumption of independence of exposure and genotype in the population is justified (20, 21). The potential role of such studies in the epidemiologic approach to complex diseases has been reviewed (22,23) and will be examined later in connection with the discussion of disease registries. Also the association of risk factors among controls (control-only odds ratio) can provide useful information, namely the dependencies of the risk factors (genetic or environmental) in the underlying population. Detecting such dependencies is important both as a clue for a biologic relation between alleles at the loci under study and as a test of the key assumption in the interpretation of case-only data.
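The vertical split and the stratified view can be illustrated with a short sketch. The cell counts are again an assumption (they are not given in this text and were chosen to match the odds ratios quoted for Table 7-3), and all names are hypothetical. The sketch computes the case-only and control-only odds ratios, the departures from multiplicative and additive models, and the stratum-specific odds ratio for the exposure among carriers.

```python
# ASSUMPTION: cell counts are illustrative; they are not given in the text and
# were chosen to match the odds ratios quoted for Table 7-3.
# Keys are (genotype present, exposure present); values are (cases, controls).
COUNTS = {
    (1, 1): (25, 2),
    (1, 0): (10, 4),
    (0, 1): (84, 63),
    (0, 0): (36, 100),
}


def cross_product(n11, n10, n01, n00):
    """Odds ratio of a 2x2 table given as its four cell counts."""
    return (n11 * n00) / (n10 * n01)


def column(counts, index):
    """Pick the case (index 0) or control (index 1) column in a fixed order."""
    return [counts[key][index] for key in ((1, 1), (1, 0), (0, 1), (0, 0))]


# Vertical split: genotype-by-exposure association within each column.
case_only_or = cross_product(*column(COUNTS, 0))
control_only_or = cross_product(*column(COUNTS, 1))

# Odds ratios against the common doubly unexposed reference group.
ref_cases, ref_controls = COUNTS[(0, 0)]


def or_vs_reference(key):
    cases, controls = COUNTS[key]
    return (cases * ref_controls) / (controls * ref_cases)


or_joint, or_g, or_e = (or_vs_reference(k) for k in ((1, 1), (1, 0), (0, 1)))

# Departure from a multiplicative model (ratio) and an additive model (difference).
multiplicative_ratio = or_joint / (or_g * or_e)
additive_departure = or_joint - or_g - or_e + 1

# Stratified view: effect of the exposure within each genotype stratum.
or_e_in_carriers = (COUNTS[(1, 1)][0] * COUNTS[(1, 0)][1]) / (
    COUNTS[(1, 1)][1] * COUNTS[(1, 0)][0])
or_e_in_noncarriers = or_e
```

With these counts the case-only odds ratio is about 1.07 and the control-only odds ratio about 0.79, and their ratio (about 1.35) equals the multiplicative interaction ratio computed from the three odds ratios. The case-only odds ratio approximates that interaction ratio only when the control-only odds ratio is close to 1, which is the independence assumption discussed above; here the control-only estimate rests on cells of 2 and 4 subjects and is correspondingly imprecise.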
Three Factors: The 2 X 8 table
The points underscored by the two-by-four table are even clearer for three factors—three genes, three environmental factors, or a combination of genetic and environmental factors. With three dichotomous factors, the exposure combinations become 8 (23). Although more complex, such a table still shows the primary epidemiologic parameters (odds ratios and attributable fractions) associated with each factor and combination of factors. Because all refer to the same reference group, the relations between these measures are immediately evident; if needed, one can also assess which model of interaction best fits the data. Methodologic issues, such as sample size and exposure dependencies among the controls, can also be assessed with relative ease. The contrast with classic stratified analysis is even greater than in the case of two factors. To present such stratified analysis, a minimum of four tables is needed; because they have different reference groups, the four odds ratios would not be directly comparable; and the overall interpretation of the study is less immediately clear.
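The extension from four to eight rows can be sketched generically. The example below is an illustration under stated assumptions, not a published analysis: the counts are hypothetical, and `exposure_table` is a name introduced here. Each exposure combination is compared with the single all-unexposed reference row.

```python
from itertools import product


def exposure_table(counts, n_factors):
    """Odds ratio of every exposure combination against the all-unexposed row.

    counts maps a tuple of 0/1 flags (one per factor) to (cases, controls).
    """
    reference = (0,) * n_factors
    ref_cases, ref_controls = counts[reference]
    rows = []
    for combo in product((1, 0), repeat=n_factors):
        cases, controls = counts[combo]
        if combo == reference:
            odds_ratio = 1.0
        elif controls == 0 or ref_cases == 0:
            odds_ratio = float("nan")   # not estimable from an empty cell
        else:
            odds_ratio = (cases * ref_controls) / (controls * ref_cases)
        rows.append((combo, cases, controls, odds_ratio))
    return rows


# HYPOTHETICAL counts for three dichotomous factors (not from any study).
HYPOTHETICAL = {
    (1, 1, 1): (12, 3), (1, 1, 0): (15, 10),
    (1, 0, 1): (14, 9), (1, 0, 0): (30, 40),
    (0, 1, 1): (16, 12), (0, 1, 0): (35, 50),
    (0, 0, 1): (33, 48), (0, 0, 0): (60, 120),
}

ROWS = exposure_table(HYPOTHETICAL, 3)

if __name__ == "__main__":
    for combo, cases, controls, odds_ratio in ROWS:
        print(combo, cases, controls, f"{odds_ratio:.2f}")
```

The same function produces the two-by-four layout when called with two factors, so the presentation stays consistent as factors are added.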
The two-by-four or the two-by-eight table, though simple, may adequately summarize some, but not all epidemiologic relations. Issues that come into play in more complex situations include the following.
- The number of factors can increase. Even for dichotomous factors, the number of exposure combinations grows quickly (2^n for n factors) and the corresponding table rapidly becomes unwieldy.
- The relation between exposure and outcome can be other than dichotomous. For example, the relation can be graded or continuous (dose-response) as occurs with smoking and lung cancer or with obesity and hypertension. In the general case of n exposures each with its dose-response curve, the response surface is best described as a general n-dimensional manifold which may not be meaningfully summarized by few discrete odds ratios.
- As more factors are involved, their interaction may not be adequately described by simple multiplicative or additive models.
These limitations highlight two issues that will increasingly confront epidemiologists as they try to unravel the web of interaction in disease causation. First, new or improved epidemiologic methods may be needed to deal with such complex situations. For example, researchers have suggested applying a variety of regression models, including hierarchical models, and neural networks, traditionally used in modeling the probability of clinical outcomes (24,25), to the study of multiple factors and interaction (26-29). So far, these approaches have limitations: the output of regression models, for example, is model-dependent; neural networks, though in general less dependent on prior model specification (26-28), may be limited in their ability to estimate explicitly the dependencies among risk factors (26,27).
The second issue relates to sample size. As the number of factors under study increases so do the strata that have to be defined within the study. With a fixed total number of subjects, increasing the number of factors quickly reduces per-stratum size and the associated statistical power. Thus, negative findings should be carefully interpreted. Strategies to deal with this issue include conducting well-designed collaborative studies that increase sample size but also deal effectively with extraneous genetic heterogeneity.
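The arithmetic behind this concern can be made concrete with a small sketch. The study size and exposure prevalence are hypothetical, and the calculation assumes dichotomous, mutually independent factors that share one prevalence, which real data will rarely satisfy.

```python
def stratum_sizes(n_subjects, n_factors, prevalence):
    """Number of strata, mean stratum size, and expected all-exposed size.

    ASSUMPTION: factors are dichotomous, mutually independent, and share one
    exposure prevalence.
    """
    n_strata = 2 ** n_factors
    mean_size = n_subjects / n_strata
    all_exposed = n_subjects * prevalence ** n_factors
    return n_strata, mean_size, all_exposed


# Hypothetical study of 1,000 subjects with a 20 percent exposure prevalence.
SUMMARY = [stratum_sizes(1000, k, 0.2) for k in range(1, 6)]

if __name__ == "__main__":
    for n_strata, mean_size, all_exposed in SUMMARY:
        print(n_strata, round(mean_size, 1), round(all_exposed, 2))
```

Under these assumptions, five factors yield 32 strata with a mean of about 31 subjects each, and fewer than one subject is expected in the stratum exposed to all five factors.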
In conclusion, researchers are challenged to apply epidemiologic methods to increasingly complex data on multiple factors and interaction. Carefully conducted collaborative studies may provide adequate sample size. A clear presentation and analysis of the core elements of these interactions (the data distribution and the primary measures of association) may increase the information that can be extracted from the data. In this sense, the two-by-four table and its immediate extensions are fundamental, simple, and useful tools for documenting and studying gene-environment interaction.
Disease Registries and Case-Only Designs
Population-based case-control studies are fundamental tools in etiologic studies, particularly for their ability to provide key parameters of the human genome epidemiology of many conditions (4, 30). The challenges of case-control studies, particularly the recruitment of an adequate set of control subjects, and the refinement of case-only methods suggest novel approaches to studying the role of complex genotypes in disease etiology. The availability of well-designed disease registries provides a practical setting for case-only studies of common conditions such as certain cancers and birth defects. Such case-only approaches cannot replace traditional case-control (or cohort) studies but can enhance them, particularly in three key areas:
- Scanning for genotypes that potentially contribute the most to disease in a population.
- Evaluating etiologic heterogeneity and genotype-phenotype correlations among subsets of cases.
- Detecting supra-multiplicative effects of interacting alleles.
Scanning Genotypes By Potential Contribution To Disease In The Population
Studying the role of complex genotypes, i.e., the interaction of multiple alleles at multiple loci, presents numerous challenges, including the large number of possible allele combinations. In theory, m alleles at n loci can generate m^n combinations (haplotypes): with 10 loci, two alleles can generate in excess of 1,000 combinations, and three alleles nearly 60,000 combinations.
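The counts quoted above follow directly from the number of alleles raised to the number of loci. A brief sketch (the function name is introduced here for illustration):

```python
from itertools import product


def n_combinations(alleles_per_locus, n_loci):
    """Number of allele combinations for a fixed number of alleles per locus."""
    return alleles_per_locus ** n_loci


TWO_ALLELES = n_combinations(2, 10)     # 1024
THREE_ALLELES = n_combinations(3, 10)   # 59049

# Cross-check on a small case by explicit enumeration: 3 alleles at 4 loci.
ENUMERATED = sum(1 for _ in product("abc", repeat=4))
```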
Given their potentially large number, which allele combinations should one look at first? One approach is to focus first on allele combinations that potentially contribute to the largest proportion of disease in a population or, in epidemiologic terms, on those with the highest potential population-attributable fraction. It is easy to show that even though one cannot determine relative risks in case-only studies, one can estimate the upper limit of a genotype’s attributable fraction. Assuming causality, such potential maximum attributable fraction is simply the frequency of the genotype among cases. This relation is intuitively obvious, since if x percent of a random series of cases has a particular exposure, then at most x percent can be caused by that exposure. The formal relation Fc = AF * OR/(OR - 1), or equivalently AF = Fc * (OR - 1)/OR, derives directly from Miettinen’s formula for attributable fraction (31).
Thus, attributable fraction (AF) is, at most, as high as the fraction of cases (Fc) with the exposure of interest, in this case the genotype, but never higher, regardless of how high the odds ratio or relative risk. The equation also illustrates the non-linear relation between odds ratio and attributable fraction, implying that variations in the upper range of odds ratios translate into progressively smaller changes in attributable fraction (for variations in the odds ratio between 10 and 1000, the fraction of exposed cases differs from the attributable fraction by at most one part in 10).
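The ceiling effect can be seen numerically. The sketch below assumes a hypothetical genotype carried by 20 percent of cases and applies Miettinen's formula over a range of odds ratios; the names are illustrative.

```python
def attributable_fraction(case_fraction, odds_ratio):
    """Miettinen's formula, AF = Fc * (OR - 1) / OR, for OR > 1."""
    return case_fraction * (odds_ratio - 1) / odds_ratio


CASE_FRACTION = 0.20   # hypothetical: 20 percent of cases carry the genotype
AF_BY_OR = [(odds_ratio, attributable_fraction(CASE_FRACTION, odds_ratio))
            for odds_ratio in (2, 5, 10, 100, 1000)]

if __name__ == "__main__":
    for odds_ratio, af in AF_BY_OR:
        print(odds_ratio, f"{100 * af:.2f}%")
```

The attributable fraction rises from 10 percent at an odds ratio of 2 to 18 percent at an odds ratio of 10, and then gains less than 2 percentage points as the odds ratio increases a further hundredfold, never exceeding the 20 percent carried by cases.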
One might argue that when the genotype frequency in the population is unknown, little should be inferred from genotype frequencies among cases. However, in the case of complex genotypes such relation becomes interesting because, under the hypothesis of no effect, few subjects are expected to have any given (complex) genotype, defined as a certain combination of alleles at a number of loci. More precisely, that number decreases multiplicatively with the number of loci considered concurrently (Figure 1). For example, with five loci and one common variant allele per locus with a frequency in the population of 10 percent, one would expect that by chance alone, the genotype with the five variant alleles would be found in 0.1^5 (0.00001), or one in 100,000 people. The practical usefulness of such consideration is that researchers can expect that complex genotypes observed with some frequency among cases, even in a small percent of cases, might be likely candidates for further study. Thus complex genotypes are one specific scenario where examining case-only frequencies might help focus the search for allele combinations with a potentially significant role in disease causation.
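The chance expectation can be worked through in a few lines. The sketch assumes independent loci, treats the 10 percent figure as the frequency of carrying the variant at each locus, and uses a hypothetical registry of 1,000 cases; it then asks how likely it would be, under no effect, to see three or more cases with the five-variant genotype.

```python
from math import comb


def chance_frequency(carrier_frequencies):
    """Expected frequency of carrying every variant, assuming independent loci."""
    freq = 1.0
    for p in carrier_frequencies:
        freq *= p
    return freq


def prob_at_least(k, n_cases, freq):
    """Binomial probability of k or more carriers among n_cases by chance."""
    below = sum(comb(n_cases, i) * freq ** i * (1 - freq) ** (n_cases - i)
                for i in range(k))
    return 1.0 - below


FIVE_LOCI = chance_frequency([0.10] * 5)              # about 1 in 100,000
P_THREE_OR_MORE = prob_at_least(3, 1000, FIVE_LOCI)   # hypothetical registry size
```

Under these assumptions the probability is well below one in a million, which is why even a handful of cases sharing a complex genotype may merit a closer look, subject to the independence caveats discussed later in the chapter.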
Examining Homogeneous Subsets and Determinants Of Severity Or Phenotype
The case-only approach to the analysis of complex genotypes could also help define smaller, more homogeneous subsets distinguished by phenotype, disease progression, or severity (20). These more homogeneous subsets can be compared with respect to genotypes to study the possible relation between genotype and outcome. For example, one might separate cases of first occurrence of venous thrombosis from those of recurrence, or cases of myocardial infarction by age of onset. Such analyses can provide clues to the genetic heterogeneity underlying common disorders and help relate genotypic variation to clinically relevant differences in outcome.
Searching For Supra-Multiplicative Interactions
The analysis of disease registries in a case-only fashion can provide some indication of interaction among alleles using, for example, log-linear models. Log-linear models have been used to test for higher order associations in a multiplicity of settings, including associations between structural anomalies in the same baby (32), between maternal and fetal genotypes and disease (33-35), and between genotype markers and disease (36,37). In conjunction with prior information of linkage between alleles, the results of log-linear modeling, for example, can provide some indication of whether the joint effect of certain allele combinations differs from that expected under a multiplicative null hypothesis, i.e., whether the joint effect equals the product of each allele’s effect alone. In this respect, such approach is a natural extension of the well-known case-only odds ratio (21), which measures the deviation from simple multiplicative effects of two factors, and is subject to similar interpretations and limitations (22,38). Among the limitations of log-linear modeling are its sensitivity to sparse data, which is a real concern in the analysis of complex genotypes, and its assumption of a log-linear relation between factors. Moreover, marginal effects of each allele cannot be measured. Nevertheless, the context of complex genotypes with relatively common susceptibility alleles is precisely where one might expect to find significant, supra-multiplicative interactions if such genotypes contribute to disease.
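The link between the case-only odds ratio and higher-order log-linear terms can be sketched without fitting software. The example computes only the highest-order interaction term of a saturated log-linear model, which with corner-point coding is an alternating sum of log cell counts; a full analysis would fit and compare hierarchical models. The counts are hypothetical and the function name is introduced here for illustration.

```python
from itertools import product
from math import exp, log


def highest_order_term(counts, n_factors):
    """Highest-order interaction of a saturated log-linear model.

    counts maps a tuple of 0/1 flags (one per allele) to the number of cases.
    With corner-point (0 = reference) coding the term is an alternating sum of
    log cell counts; for two factors it is the log of the case-only odds ratio.
    """
    term = 0.0
    for cell in product((0, 1), repeat=n_factors):
        sign = (-1) ** (n_factors - sum(cell))
        term += sign * log(counts[cell])   # fails on empty cells (sparse data)
    return term


# HYPOTHETICAL case-only counts for two and for three alleles.
TWO = {(1, 1): 30, (1, 0): 20, (0, 1): 60, (0, 0): 80}
THREE = {
    (1, 1, 1): 24, (1, 1, 0): 10, (1, 0, 1): 12, (0, 1, 1): 15,
    (1, 0, 0): 20, (0, 1, 0): 25, (0, 0, 1): 30, (0, 0, 0): 100,
}

CASE_ONLY_OR = exp(highest_order_term(TWO, 2))
THREE_WAY_RATIO = exp(highest_order_term(THREE, 3))
```

For two alleles the exponentiated term is the familiar case-only odds ratio; for three alleles it is the ratio of the two-allele case-only odds ratios across levels of the third allele. The logarithm of an empty cell is undefined, which illustrates the sensitivity to sparse data noted above.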
Limitations of Scanning Disease Registries Using A Case-Only Approach
The main thrust of this discussion of case-only designs (Table 7-5) is that, in the context of the study of complex genotypes in disease etiology, well-designed disease registries can be informative and relatively inexpensive resources that could complement and enhance the value of traditional case-control or cohort studies. Provided the key assumptions of case-only studies hold, such assessment of disease registries could provide researchers with clues on the health effects of certain complex genotypes, including their potential contribution to disease in the population, their involvement in significant supra-multiplicative interaction, and their relation to case subgroups with distinctive etiology and severity of outcome. It should be noted that such an approach does not pursue gene discovery in the manner of a genome scan. Rather, it uses known allelic variation at candidate loci as a starting point to examine the potential contribution to disease etiology.
However, the assessment of disease registries using case-only methods is not an alternative to traditional studies that use population controls. Its limitations, which stem from the limits of the case-only design, should be recognized clearly:
- Case-only studies offer no information about marginal risks for specific genotypes.
- They assess only deviations from purely multiplicative interactions, which is only one of the possible scenarios in which different alleles at different loci interact to modulate disease risk (7). Important genetic effects, such as strong effects from single gene variants, might not generate a signal. Other complex scenarios that defy facile conclusions include interactions of gene variants that increase disease risk with others that reduce risk.
- The validity of interaction assessment in case-only studies is exquisitely sensitive to independence assumptions for the factors in the population (39). One might imagine combinations that could be expected to violate that independence, whether between genotype and environmental factors (e.g., cigarette smoking and genes involved in detoxification, due to selective attrition in the population) or among different genes. Independence among gene combinations might also be violated if population stratification induces correlation between genes (even if the genotypes occur independently within each subpopulation), though this problem might be solved by appropriate stratification. Alternatively, dependencies between loci might occur if the two loci are on the same chromosome, even in populations with random mating, if mutations are relatively recent. So far, these appear to be mostly theoretical concerns, for lack of empirical evidence that such dependencies occur. Recent data, for example, suggest that this is not a problem for the more commonly studied metabolic genes (40).
- Case-only studies also do not provide full information on the attributable fraction for gene combinations; they estimate only its upper limit.
- Case-only studies require that cases represent a random or unselected series of cases, as could be assembled by a population-based registry. Series assembled from tertiary centers might be subject to selection forces that might preclude valid inferences.
At the same time, one should note the potential advantages in speed, efficiency, and precision of the two-tiered approach that begins with case-only studies and uses their findings to design further studies. Such advantages include the following:
- Researchers could complete their studies more rapidly, by examining existing or easily developed case groups, as might be derived from population-based disease registries. Several such registries, for example, already exist for many conditions including cancers and birth defects.
- The resources otherwise used to enroll and study convenient controls could be used instead to expand the spectrum of candidate genes and alleles among increasing numbers of cases.
- The effect estimates could gain in precision (no variance associated with controls) and validity (no population stratification).
- Subsequent studies might be more efficient. For example, case-control studies might use evidence from case-only studies to decide on reasonable sample sizes (for cases and controls) that might vary by ethnicity or disease subgroup.
In a broader perspective, one should realistically approach such screening of case-only studies within the well-known expectations of a screening process. Along with valid results both false positives and false negatives will occur, the former for example if linkage disequilibrium unrelated to disease were present.
It is tempting to speculate to what extent the conceptual framework presented here can be transferred from genomics to proteomics. Whereas the ability to detect multiple genetic variants with a single functional test would appear to increase researchers’ ability to examine an ever-widening web of metabolic networks, the independence assumptions might be commonly violated with proteomics, because of feedback regulation systems governing the transcription of genes into proteins.
Recruiting sufficient numbers of study participants remains a basic issue. Although analytic techniques such as multi-factor dimensionality reduction (41) are being suggested as possible enhancements in the study of complex genotypes, sample size requirements remain an inescapable challenge for researchers.
Finally, case-only approaches in no way diminish but in fact underscore the need to tackle and solve the complex legal, ethical, social, and practical issues of selecting, recruiting, and testing representative samples of the population for genetic studies.
As researchers realize the synergy between traditional and nontraditional studies, we should encourage a concerted effort at developing, on the one hand, well-designed disease registries, and on the other, representative samples from well-defined populations that are large and accessible.
In the study of interaction, it is useful to evaluate and present information on both the marginal and joint exposures (gene-environment combinations). Departure from specified models of interaction can be informative but should not be the sole focus of the analysis. Key information for each term of the interaction includes the frequency of the gene-environment exposure (or the complex genotype) in the reference population, and the disease-associated relative risks and attributable fractions.
The appreciation of certain epidemiologic units of analysis can facilitate the systematic assessment and clear presentation of data on multiple factors and interaction. In population-based case-control studies, particularly in their simplest forms (with two dichotomous factors), one such unit of analysis is the two-by-four table. Though more complex situations can require more complex approaches, the two-by-four table in many situations can provide a useful starting point for data assessment, presentation, and analysis.
Population-based disease registries can be important research resources. Using analytic approaches derived from case-only methods, researchers could scan such registries for complex genotypes and other exposure combinations associated with the highest attributable fraction for disease. Such analysis could also provide clues on the presence of supra-multiplicative interaction, as well as of determinants of disease severity and phenotype among population subgroups.
Case-control and case-only studies are best viewed as complementary rather than alternative approaches to the assessment of interaction. Appreciating the basic units of epidemiologic analysis within each study design, and using both designs synergistically, can contribute to the efficient and systematic assessment of the role in disease etiology of multiple factors, complex genotypes, and interaction.
- Botto LD, Khoury MJ. Commentary: facing the challenge of gene-environment interaction: the two-by-four table and beyond. Am J Epidemiol 2001;153:1016-20.
- Thompson WD. Effect modification and the limits of biological inference from epidemiologic data. J Clin Epidemiol 1991;44:221-32.
- Greenland S, Rothman KJ. Concepts of interaction. In: Greenland S, Rothman KJ, eds. Modern Epidemiology. Philadelphia: Lippincott, 1998:329-42.
- Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:1356-60.
- Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000;67:170-81.
- Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet 2001;68:466-77.
- Khoury MJ, Adams MJ Jr, Flanders WD. An epidemiologic approach to ecogenetics. Am J Hum Genet 1988;42:89-95.
- Hamosh A, Scott AF, Amberger J, Valle D, McKusick VA. Online Mendelian Inheritance in Man (OMIM). Hum Mutat 2000;15:57-61.
- Collins FS, Patrinos A, Jordan E, Chakravarti A, Gesteland R, Walters L. New goals for the U.S. Human Genome Project: 1998-2003. Science 1998;282:682-9.
- Gerhardt A, Scharf RE, Beckmann MW, et al. Prothrombin and factor V mutations in women with a history of thrombosis during pregnancy and the puerperium. N Engl J Med 2000;342:374-80.
- Akar N, Akar E, Akcay R, Avcu F, Yalcin A, Cin S. Effect of methylenetetrahydrofolate reductase 677 C-T, 1298 A-C, and 1317 T-C on factor V 1691 mutation in Turkish deep vein thrombosis patients. Thromb Res 2000;97:163-7.
- Martinelli I, Taioli E, Bucciarelli P, Akhavan S, Mannucci PM. Interaction between the G20210A mutation of the prothrombin gene and oral contraceptive use in deep vein thrombosis. Arteriosclerosis, Thrombosis & Vascular Biology 1999;19:700-3.
- Cattaneo M, Chantarangkul V, Taioli E, Santos JH, Tagliabue L. The G20210A mutation of the prothrombin gene in patients with previous first episodes of deep-vein thrombosis: prevalence and association with factor V G1691A, methylenetetrahydrofolate reductase C677T and plasma prothrombin levels. Thromb Res 1999;93:1-8.
- Botto LD, Yang Q. 5,10-Methylenetetrahydrofolate reductase gene variants and congenital anomalies: a HuGE review. Am J Epidemiol 2000;151:862-77.
- Christensen B, Arbour L, Tran P, et al. Genetic polymorphisms in methylenetetrahydrofolate reductase and methionine synthase, folate levels in red blood cells, and risk of neural tube defects. Am J Med Genet 1999;84:151-7.
- Shaw GM, Rozen R, Finnell RH, Wasserman CR, Lammer EJ. Maternal vitamin use, genetic variation of infant methylenetetrahydrofolate reductase, and risk for spina bifida. Am J Epidemiol 1998;148:30-7.
- Wilson A, Platt R, Wu Q, et al. A common variant in methionine synthase reductase combined with low cobalamin (vitamin B12) increases risk for spina bifida. Mol Genet Metab 1999;67:317-23.
- Vandenbroucke JP, Koster T, Briet E, Reitsma PH, Bertina RM, Rosendaal FR. Increased risk of venous thrombosis in oral-contraceptive users who are carriers of factor V Leiden mutation. Lancet 1994;344:1453-7.
- Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fractions. Am J Public Health 1998;88:15-9.
- Begg CB, Zhang ZF. Statistical analysis of molecular epidemiology studies employing case-series. Cancer Epidemiol Biomarkers Prev 1994;3:173-5.
- Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med 1994;13:153-62.
- Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls. Am J Epidemiol 1996;144:207-13.
- Yang Q, Khoury MJ. Evolving methods in genetic epidemiology. III. Gene-environment interaction in epidemiologic research. Epidemiol Rev 1997;19:33-43.
- Ioannidis JP, McQueen PG, Goedert JJ, Kaslow RA. Use of neural networks to model complex immunogenetic associations of disease: human leukocyte antigen impact on the progression of human immunodeficiency virus infection. Am J Epidemiol 1998;147:464-71.
- Marchevsky AM, Patel S, Wiley KJ, et al. Artificial neural networks and logistic regression as tools for prediction of survival in patients with Stages I and II non-small cell lung cancer. Mod Pathol 1998;11:618-25.
- Duh MS, Walker AM, Ayanian JZ. Epidemiologic interpretation of artificial neural networks. Am J Epidemiol 1998;147:1112-22.
- Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225-31.
- Warner B, Misra M. Understanding neural networks as statistical tools. Am Stat 1996;50:284-93.
- Aragaki CC, Greenland S, Probst-Hensch N, Haile RW. Hierarchical modeling of gene-environment interactions: estimating NAT2 genotype-specific dietary effects on adenomatous polyps. Cancer Epidemiol Biomarkers Prev 1997;6:307-14.
- Khoury MJ, Little J. Human genome epidemiologic reviews: the beginning of something HuGE. Am J Epidemiol 2000;151:2-3.
- Miettinen OS. Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol 1974;99:325-32.
- Beaty TH, Yang P, Khoury MJ, Harris EL, Liang KY. Using log-linear models to test for associations among congenital malformations. Am J Med Genet 1991;39:299-306.
- Wilcox AJ, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads". Am J Epidemiol 1998;148:893-901.
- Shields DC, Kirke PN, Mills JL, et al. The "thermolabile" variant of methylenetetrahydrofolate reductase and neural tube defects: An evaluation of genetic risk and the relative importance of the genotypes of the embryo and the mother. Am J Hum Genet 1999;64:1045-55.
- Shields DC, Ramsbottom D, Donoghue C, et al. Association between historically high frequencies of neural tube defects and the human T homologue of mouse T (Brachyury). Am J Med Genet 2000;92:206-11.
- Huttley GA, Wilson SR. Testing for concordant equilibria between population samples. Genetics 2000;156:2127-35.
- Khamis HJ, Hinkelmann K. Log-linear-model analysis of the association between disease and genotype. Biometrics 1984;40:177-88.
- Yang Q, Khoury MJ, Sun F, Flanders WD. Case-only design to measure gene-gene interaction. Epidemiology 1999;10:167-70.
- Albert PS. Limitations of the case-only design for identifying gene-environment interactions. Am J Epidemiol 2001;154:687-93.
- Garte S, Gaspari L, Alexandrie AK, et al. Metabolic gene polymorphism frequencies in control populations. Cancer Epidemiol Biomarkers Prev 2001;10:1239-48.
- Ritchie MD, Hahn LW, Roodi N, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 2001;69:138-47.
Address correspondence to Dr. Khoury at
Office of Genomics and Disease Prevention
Centers for Disease Control and Prevention
6 Executive Park, Mail Stop E-82
Atlanta, Georgia 30329