Chapter 26 - Assessing the evidence for clinical utility in newborn screening

Human Genome Epidemiology (2nd ed.): Building the evidence for using genetic information to improve health and prevent disease

“The findings and conclusions in this book are those of the author(s) and do not
necessarily represent the views of the funding agency.”

These chapters were published with modifications by Oxford University Press (2010)

Scott D. Grosse

The first newborn screening programs to use dried blood spots collected on filter paper cards and sent to biochemical screening laboratories to screen for phenylketonuria (PKU) began in the United States in 1962 (1). Most screening programs before the early 2000s screened for only a handful of disorders, chiefly PKU and congenital hypothyroidism (CH). In the United States, most screening panels at the time also included galactosemia and hemoglobinopathies (2), but these were rarely included in other countries (3). In recent years, programs in many countries have used tandem mass spectrometry (MS/MS) to screen for a number of rare metabolic disorders (2,4).

This chapter outlines key methodological issues in collecting and analyzing data on outcomes in individuals with genetic disorders that are candidates for inclusion in screening panels and reviews the relevant literature for two disorders that have relatively abundant evidence. One disorder is medium-chain acyl-CoA dehydrogenase deficiency (MCADD), which is a fatty acid oxidation disorder that is the most common of the new disorders detected by mass-throughput MS/MS technology (5). MCADD has been the “poster child” for expanded newborn screening. The other disorder is cystic fibrosis (CF), which is also increasingly being added to screening panels (6).

Sources of Evidence on Clinical Utility of Newborn Screening

The clinical utility of a screening test is commonly defined as the balance of benefits and harms in terms of health and psychosocial outcomes (7). Psychosocial issues include potential harms of false-positive or carrier screening results on anxiety, misunderstanding, and parent-child bonding (8,9). Leaving these issues aside, the effect of newborn screening on health status for a given disorder can be assessed by comparing outcomes observed among cohorts of individuals born with a given disorder, some of whom received newborn screening and some of whom did not. Potential sources of data are randomized trials and observational cohort studies, with most available evidence coming from observational studies. For the latter, challenges lie in ensuring that both cases and outcomes are reliably ascertained, particularly in unscreened cohorts. These challenges, a number of which are reviewed in this section, are not unique to genetic disorders.

Randomized controlled trials (RCTs), which are the most reliable source of evidence for evaluating effectiveness (10), have been conducted for only one disorder included in newborn screening, cystic fibrosis, and those trials involved small numbers of cases (11,12). An RCT of screening for less common metabolic disorders would require millions of infants to be enrolled, which is not practical even if ethically tolerable (13). Also, the close monitoring of patients in RCTs can lessen the external validity or generalizability of results to individuals receiving care in the community (14).

Observational data are challenging to analyze because there are multiple potential sources of bias that need to be considered. In particular, unscreened cohorts of individuals identified with a genetic disorder are not necessarily equivalent to screened cohorts because of underascertainment. Underascertainment can have two opposite effects. First, disorders that often go undiagnosed because certain individuals are either asymptomatic or have only mild signs and symptoms can result in relatively severely affected individuals being overrepresented among those who are clinically detected relative to the population of those with the disorder. If this is the case, the frequency of poor outcomes among those clinically detected with the disorder could be overstated. On the other hand, if affected individuals who have not been clinically diagnosed experience sudden death, perhaps during an infectious episode, the death might be attributed to an infectious agent or to unknown cause, thereby leading to an underestimation of the risk of death due to the disorder. One way to adjust for the first type of bias is to examine the differences in the distributions of genetic variants associated with differences in phenotypes between screened and unscreened cohorts. This presumes that genotype–phenotype associations are already established.

A potential source of unbiased estimates of outcomes in unscreened cohorts is the retrospective analysis of stored dried blood spot specimens collected by newborn screening programs that did not screen for the disorder of interest at the time the specimens were collected (15). Such specimens need to be linked to databases containing either information on outcomes or information permitting families to be contacted to obtain those data. This type of method is particularly valuable for investigating the frequency of sudden death in the absence of diagnosis for those disorders for which the analytes are sufficiently stable for retrospective testing to be reliably done. However, screening randomly selected specimens for rare disorders (for example, a disorder with a birth prevalence of 1 in 20,000) would require testing a very large number of specimens in order to detect more than a handful of cases. Furthermore, certain disorders are difficult to reliably identify with simple tests, and confirmation by genotyping can be very expensive if feasible. If the primary endpoint for a disorder is mortality, a less expensive alternative is to link stored blood spot specimens with infant and child death records and test only those specimens for the disorder, along with a matched sample of control specimens. Ideally, specimens should be retrieved for all children who died in infancy or early childhood, since deaths caused by a disorder might be incorrectly attributed to infectious or other causes.
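The scale problem described above can be made concrete with simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that cases occur independently at a fixed birth prevalence, so that the number of cases in a random sample of specimens is approximately Poisson distributed.

```python
import math

def expected_cases(n_specimens: int, births_per_case: int) -> float:
    """Expected number of true cases among randomly selected specimens."""
    return n_specimens / births_per_case

def specimens_needed(target_cases: int, births_per_case: int) -> int:
    """Specimens required for the expected case count to reach the target."""
    return target_cases * births_per_case

def prob_at_least(k: int, n_specimens: int, births_per_case: int) -> float:
    """Poisson approximation (an assumption) to P(at least k cases found)."""
    lam = expected_cases(n_specimens, births_per_case)
    p_fewer = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - p_fewer

# Birth prevalence of 1 in 20,000, as in the example in the text.
BIRTHS_PER_CASE = 20_000

print(expected_cases(100_000, BIRTHS_PER_CASE))    # about 5 cases expected
print(specimens_needed(20, BIRTHS_PER_CASE))       # 400000 specimens for 20 expected cases
print(prob_at_least(10, 100_000, BIRTHS_PER_CASE)) # chance of finding 10 or more cases
```

Under these assumptions, a sample of 100,000 specimens would be expected to contain only about five cases, which is why linkage to death records followed by targeted testing can be a much less expensive design when mortality is the endpoint.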

An excellent example of a retrospective screening study is one that analyzed 100,239 stored dried blood spot specimens collected in Sweden prior to the initiation of screening for CH and followed up with families to confirm diagnoses and assess outcomes (16). Alm et al. linked 32 specimens positive for CH by thyroid stimulating hormone (TSH) screening to children’s records, and 31 of these children could be tracked at 5 years of age. Medical records revealed that 15 of the children had been clinically diagnosed with CH, and an additional 7 children were found by the investigators to have undiagnosed hypothyroidism. The Griffiths Mental Development Scales were administered to 26 of the 31 children. Two of 14 (14%) children who had been clinically detected with CH had a developmental quotient (DQ), equivalent to IQ, of < 70, indicative of developmental delay and probable intellectual disability. No child with untreated CH was found to have low overall cognitive test scores, although statistically significant reductions on specific test scales were observed even among that group.

Many studies report improved outcomes in screened cohorts using historical cohorts as comparison groups, but it is often difficult to distinguish the effects of screening from those of improved treatments. The “natural history” of a disorder is the course of disease in the absence of treatment. Disease outcomes often improve over time as a result of changes in clinical awareness and diagnostic and therapeutic practices. Consequently, estimates based on historical cohorts born prior to the introduction of screening and effective treatment are likely to overstate the benefits of early identification (17,18). For example, U.S. adults for whom PKU was not detected at birth by newborn screening but who were put on a low phenylalanine diet beginning in the first several years of life mostly did not experience severe disability as adults, although they did experience some degree of disability (17,19). Comparisons from different geographic areas are likewise subject to bias if the availability of screening is correlated with the clinical awareness and management of a disorder (18).

Another common challenge to the identification of the effectiveness of screening is a lack of long-term follow-up for both screened and unscreened cohorts. Few long-term follow-up studies of screened cohorts have been reported, with the exceptions of PKU, CH, and CF. Cognitive testing is unreliable in infants and toddlers, and hence studies with follow-up of 1 or 2 years after birth (20) are difficult to interpret. The best-studied fatty acid oxidation disorder included in expanded screening panels, MCADD, has only had cognitive outcomes tracked up to 4 years of age (21).

Evidence on the Clinical Utility of Screening: Case Studies

Cystic Fibrosis

Cystic fibrosis is an autosomal recessive disorder affecting chiefly the lungs and the gastrointestinal tract that is caused by mutations in the CFTR gene (OMIM 602421). A single common mutation, ΔF508, accounts for two-thirds of all CF alleles worldwide (22). Approximately 15–20% of newborns with CF develop meconium ileus (MI), an intestinal obstruction present at birth that generally requires surgery to correct. The most common presenting symptoms among infants with CF but without MI are respiratory (recurrent cough, wheezing) and gastrointestinal, including loose stools and failure to thrive (23). Because symptoms are nonspecific, it is common for a diagnosis of CF not to be reached until after an infant is 12 months of age, and after multiple work-ups (24). Growth failure is secondary to maldigestion caused by insufficiency of pancreatic enzymes among most children with CF. As children age, growth retardation, chronic cough, lung infections, and decreased lung function become increasingly common. Spirometry is the method used to measure lung function, with the standard metric being forced expiratory volume in 1 second, or FEV1 as a percentage of predicted values based on height and age. However, FEV1 cannot be reliably measured in children less than 6 years of age, and it is not a sensitive measure of early-stage lung disease in children, reducing its utility as an outcome measure for evaluating CF newborn screening (25). Mortality in CF generally is associated with chronic obstructive pulmonary disease, with respiratory failure being the primary cause of death in more than 90% of people with CF (26).

A relative abundance of data exists to evaluate the clinical utility of newborn screening for CF, including two randomized trials in Wisconsin and England, four cohort studies with data on both screened and unscreened cohorts, and several analyses of two national patient registries in the United States and the United Kingdom (6,27). Summaries of findings from the two trials and two cohort studies follow, along with registry analyses. Because the focus is on data sources and methods of analysis to control for bias, results are presented study by study rather than by outcome. Findings relating to nutritional status and growth (12), which have consistently favored screened cohorts (6,27), are not discussed.

Randomized trials. The Wisconsin CF Neonatal Screening Project randomly assigned neonates born in Wisconsin during 1985–1994 to either a screened or control group (12). CF screening was performed for all children, but positive results were reported only to families in the screened group. Positive results were released to families in the control group if parents requested the results or when the child reached 4 years of age. Subjects with a diagnosis of CF were recruited into a protocol with follow-up every 6 weeks during the first year of life and every 3 months through 17 years of age. All children received care at one of two centers. Despite randomization, significantly more subjects with no ΔF508 allele (p < .001) were in the control group (12). The Wisconsin study found significantly better growth status among those in the screened group (12), but no significant difference in either lung function (spirometry) or chest radiography, which is a more sensitive measure of lung disease (25). Endpoints not originally targeted but assessed in response to suggestions from experts (28) were health-related quality of life (no difference; 29) and cognitive ability (a significant difference among the subset of children with a vitamin E deficiency during infancy; 30). The Wisconsin study was not powered to evaluate mortality as an endpoint (28). Perhaps because of the close follow-up provided in the RCT, no deaths prior to 10 years of age were observed in either group among those without MI, unlike in the general pediatric population with CF (31).

In the United Kingdom, all neonates born in Wales and the West Midlands during 1985–1989 were randomly allocated to undergo or not undergo CF screening on an alternate-week basis (11). Because no screening was performed for those in the control group, an unknown number of undiagnosed cases of CF were not ascertained and no unbiased comparison of clinical outcomes could be undertaken; no differences in lung function were observed (11). Investigators subsequently reviewed registry and death certificate data to identify CF-related deaths among children in the unscreened group (32). No early deaths were reported among 78 children without MI in the screened group compared with four CF-related deaths before 5 years of age among 71 children without MI in the unscreened cohort (5.6 per 100) (p < .05). Two of the four deaths occurred among children who had received a clinical diagnosis of CF by 7 weeks of age based on the development of symptoms, and it was not clear whether the deaths would have been averted by screening (32).

Cohort studies. A historical cohort study in Australia compared 57 children with CF without MI who were born in New South Wales during the 3 years before July 1981, before screening was available, and 60 born during July 1981 to July 1984, when screening was available (33). All analyses were conducted on an intent-to-treat basis, with children included in the screened cohort if they were born while screening was offered, including three children not detected through screening. All subjects were followed at a single clinic. Significant differences in favor of the screened cohort were observed in hospitalizations during the first 2 years of life, in height at ages 1 and 5 years, in lung function at ages 5, 10, and 15 years, in chest radiographs at age 15 years, and in survival at age 10 years (33–35). The investigators acknowledged changes in treatment introduced during 1981–1983 could potentially have biased outcomes in favor of the screened cohort (34). However, no differences in outcomes were reported among children with CF and MI. Also, most of the differences in the Australian study have been confirmed by subsequent studies, with the exception of the lung function findings (6).

A concurrent geographical cohort study from northern France compared children with CF born during 1989–1998 in Brittany, which screened newborns for CF, with a comparison group of newborns in a neighboring region, Loire-Atlantique, which did not implement screening for CF and was said to have had comparable CF care (36). Standardized follow-up and therapeutic management were provided for patients in both regions who received a diagnosis of CF. Differential ascertainment did not appear to be a major problem, because the same birth prevalence of CF was observed in both areas. False-negative screening results (n = 5) were excluded by the investigators from the screened cohort, which is a potential source of bias and a weakness of the study design. Significant differences in favor of the screened cohort were reported for hospitalizations, height at ages 1, 3, and 5 years, chest radiographs and clinical scores, and mortality, although no differences were observed in lung function among the limited subset of individuals with spirometry measures (36). The investigators reported three CF-related deaths among 36 children without MI born in Loire-Atlantique (8.3 per 100), and no deaths among 77 children without MI born in Brittany (p < .05).
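The mortality comparisons reported for the Wales and West Midlands trial (0 of 78 versus 4 of 71) and for the French cohort study (0 of 77 versus 3 of 36) involve very small numbers of deaths. The sources cited here do not state which test produced the reported p values; the sketch below assumes a two-sided Fisher exact test, a common choice for sparse 2 × 2 tables, and uses a hypothetical helper function written with the Python standard library only.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    a, b = deaths and survivors in group 1; c, d = deaths and survivors in group 2.
    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed table, holding the margins fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Screened group first, unscreened group second (counts taken from the text).
p_uk = fisher_exact_two_sided(0, 78, 4, 71 - 4)
p_france = fisher_exact_two_sided(0, 77, 3, 36 - 3)

print(f"Wales/West Midlands: p = {p_uk:.3f}")
print(f"Brittany vs. Loire-Atlantique: p = {p_france:.3f}")
```

Under this assumption both comparisons fall just below the conventional .05 threshold, which is consistent with the reported results but also shows how heavily each conclusion rests on three or four deaths.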

Patient registries. Three analyses (two published and one unpublished) of U.S. data from the Cystic Fibrosis Foundation National Patient Registry (CFFPR) reported evidence of improved lung function, although the first one in particular had biased case ascertainment. Wang et al. analyzed children at least 6 years of age in 1996 who were born during 1987–1990 and were diagnosed with CF by 36 months of age or the end of 1990, whichever came sooner (37). They classified CF cases without MI into four categories: early asymptomatic diagnosis (EAD), early symptomatic diagnosis (ESD), later asymptomatic diagnosis (LAD), and later symptomatic diagnosis (LSD), each on the basis of two dichotomous variables: age of diagnosis before or after 6 weeks of age and the presence of clinical signs and symptoms at the time of diagnosis. Asymptomatic diagnosis was defined as diagnosis by family history, genotype, prenatal diagnosis, or neonatal screening in the absence of clinical signs or symptoms recorded at the time of diagnosis. Children in the EAD group had significantly higher FEV1 scores. However, this finding was due to truncation of the late diagnosis groups. An unpublished analysis of data from the 2002 CFFPR by Grosse, Devine, and Rosenfeld found that the difference in mean lung function was attenuated when the 1990 diagnosis cutoff was removed and was eliminated when the arbitrary 36 month diagnostic cutoff was removed. In any case, the EAD group primarily consisted of infants diagnosed based on family history, and the majority of children detected based on newborn screening either had symptoms at diagnosis or were diagnosed after 6 weeks of age.

An analysis of data from the 2002 CFFPR by Accurso et al. compared lung function in relation to four types of diagnosis: newborn screening, symptomatic, MI, and prenatal diagnosis (24). The analysis excluded individuals diagnosed on the basis of a family history of CF, which was the leading source of asymptomatic diagnoses during the period. Children who had both newborn screening and symptoms checked were assigned to the newborn screening group (Marci Sontag, personal communication, February 13, 2008). Individuals at 6–10 and 11–20 years of age classified as diagnosed through newborn screening had significantly higher FEV1 scores than those in the symptomatic diagnosis group (24). Mean FEV1 for those with prenatal diagnoses did not differ from those with symptomatic or MI diagnoses, even though prenatal and newborn screening both enabled early detection and preventive care. The high FEV1 scores in the newborn screening group might have been due to unmeasured confounding or selection bias.

In an unpublished analysis of 2002 CFFPR data, Grosse, Devine, and Rosenfeld used the presence of newborn screening for CF in a state at the time of birth as a predictor variable in regression analysis on FEV1 scores among children 6–10 years of age. Children born in states with CF newborn screening programs had significantly better lung function, controlling for other predictors of lung function. Only a small number of states at the time screened for CF, and it was not possible to determine whether states with better management of CF were more likely to have adopted screening, whether early treatment made possible by screening causally improved lung function, or both.

Three analyses of CFFPR data examined the association of newborn screening with survival. First, Lai et al. used the 2000 registry data to compare children or adults with a newborn or prenatal screening diagnosis recorded and those diagnosed with MI or with symptoms other than MI (23). They reported significantly longer survival among those in the screening group compared with those in both the MI and symptom groups. However, most of the difference in survival was estimated to have occurred after 20 years of age. This is unlikely to reflect a true effect of screening, since neither newborn nor prenatal screening was available before the 1980s (6). The CFFPR contains records in which a newborn screening diagnosis was listed for children who were born in states without screening programs at the time or whose diagnosis occurred after 1 year of age. The same investigators subsequently published an analysis restricted to individuals diagnosed after 1986 and to deaths occurring before 14 years of age (38). That analysis reported that the association with mortality remained but was of borderline statistical significance (p < .10).

Finally, Grosse et al. compared the cumulative risk of death to 10 years of age among children with CF who were born during 1987–1991 in states with or without CF statewide newborn screening programs (31). The former group consisted of Colorado, Wisconsin, and Wyoming; children born in three states with voluntary private screening programs with incomplete coverage (Connecticut, Montana, and Pennsylvania) were excluded from the analysis. The analysis found an absolute difference in risk of 1.7 per 100 (0.65 versus 2.35 per 100), with a rate ratio of 3.6 (p = .13). Although not statistically significant, the difference in risk was only slightly smaller than that reported in the individual-level analysis (38). The small number of children born in states with screening programs made the results difficult to interpret. It is possible that better quality of care provided in states with screening programs could have accounted for the lower mortality rates observed. On the other hand, there is a greater likelihood that children with CF born in states without screening could have died due to complications of the disorder, such as electrolyte imbalance under heat stress, without a diagnosis having been established or recorded. Only a retrospective screening study conducted using stored specimens from a cohort not screened for CF at birth could quantify such deaths.
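The two summary measures quoted above follow directly from the cumulative risks. As a minimal check, using only the figures given in the text (the variable and function names are hypothetical):

```python
def risk_difference(risk_unscreened: float, risk_screened: float) -> float:
    """Absolute difference in cumulative risk (same units as the inputs)."""
    return risk_unscreened - risk_screened

def risk_ratio(risk_unscreened: float, risk_screened: float) -> float:
    """Ratio of risk in the unscreened group to risk in the screened group."""
    return risk_unscreened / risk_screened

# Cumulative risk of death to 10 years of age, per 100 children with CF.
SCREENED_STATES = 0.65
UNSCREENED_STATES = 2.35

diff = risk_difference(UNSCREENED_STATES, SCREENED_STATES)
ratio = risk_ratio(UNSCREENED_STATES, SCREENED_STATES)

print(round(diff, 2))   # 1.7 per 100
print(round(ratio, 1))  # 3.6
```

The ratio looks large while the absolute difference is modest, which is typical when the baseline risk is low. Both measures are needed to judge the practical importance of the finding.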

The 2002 UK Cystic Fibrosis Database (UKCFD) has been analyzed in several publications to compare outcomes for children 1–9 years of age without MI who were identified either through newborn screening or manifestation of symptoms and who were diagnosed beginning in 1994 (39–41). The investigators reported that children in the newborn screening group did not differ in terms of FEV1 scores but differed significantly in chest radiography at 6 years of age. They focused on the finding that children in the screened group were less likely to have received intensive or long-term therapies, which was regarded as indicative of less lung disease and a lower need for aggressive treatment (39). The percentage of children homozygous for the ΔF508 mutation was similar for each group (about 50%), and results of analyses stratified by genotype to test for potential confounding were comparable with those of the overall analysis. The one exception was that pancreatic enzyme replacement therapy was significantly more common in the clinically detected group overall but not among ΔF508 homozygotes (39). At the time children were born, screening for CF was universal in Wales and Northern Ireland, limited in England, and not available in Scotland. An analysis restricted to observations from seven English CF centers that treated appreciable numbers of children in both groups generated findings comparable with those in the overall sample (39).

Finally, the UKCFD was used to assess potential harm from unnecessary treatment. It was found that those with CF diagnosed through newborn screening have not been prematurely introduced to aggressive therapies, including pancreatic enzyme replacement therapy prior to the emergence of pancreatic insufficiency (40,41). This question has not been studied in the United States. A concern expressed in the United States is that many individuals are identified with borderline or atypical CF and have an unknown prognosis with unknown benefit (or harm) of treatment (42).

The potential harms of screening for CF include a risk that infants with newly diagnosed CF might be exposed to other CF patients with established Pseudomonas aeruginosa lung infections and become infected themselves (6). This almost certainly happened in one of the two centers in the Wisconsin trial, causing infants diagnosed with CF who were treated in that center and born during the first part of the trial to develop serious, chronic lung infections at an earlier age (27). However, CF centers have since instituted safeguards against exposure of CF patients to other patients to minimize this risk, and there is no evidence that this harm has subsequently been repeated (24).

Medium-Chain Acyl-CoA Dehydrogenase Deficiency

Medium-chain acyl-CoA dehydrogenase deficiency (MCADD) is an autosomal recessive mitochondrial fatty acid oxidation disorder that is caused by mutations in the ACADM (also called MCAD) gene (OMIM 607008). Deficiency in the MCAD protein reduces the formation of ketone bodies in the liver that provide an alternative energy source during periods of prolonged fasting or increased energy demands. Consequently, clinical presentation is usually related to hypoglycemia brought about by fasting and increased metabolic stress and can result in encephalopathy or sudden death. Although most patients present during infancy or early childhood, acute crises can occur throughout life. Most studies reported high mortality (16–26%) and variable levels of neurological sequelae among survivors of an acute metabolic crisis (43,44). Clinical case series have reported permanent sequelae in one-fourth to one-third of survivors in symptomatic MCADD (45,46).

The highest quality outcomes data for MCADD that are currently available come from one retrospective screening study conducted in England (47) and from studies of outcomes among screened and unscreened cohorts born in different states in Australia and followed to 4 years of age using a standardized protocol (21,48). Ascertainment bias in unscreened cohorts is a major problem. Population-based surveillance data from Australia and Western Europe indicated that in the absence of screening 35–60% of cases of MCADD were detected based on clinical signs or family history (21,49–51). MCADD was more rarely diagnosed in the United States prior to screening (52). Furthermore, children detected clinically have been reported to be more likely to be homozygotes for the relatively severe common mutation than are those detected through screening programs (5,49). Consequently, extrapolation of the frequency of sequelae among individuals with MCADD diagnosed in unscreened cohorts to all individuals with MCADD detected by screening will almost inevitably overstate the potential benefits of screening (53).

At least 25% of children with MCADD appear to remain asymptomatic and a similar percentage of affected children are likely to display relatively limited clinical signs and symptoms (44,47). Conversely, in the absence of screening an important percentage of children with MCADD experience fatal decompensation episodes that are likely to go undiagnosed, approximately 5–6% according to a retrospective analysis of stored blood spot specimens for unexplained child deaths in Virginia (54). Consequently, a count of child deaths attributed to MCADD in an unscreened population could understate the actual number of deaths caused by the disorder. In addition, sudden deaths among adults with MCADD can occur.

One retrospective MCADD screening study assessed outcomes for a random sample of stored specimens. Pourfarzam et al. analyzed 100,600 stored dried blood spot specimens collected from infants born in the northern United Kingdom during 1991–1993 and found that 14 screened positive for MCADD (47). They followed up with an examination of medical records and family surveys for all 14 children, including 12 children who were still alive at 7–9 years of age. They identified eight children as having MCADD, including three who had been clinically diagnosed prior to the study. One of the latter three had died at 17 months of age and was diagnosed post mortem. Three of the seven survivors with MCADD had experienced episodes of encephalopathy, two had had milder symptoms, and two had no symptoms recorded, although one of the latter had learning difficulties that might have been unrelated to the metabolic disorder. None of the survivors had a developmental disability that could be linked to a metabolic decompensation crisis. The number of observations was too restricted to provide precise estimates of the frequency of death or sequelae.
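The imprecision noted at the end of the preceding paragraph can be quantified. The sketch below computes an exact (Clopper-Pearson) 95% confidence interval for the one death observed among the eight children identified with MCADD. The choice of interval method is an assumption made for illustration, not something reported by the study, and the helper functions are hypothetical and use only the Python standard library.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root_increasing(g, iters: int = 100) -> float:
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided confidence interval for a binomial proportion x/n."""
    # Lower limit: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

low, high = clopper_pearson(1, 8)
print(f"1 death of 8: {1 / 8:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Under this assumption the interval runs from well under 1% to more than 50%, so a point estimate of 12.5% from this study is compatible with both low and very high mortality in unscreened children.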

The frequency with which children with biochemical MCADD develop serious, life-threatening symptoms can also be assessed by using information on the older siblings of probands detected as newborns through screening. For example, Waisbren et al., through testing family members of 20 infants detected through screening, identified seven older surviving siblings with MCADD, of whom four had shown symptoms (hypoglycemia and extreme lethargy) of MCADD (the other three remained asymptomatic) (20). That study did not consider older siblings who might have died of MCADD. In another study, Pollitt and Leonard reported that four of six older siblings confirmed to have MCADD had experienced symptoms, and that an additional four siblings died with symptoms compatible with MCADD, although it could not be confirmed that they had MCADD (44).

The most informative study to date on MCADD outcomes utilized contemporaneous population-based screened and unscreened cohorts with MCADD born in Australia during 1994–2002 (21). A unique feature of the study was the reportedly complete ascertainment of all diagnosed cases of MCADD in Australia including those states that did not screen for the disorder at the time. Wilcken et al. analyzed the frequency of death and developmental delay among children followed to 4 years of age. They reported deaths among 6 (17%) of 35 children with the disorder diagnosed through clinical presentation or after diagnosis of a sibling, compared with 1 (4%) of 24 in those diagnosed through screening, as noted in an accompanying commentary (53). The latter death occurred in the first 3 days of life, before laboratory screening was done (55).

The death rate among the approximately 50% of the Australian MCADD unscreened cohort who were not diagnosed with the disorder based on clinical manifestations was probably lower than among those who did come to clinical attention. Wilcken et al. proposed that the death rate was perhaps only half as high in that group, which implies an overall unscreened cohort death rate of 12% (21). The 12% estimate is lower than the death rate of 25% that is often cited on the basis of clinical case series but is approximately the same as that reported in a retrospective study of stored blood spot specimens (47). Surprisingly, a subsequent publication from the same group that compared health care utilization for the screened and unscreened cohorts did not make an adjustment for ascertainment bias among the unscreened cohort (48).
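The adjustment for ascertainment described above is a weighted average of the death rates in the diagnosed and undiagnosed portions of the unscreened cohort. The sketch below is a rough reconstruction, not the calculation performed by Wilcken et al., whose exact inputs are not given here. It takes the observed 6 deaths among 35 clinically diagnosed children, assumes the death rate among undiagnosed children is half as high, and varies the share of cases that were diagnosed across the 35–60% range reported from surveillance data.

```python
def overall_death_rate(rate_diagnosed: float,
                       share_diagnosed: float,
                       relative_rate_undiagnosed: float) -> float:
    """Weighted average death rate across diagnosed and undiagnosed cases."""
    rate_undiagnosed = rate_diagnosed * relative_rate_undiagnosed
    return (share_diagnosed * rate_diagnosed
            + (1 - share_diagnosed) * rate_undiagnosed)

RATE_DIAGNOSED = 6 / 35   # deaths among clinically diagnosed children (from the text)
RELATIVE_RATE = 0.5       # assumption: undiagnosed children die at half that rate

for share in (0.35, 0.50, 0.60):
    rate = overall_death_rate(RATE_DIAGNOSED, share, RELATIVE_RATE)
    print(f"share diagnosed {share:.0%}: overall death rate {rate:.1%}")
```

With these illustrative inputs the overall rate falls between roughly 12% and 14%, close to the 12% figure cited in the text; the small differences presumably reflect rounding or slightly different inputs. The result is sensitive to both the assumed share diagnosed and the assumed relative death rate, neither of which can be observed directly without a retrospective screening study.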

A recent study of 137 Dutch individuals identified with MCADD from the late 1970s to 2003 based on clinical symptoms or family history, including individuals diagnosed post mortem, found a 20% death rate (56). However, mortality was lower, 15%, when restricted to the 110 probands, and no deaths were observed among 18 individuals detected in the newborn period through testing prompted by family history. The investigators made no adjustment for asymptomatic individuals with MCADD who were not included in their observations.

Although the risk of mortality is reduced through newborn screening for MCADD, it is not eliminated (21,55). First, infants with MCADD can die during the first 3 days after birth, before screening results can be reported (55,57). Second, reports from the United States and Germany discussed children diagnosed with MCADD who died despite receiving treatment (58,59). An analysis of the first 46 children diagnosed with MCADD through screening of 713,552 infants in four New England states identified two deaths (4%) at 11 and 33 months of age that were attributed to MCADD (59). The California newborn screening program reported two deaths in screened infants with MCADD that occurred in the first week after birth (Fred Lorey, personal communication, February 21, 2008). A retrospective screening study would be required to reliably ascertain the risk of death in an unscreened cohort.

Another endpoint in MCADD is disability among survivors of metabolic decompensation crises. However, clinical case reports are likely to be affected by ascertainment bias because more severely affected children are more likely to be referred to specialized centers. Another problem is a lack of standardization in developmental assessments. In particular, case series tend to cite poorly defined neurological sequelae and to report the number of symptoms rather than the number of children affected. Sequelae resulting in intellectual disability typically occurred in 5–6% of all children with MCADD, with milder sequelae affecting perhaps a similar number of additional children (5). Those estimates take into account the probability that one-quarter to one-half of children with MCADD do not experience a metabolic crisis during childhood that would put them at risk of neurological disability.

Two recent publications reported on neurological sequelae among unscreened individuals with MCADD. The Australian study found no developmental delay in either screened or unscreened children at least 4 years of age who were administered cognitive assessments (21,48). The finding that unscreened children did not have serious problems was more favorable than had been previously reported for unscreened children with MCADD, not only in Australia but in other countries as well (5,53). This finding may reflect improved clinical awareness of MCADD in Australia in recent years (21).

Van der Hilst et al. reported that five (4%) of 116 Dutch patients with MCADD born during 1985–2003 had been institutionalized, three of whom required permanent institutional care (60). This is slightly lower than the 6% frequency of severe disability reported by the same investigators for a sample of 155 patients that presumably included 39 born prior to 1985 (56). The latter sample included 18 subjects identified neonatally through family history, one (6%) of whom had a mild neurological impairment. No information was presented as to the ages of individuals who were classified as having severe disability, the criteria that were used, or cognitive assessments.

One potential harm from screening for MCADD is unnecessary treatment for children who would have remained asymptomatic or without sequelae. The Australian study cited previously reported that children in the screened cohort were less likely to have been hospitalized than were those in the unscreened cohort, 42% versus 71%, respectively (48). However, compared with the frequency of MCADD in states with screening, probably only 60% of children in the unscreened cohort were diagnosed. If those who did not come to clinical attention were not hospitalized, the rates of hospitalization among the screened and unscreened cohorts would have been approximately equal. Consequently, these data do not provide evidence of reduced rates of hospitalization with screening. At a minimum, these data suggest that screening does not cause excess hospitalizations in screened children.
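
The adjustment described in this paragraph can be checked with a few lines of arithmetic. This is a minimal sketch with a hypothetical function name. It uses the 71%, 60%, and 42% figures from the text and assumes, as the paragraph does, that undiagnosed children were never hospitalized for MCADD.

```python
def cohort_hospitalization_rate(rate_among_diagnosed, frac_diagnosed,
                                rate_among_undiagnosed=0.0):
    """Hospitalization rate across all affected children in a cohort,
    including those who were never clinically diagnosed."""
    frac_undiagnosed = 1.0 - frac_diagnosed
    return (frac_diagnosed * rate_among_diagnosed
            + frac_undiagnosed * rate_among_undiagnosed)


# Unscreened cohort: 71% of diagnosed children hospitalized, and about 60%
# of affected children diagnosed (assumption: none of the rest hospitalized).
unscreened_adjusted = cohort_hospitalization_rate(0.71, 0.60)

# Screened cohort: all affected children are identified, so 42% applies as is.
screened = 0.42

print(f"Unscreened, adjusted: {unscreened_adjusted:.1%}")
print(f"Screened:             {screened:.1%}")
```

If some undiagnosed children were in fact hospitalized without the cause being recognized, the adjusted unscreened rate would be higher than this sketch suggests.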

Lessons Learned

Evaluating the cumulative evidence of clinical utility from multiple epidemiologic studies is even more challenging than interpreting the results of a single study. Ioannidis et al. have proposed three types of criteria: amount of evidence in terms of total numbers of observations, consistency in findings among studies, and study quality in terms of protection from bias (61). In addition, it seems reasonable to take into account effect size. Other things being equal, a larger effect size in terms of proportional improvement in outcomes is associated with greater clinical utility.

The cumulative evidence of clinical utility from newborn screening is uneven. Of the two disorders reviewed here, the greatest amount and quality of evidence exists for CF, with two randomized trials of screening, several cohort studies, and analysis of two national patient registries. However, consistency of findings in CF studies of screening is variable. The evidence is most consistent for growth and weakest for lung function. Most studies, including one randomized trial, have found significant differences in mortality, but the highest quality trial did not.

For MCADD, one small retrospective screening study and two pilot screening studies with long-term follow-up are available. In addition to the Australian study, a large-scale MCADD screening study in the United Kingdom has collected outcomes data that will be reported at a later date. There has been a lack of consistency of findings in terms of both mortality and disability. Although studies have consistently reported deaths among at least 10% of children with MCADD in the absence of screening, this percentage is variable and subject to ascertainment bias in both directions. Also, there are persistent reports of deaths from MCADD even with screening, occurring among as many as 4% of children born with MCADD (55,59), but the numbers involved are very small. Finally, to the extent that cognitive impairment in MCADD can be prevented without screening, as suggested by the Australian study, the number of cases of disability prevented by screening is context specific.

The MCADD case study illustrates the challenge of evaluating rare disorders; most other disorders being added to newborn screening panels are even less common. Because of the rarity of MCADD, about 1 in 15,000 births, comprehensive follow-up data on millions of children screened for MCADD are needed in order to generate reliable data on outcomes of screening (53). The Australian data suggest that perhaps 70% of child deaths from MCADD are prevented by newborn screening (21). Although this is less than 100%, it is still important evidence of the clinical utility of screening for MCADD. In the absence of pooling of long-term follow-up data from multiple screening programs utilizing a standard protocol (50), all assessments of clinical utility of screening must remain tentative.
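
A rough calculation shows why follow-up of millions of births is needed. The sketch below is illustrative rather than a formal power analysis. The function name is hypothetical, the 12% unscreened death rate is the ascertainment-adjusted estimate discussed earlier in this chapter, and combining it with the 1 in 15,000 birth prevalence and the 70% reduction is an assumption made here for illustration.

```python
def expected_mcadd_outcomes(births, births_per_case=15_000,
                            death_rate_unscreened=0.12,
                            fraction_deaths_prevented=0.70):
    """Expected MCADD cases and deaths in a birth cohort of a given size.

    Returns (cases, deaths_without_screening, deaths_with_screening).
    These are expected values only; sampling variability is ignored.
    """
    cases = births / births_per_case
    deaths_without = cases * death_rate_unscreened
    deaths_with = deaths_without * (1.0 - fraction_deaths_prevented)
    return cases, deaths_without, deaths_with


for births in (100_000, 1_000_000, 5_000_000):
    cases, without, with_screening = expected_mcadd_outcomes(births)
    print(f"{births:>9,} births: {cases:6.1f} cases, "
          f"{without:5.1f} deaths unscreened, "
          f"{with_screening:5.1f} deaths screened")
```

Under these assumptions, a cohort of one million births yields only about 67 cases and a single-digit number of expected deaths in either arm, so chance variation alone could obscure or exaggerate the effect of screening.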

A long lead time is needed before a fully evidence-based decision about the clinical utility of screening can be reached. The first statewide screening programs for CF began in 1981 in Australia and 1982 in the United States. The first statewide screening programs for MCADD began in 1997 in the United States and in 1998 in Australia. Two U.K. health technology assessments published in 1997 concluded that screening for MCADD met all or almost all recognized criteria for screening programs (62,63). A subsequent U.K. report (64) confirmed and expanded on the first assessment (62). However, it was only in 2007 that the National Health Service decided to adopt universal screening for MCADD in England, after preliminary results from a pilot screening study confirmed findings from other countries. In the United States, Massachusetts adopted universal screening for MCADD in 1998 (65,66), based in large part on one of the U.K. reviews (62). Numerous other states later followed (67).

Given what is known now, the early adopters of screening for CF and MCADD appear to have been justified in their decisions. There is a societal cost of delaying the initiation of screening tests that can save lives and prevent disability, which needs to be balanced against the cost of deciding to screen for disorders that might eventually be shown to not provide clear benefit. Large-scale pilot screening programs with rigorous evaluation protocols are essential to contribute to the evidence base. In addition, policy makers should be prepared to discontinue screening tests for which evidence of utility is ultimately lacking.


I thank Ingeborg Blancquaert, Anne Comeau, Philip Farrell, Alex Kemper, Martin Kharrazi, Fred Lorey, Lisa Prosser, Marci Sontag, Esther Sumartojo, John Thompson, and Bridget Wilcken for their helpful comments.



  1. MacCready R. Phenylketonuria screening program. N Engl J Med. 1963;269:52–56.
  2. Therrell BL, Adams J. Newborn screening in North America. J Inherit Metab Dis. 2007;30:447–465.
  3. Loeber JG. Neonatal screening in Europe; the situation in 2004. J Inherit Metab Dis. 2007;30:430–438.
  4. Bodamer OA, Hoffmann GF, Lindner M. Expanded newborn screening in Europe 2007. J Inherit Metab Dis. 2007;30:439–444.
  5. Grosse SD, Khoury MJ, Greene C, Crider KS, Pollitt RJ. The epidemiology of medium chain acyl-coA dehydrogenase deficiency (MCADD): an update. Genet Med. 2006;8:205–212.
  6. Grosse SD, Boyle CA, Botkin JR, et al. Newborn screening for cystic fibrosis: evaluation of benefits and risks and recommendations for state newborn screening programs. MMWR Recomm Rep. 2004;53(RR–13):1–36.
  7. Grosse SD, Khoury MJ. What is the clinical utility of genetic testing? Genet Med. 2006;8:448–450.
  8. Hewlett J, Waisbren SE. A review of the psychosocial effects of false-positive results on parents and current communication practices in newborn screening. J Inherit Metab Dis. 2006;29:677–682.
  9. Green JM, Hewison J, Bekker HL, Bryant LD, Cuckle HS. Psychosocial aspects of genetic screening of pregnant women and newborns: a systematic review. Health Technol Assess. 2004;8(33):1–109.
  10. Dezateux C. Newborn screening for medium chain acyl-CoA dehydrogenase deficiency: evaluating the effects on outcome. Eur J Pediatr. 2003;162:S25–S28.
  11. Chatfield S, Owen G, Ryley HC, et al. Neonatal screening for cystic fibrosis in Wales and the West Midlands: clinical assessment after five years of screening. Arch Dis Child. 1991;66:29–33.
  12. Farrell PM, Kosorok MR, Rock MJ, et al. Early diagnosis of cystic fibrosis through neonatal screening prevents severe malnutrition and improves long-term growth. Pediatrics. 2001;107:1–13.
  13. Wilcken B. Ethical issues in newborn screening and the impact of new technologies. Eur J Pediatr. 2003;162:S62–S66.
  14. Weiss NS, Koepsell TD, Psaty BM. Generalizability of the results of randomized trials. Arch Intern Med. 2008;168:133–135.
  15. Nørgaard-Pedersen B, Simonsen H. Biological specimen banks in neonatal screening. Acta Paediatr Suppl. 1999;88(432):106–109.
  16. Alm J, Hagenfeldt L, Larsson A, Lundberg K. Incidence of congenital hypothyroidism: retrospective study of neonatal laboratory screening versus clinical symptoms as indicators leading to diagnosis. BMJ. 1984;289:1171–1175.
  17. Grosse SD. Late-treated phenylketonuria and partial reversibility of intellectual disability. Child Develop. In press.
  18. Castellani C. Evidence for newborn screening for cystic fibrosis. Paediatr Respir Rev. 2003;4:278–284.
  19. Koch R, Moseley K, Ning J, Romstad A, Guldberg P, Guttler F. Long-term beneficial effects of the phenylalanine-restricted diet in late-diagnosed individuals with phenyl. Mol Genet Metab. 1999;67:148–155.
  20. Waisbren SE, Albers S, Amato S, et al. Effect of expanded newborn screening for biochemical genetic disorders on child outcomes and parental stress. JAMA. 2003;290:2564–2572.
  21. Wilcken B, Haas M, Joy P, et al. Outcome of neonatal screening for medium-chain acyl-CoA dehydrogenase deficiency in Australia: a cohort study. Lancet. 2007;369:37–42.
  22. Bobadilla JL, Macek M, Jr, Fine JP, Farrell PM. Cystic fibrosis: a worldwide analysis of CFTR mutations—correlation with incidence data and application to screening. Hum Mutat. 2002;19:575–606.
  23. Lai HJ, Cheng Y, Cho H, Kosorok MR, Farrell PM. Association between initial disease presentation, lung disease outcomes, and survival in patients with cystic fibrosis. Am J Epidemiol. 2004;159:537–546.
  24. Accurso FJ, Sontag MK, Wagener JS. Complications associated with symptomatic diagnosis in infants with cystic fibrosis. J Pediatr. 2005;147:S37–S41.
  25. Farrell PM, Li Z, Kosorok MR, et al. Longitudinal evaluation of bronchopulmonary disease in children with cystic fibrosis. Pediatr Pulmonol. 2003;36:230–240.
  26. Welsh MJ, Ramsey BW, Accurso F, Cutting GR. Cystic fibrosis. In: Scriver CR, et al., editors. The Metabolic and Molecular Basis of Inherited Disease, 8th ed. New York: McGraw-Hill; 2001:5121–5188.
  27. McKay KO. Cystic fibrosis: benefits and clinical outcome. J Inherit Metab Dis. 2007;30:544–555.
  28. Centers for Disease Control and Prevention. Newborn screening for cystic fibrosis: a paradigm for public health genetics policy development: proceedings of a 1997 workshop. MMWR. 1997;46(No. RR–16):1–24.
  29. Koscik RL, Douglas JA, Zaremba K, et al. Quality of life of children with cystic fibrosis. J Pediatr. 2005;147:S64–S68.
  30. Koscik RL, Farrell PM, Kosorok MR, et al. Cognitive function of children with cystic fibrosis: deleterious effect of early malnutrition. Pediatrics. 2004;113:1549–1558.
  31. Grosse SD, Rosenfeld M, Devine OJ, Lai HJ, Farrell PM. Potential impact of newborn screening for cystic fibrosis on child survival: a systematic review and analysis. J Pediatr. 2006;149:362–366.
  32. Doull IJ, Ryley HC, Weller P, Goodchild MC. Cystic fibrosis-related deaths in infancy and the effect of newborn screening. Pediatr Pulmonol. 2001;31:363–366.
  33. Wilcken B, Chalmers G. Reduced morbidity in patients with cystic fibrosis detected by neonatal screening. Lancet. 1985;2:1319–1321.
  34. Waters DL, Wilcken B, Irwig L, et al. Clinical outcomes of newborn screening for cystic fibrosis. Arch Dis Child Fetal Neonatal Ed. 1999;80:F1–F7.
  35. McKay KO, Waters DL, Gaskin KJ. The influence of newborn screening for cystic fibrosis on pulmonary outcomes in New South Wales. J Pediatr. 2005;147:S47–S50.
  36. Siret D, Bretaudeau G, Branger B, et al. Comparing the clinical evolution of cystic fibrosis screened neonatally to that of cystic fibrosis diagnosed from clinical symptoms: a 10-year retrospective study in a French region (Brittany). Pediatr Pulmonol. 2003;35:342–349.
  37. Wang SS, O’Leary LA, Fitzsimmons SC, Khoury MJ. The impact of early cystic fibrosis diagnosis on pulmonary function in children. J Pediatr. 2002;141:804–810.
  38. Lai HJ, Cheng Y, Farrell PM. The survival advantage of patients with cystic fibrosis diagnosed through neonatal screening: evidence from the United States Cystic Fibrosis Foundation registry data. J Pediatr. 2005;147:S57–S63.
  39. Sims EJ, McCormick J, Mehta G, Mehta A. Newborn screening for cystic fibrosis is associated with reduced treatment intensity. J Pediatr. 2005;147:306–311.
  40. Sims EJ, McCormick J, Mehta G, Mehta A. Neonatal screening for cystic fibrosis is beneficial even in the context of modern treatment. J Pediatr. 2005;147:S42–S46.
  41. Sims EJ, Clark A, McCormick J, et al. Cystic fibrosis diagnosed after 2 months of age leads to worse outcomes and requires more therapy. Pediatrics. 2007;119:19–28.
  42. Parad RB, Comeau AM. Diagnostic dilemmas resulting from the immunoreactive trypsinogen/DNA cystic fibrosis newborn screening algorithm. J Pediatr. 2005;147:S78–S82.
  43. Wilcken B, Hammond J, Silink M. Morbidity and mortality in medium-chain acyl coenzyme A dehydrogenase deficiency. Arch Dis Child. 1994;70:410–412.
  44. Pollitt RJ, Leonard JV. Prospective surveillance study of medium-chain acyl-CoA dehydrogenase deficiency in the UK. Arch Dis Child. 1998;79:116–119.
  45. Iafolla AK, Thompson RJ, Jr, Roe CR. Medium-chain acyl-coenzyme A dehydrogenase deficiency: clinical course in 120 affected children. J Pediatr. 1994;124:409–415.
  46. Venditti LN, Venditti CP, Berry GT, et al. Newborn screening by tandem mass spectrometry for medium-chain Acyl-CoA dehydrogenase deficiency: a cost-effectiveness analysis. Pediatrics. 2003;112:1005–1015.
  47. Pourfarzam M, Morris A, Appleton M, et al. Neonatal screening for medium-chain acyl-CoA dehydrogenase deficiency. Lancet. 2001;358:1063–1064.
  48. Haas M, Chaplin M, Joy P, Wiley V, Black C, Wilcken B. Healthcare use and costs of medium-chain acyl-CoA dehydrogenase deficiency in Australia: screening versus no screening. J Pediatr. 2007;151:121–126.
  49. Carpenter K, Wiley V, Sim KG, et al. Evaluation of newborn screening for medium chain acyl-CoA dehydrogenase deficiency in 275 000 babies. Arch Dis Child Fetal Neonatal Ed. 2001;85:F105–F109.
  50. Liebl B, Nennstiel-Ratzel U, Roscher A, von Kries R. Data required for the evaluation of newborn screening programmes. Eur J Pediatr. 2003;162:S57–S61.
  51. Hoffmann GF, von Kries R, Klose D, et al. Frequencies of inherited organic acidurias and disorders of mitochondrial fatty acid transport and oxidation in Germany. Eur J Pediatr. 2004;163:76–80.
  52. Schoen EJ, Baker JC, Colby CJ, To TT. Cost-benefit analysis of universal tandem mass spectrometry for newborn screening. Pediatrics. 2002;110:781–786.
  53. Grosse SD, Dezateux C. Newborn screening for inherited metabolic disease. Lancet. 2007;369:5–6.
  54. Dott M, Chace D, Fierro M, et al. Metabolic disorders detectable by tandem mass spectrometry and unexpected early childhood mortality: a population-based study. Am J Med Genet A. 2006;140:837–842.
  55. Wilcken B. Medium-chain acyl-coenzyme A dehydrogenase deficiency in a neonate. N Engl J Med. 2008;358:647.
  56. Derks TG, Reijngoud DJ, Waterham HR, et al. The natural history of medium-chain acyl CoA dehydrogenase deficiency in the Netherlands: clinical presentation and outcome. J Pediatr. 2006;148:665–670.
  57. Cyriac J, Venkatesh V, Gupta C. A fatal neonatal presentation of medium-chain acyl coenzyme A dehydrogenase deficiency. J Int Med Res. 2008;36:609–610.
  58. Nennstiel-Ratzel U, Arenz S, Maier EM, et al. Reduced incidence of severe metabolic crisis or death in children with medium chain acyl-CoA dehydrogenase deficiency homozygous for c.985A>G identified by neonatal screening. Mol Genet Metab. 2005;85:157–159.
  59. Hsu HW, Zytkovicz TH, Comeau AM, et al. Spectrum of medium chain acyl-coA dehydrogenase (MCAD) deficiency detected by newborn screening. Pediatrics. 2008;121:e1108–e1114.
  60. van der Hilst CS, Derks TG, Reijngoud DJ, Smit GP, TenVergert EM. Cost-effectiveness of neonatal screening for medium chain acyl-CoA dehydrogenase deficiency: the homogeneous population of The Netherlands. J Pediatr. 2007;151:115–120.
  61. Ioannidis JP, Boffetta P, Little J, et al. Assessment of cumulative evidence on genetic associations: interim guidelines. Int J Epidemiol. 2008;37:120–132.
  62. Pollitt RJ, Green A, McCabe CJ, et al. Newborn screening for inborn errors of metabolism: a systematic review. Health Technol Assess. 1997;1(7):1–202.
  63. Seymour CA, Thomason MJ, Chalmers RA, et al. Newborn screening for inborn errors of metabolism: a systematic review. Health Technol Assess. 1997;1(11):1–95.
  64. Pandor A, Eastham J, Beverley C, et al. Clinical effectiveness and cost-effectiveness of neonatal screening for inborn errors of metabolism using tandem mass spectrometry: a systematic review. Health Technol Assess. 2004;8(12):1–121.
  65. Atkinson K, Zuckerman B, Sharfstein JM, et al. A public health response to emerging technology: expansion of the Massachusetts newborn screening program. Public Health Rep. 2001;116:122–131.
  66. Grosse S, Gwinn M. Assisting states in assessing newborn screening options. Public Health Rep. 2001;116:169–172.
  67. Therrell BL, Adams J. Newborn screening in North America. J Inherit Metab Dis. 2007;30:447–465.
