
# Appendix A: Technical Notes


### How to Interpret a Confidence Interval

**What is a confidence interval?**

A confidence interval is closely related to the margin of error, a statistic often used in voter polls to indicate the range within which a value is likely to be correct (e.g., 30% of voters favor a particular candidate, with a margin of error of plus or minus 3.5%, so the true figure is likely between 26.5% and 33.5%). Similarly, in this report, confidence intervals provide a range that we can be reasonably confident (95% confident, for the intervals in this appendix) contains the true success rate for a particular clinic during a particular period.

**Why do we need to consider confidence intervals if we already know the exact success rates for each clinic in 2009?**

No success rate or statistic is absolute. Suppose a clinic performed 100 cycles among women younger than 35 in 2009 and had a success rate of 20%, with a confidence interval of 12%–28%. The 20% success rate tells us that the average chance of success for women younger than 35 treated at this clinic in 2009 was 20%. But how likely is it that the clinic could repeat this performance? If the same clinic performed another 100 cycles under similar clinical conditions on women with similar characteristics, would the success rate again be 20%? The confidence interval tells us that the success rate would likely fall between 12% and 28%.
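Assuming the intervals are the common normal-approximation (Wald) interval for a proportion with a 95% critical value of 1.96 (the report does not name its method), the 12%–28% range in this example can be reproduced, with $\hat{p}$ the observed success rate and $n$ the number of cycles:

$$
\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
= 0.20 \pm 1.96\sqrt{\frac{0.20 \times 0.80}{100}}
= 0.20 \pm 1.96 \times 0.04
\approx 0.20 \pm 0.078
$$

That is, roughly 12.2% to 27.8%, which rounds to the 12%–28% quoted above.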

**Why does the width of the confidence interval vary from clinic to clinic?**

The width of the confidence interval gives us a realistic sense of how much confidence we can place in the success rate. If the clinic had performed only 20 cycles instead of 100 among women younger than 35 and still had a 20% success rate (4 successes out of 20 cycles), the confidence interval would be much wider (between 3% and 37%) because the success or failure of each individual cycle would carry more weight. For example, if just one more cycle had resulted in a live birth, the success rate would have been substantially higher: 25%, or 5 successes out of 20 cycles. Likewise, if just one more cycle had been unsuccessful, the success rate would have been substantially lower: 15%, or 3 successes out of 20 cycles. Compare this scenario with the original example of the clinic that performed 100 cycles and had a 20% success rate. There, one more live birth would have changed the success rate only slightly, from 20% to 21%, and one more unsuccessful cycle would have lowered it only to 19%. Thus, our confidence in a 20% success rate depends on how many cycles were performed.
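To make the comparison between 20 and 100 cycles concrete, here is a minimal Python sketch. It assumes the normal-approximation (Wald) interval with a critical value of 1.96; the report does not say which method it used, so treat this as an illustration rather than CDC's calculation. The helper names `wald_interval` and `one_cycle_swing` are hypothetical.

```python
import math

Z_95 = 1.96  # assumed two-sided 95% critical value (normal approximation)


def wald_interval(successes: int, cycles: int, z: float = Z_95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a success rate."""
    rate = successes / cycles
    half_width = z * math.sqrt(rate * (1 - rate) / cycles)
    return rate - half_width, rate + half_width


def one_cycle_swing(successes: int, cycles: int) -> tuple[float, float]:
    """Success rate if one fewer or one more cycle had ended in a live birth."""
    return (successes - 1) / cycles, (successes + 1) / cycles


# Same 20% success rate, two different numbers of cycles.
for successes, cycles in [(4, 20), (20, 100)]:
    low, high = wald_interval(successes, cycles)
    fewer, more = one_cycle_swing(successes, cycles)
    print(
        f"{cycles} cycles: rate {successes / cycles:.0%}, "
        f"95% CI {low:.1%} to {high:.1%}, "
        f"one-cycle swing {fewer:.0%} to {more:.0%}"
    )
```

Running it gives about 2.5% to 37.5% for 20 cycles and 12.2% to 27.8% for 100 cycles, with one-cycle swings of 15% to 25% and 19% to 21%, respectively. The 100-cycle result matches the 12%–28% range quoted earlier, and the 20-cycle result is close to the quoted 3%–37%; the small gap suggests the report rounded or computed its intervals slightly differently.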

**Why should confidence intervals be considered when success rates from different clinics are being compared?**

Confidence intervals should be considered because success rates alone can be misleading. For example, if Clinic A performs 20 cycles in a year and 8 cycles result in a live birth, its live-birth rate is 40%. If Clinic B performs 600 cycles and 180 result in a live birth, its live-birth rate is 30%. We might be tempted to say that Clinic A has a better success rate than Clinic B. However, because Clinic A performed few cycles, its success rate has a wide 95% confidence interval of 18.5%–61.5%. On the other hand, because Clinic B performed a large number of cycles, its success rate has a relatively narrow confidence interval of 26.2%–33.8%. Thus, if each clinic repeated its treatment with similar patients under similar clinical conditions, Clinic A could have a rate as low as 18.5% and Clinic B could have a rate as high as 33.8%. Because the two intervals overlap, these data do not show that Clinic A truly performs better than Clinic B. Moreover, Clinic B’s rate is much more reliable because its confidence interval is much narrower than Clinic A’s.
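The Clinic A and Clinic B comparison can be sketched the same way, again assuming a Wald interval with z = 1.96 (an assumption, not a documented CDC method). The helper `intervals_overlap` is hypothetical and mirrors the reasoning above; overlap is only a rough screen, not a formal significance test.

```python
import math

Z_95 = 1.96  # assumed two-sided 95% critical value (normal approximation)


def wald_interval(successes: int, cycles: int, z: float = Z_95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a live-birth rate."""
    rate = successes / cycles
    half_width = z * math.sqrt(rate * (1 - rate) / cycles)
    return rate - half_width, rate + half_width


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


clinic_a = wald_interval(8, 20)     # 8 live births in 20 cycles (40%)
clinic_b = wald_interval(180, 600)  # 180 live births in 600 cycles (30%)

print(f"Clinic A: {clinic_a[0]:.1%} to {clinic_a[1]:.1%}")
print(f"Clinic B: {clinic_b[0]:.1%} to {clinic_b[1]:.1%}")
print("Intervals overlap:", intervals_overlap(clinic_a, clinic_b))
```

This gives about 18.5% to 61.5% for Clinic A, matching the text, and about 26.3% to 33.7% for Clinic B, within a tenth of a percentage point of the quoted 26.2%–33.8% at each end. The intervals overlap, as the paragraph argues.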

Even when the confidence intervals suggest that one clinic’s success rate is higher than another’s, they are only one indication that its success rate may be better. Other factors must also be considered when comparing rates from two clinics. For example, some clinics see more than the average number of patients with difficult infertility problems, whereas others discourage patients with a low probability of success. For more information, see the section on important factors to consider when using the tables to assess a clinic.

### Findings from Validation Visits for 2009 ART Data

Site visits to ART clinics to validate 2009 ART data were conducted from April through June 2011. This year, 35 of the 441 reporting clinics were randomly selected, taking into consideration the number of ART procedures performed at each clinic and whether the clinic had been selected before. During each visit, ART data reported by the clinic to CDC were compared with information documented in medical records.

For each clinic, the validated sample included up to 50 ART cycles that resulted in pregnancy and up to 75 additional cycles, depending on the number and type of ART procedures performed at the clinic. In total, 2,573 ART cycles performed in 2009 across the 35 clinics were randomly selected for full validation, along with 268 embryo-banking cycles. The full validation included review of 1,676 cycles for which a pregnancy was reported; these pregnancies resulted in 1,396 live-birth deliveries, of which 398 were multiple-infant births.

In addition, for patients whose cycles were validated, we verified the number of reported cycles by comparing the total number of ART cycles the clinic reported for each patient with the total number of cycles documented in the medical record. From this comparison, we calculated the discrepancy rate for the new data field “Additional cycles in same reporting year.”
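The per-patient comparison described above can be sketched as follows. This is a hedged illustration: the record layout, the class `PatientCycleCount`, and the helper `cycle_count_discrepancies` are hypothetical, the patient data are invented, and the rate is unweighted (the published rates also weight by clinic size, as noted under the table).

```python
from dataclasses import dataclass


@dataclass
class PatientCycleCount:
    patient_id: str  # hypothetical identifier
    reported: int    # cycles the clinic reported to CDC for the year
    in_record: int   # cycles documented in the medical record


def cycle_count_discrepancies(patients: list[PatientCycleCount]) -> tuple[float, float]:
    """Return (unweighted discrepancy rate, share of discrepancies that are under-reports)."""
    if not patients:
        return 0.0, 0.0
    mismatched = [p for p in patients if p.reported != p.in_record]
    rate = len(mismatched) / len(patients)
    under_reported = sum(1 for p in mismatched if p.reported < p.in_record)
    under_share = under_reported / len(mismatched) if mismatched else 0.0
    return rate, under_share


# Hypothetical example: patient B was reported with one fewer cycle than recorded,
# patient D with one more.
patients = [
    PatientCycleCount("A", reported=2, in_record=2),
    PatientCycleCount("B", reported=1, in_record=2),
    PatientCycleCount("C", reported=3, in_record=3),
    PatientCycleCount("D", reported=2, in_record=1),
]
rate, under_share = cycle_count_discrepancies(patients)
print(f"discrepancy rate {rate:.0%}, under-reported share {under_share:.0%}")
```

Tracking the direction of each mismatch, not just whether one occurred, is what allows a summary such as "fewer additional cycles were reported by clinics than were found in the medical records."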

Discrepancy rates for validated items of interest are listed in the table below. Overall, validation of 2009 ART cycle data indicated that discrepancy rates were low (<5.0%), except for “Diagnosis of infertility” (15.5%). This field corresponds to the “Patient Diagnosis” data in the 2009 individual clinic tables and the national summary table in this report.

### Discrepancy Rates by Data Fields Selected for Validation

| Data Field Name | Discrepancy Rate\* (95% Confidence Interval†) | Comments |
|---|---|---|
| Patient date of birth | 1.5% (0.8–2.2) | In 75% of the discrepancies, the difference did not change the age category (Age of Woman). |
| Diagnosis of infertility | 15.5% (10.1–20.9) | For approximately 40% of the discrepancies, a single incorrect diagnosis was reported, mainly “Other” or “Unexplained,” instead of a specific cause. For another 40% of the discrepancies, multiple causes of infertility were found in the medical record, but only a single cause was reported. |
| Number of embryos/oocytes transferred | <1% | |
| Number of embryos cryopreserved | 3.3% (1.7–5.0) | Approximately 20% of the discrepancies resulted from incorrectly reporting that zero (0) embryos were cryopreserved when one or more embryos were actually cryopreserved. |
| Outcome of ART treatment (i.e., pregnant vs. not pregnant) | 2.4% (0.7–4.1) | For approximately 40% of the discrepancies, no information on the outcome of ART treatment was found in the medical records. |
| Number of fetal hearts on ultrasound | 2.9% (1.6–4.2) | Of the discrepancies, 20% were misreported as single-fetus pregnancies instead of multiple-fetus pregnancies, and 15% were misreported as having one or more fetal hearts when the medical records actually showed zero (0) fetal hearts. |
| Pregnancy outcome (i.e., miscarriage, stillbirth, and live birth) | 2.0% (1.0–3.0) | For about half of the discrepancies, there was no information on pregnancy outcome in the medical records. |
| Number of infants born | <1% | |
| Cycle cancellation | <1% | |
| Additional cycles in same reporting year | 3.5% (1.5–5.4) | For approximately 80% of the discrepancies, clinics reported fewer additional cycles than were found in the medical records. The majority of the discrepancies were due to reporting one fewer cycle. |

Note: ART = assisted reproductive technology.

\* Discrepancy rates estimate the proportion of all treatment cycles with a discrepancy in a particular data item. The discrepancy-rate calculations weight the data from validated cycles to reflect the overall number of cycles performed at each clinic, so findings from larger clinical practices were weighted more heavily than those from smaller practices.

† This table shows a range, called the 95% confidence interval, that conveys the reliability of each discrepancy rate. For more information, see “How to Interpret a Confidence Interval” earlier in this appendix.
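The footnote on weighting does not give the exact estimator. As a hedged sketch, assume each validated cycle is weighted by its clinic's cycles performed divided by cycles validated (a simple design weight). The class `ClinicValidation`, the function `weighted_discrepancy_rate`, and the two clinics below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClinicValidation:
    name: str              # hypothetical clinic label
    cycles_performed: int  # all cycles the clinic reported for the year
    cycles_validated: int  # cycles sampled and checked against medical records
    discrepancies: int     # validated cycles with a discrepancy in the data field


def weighted_discrepancy_rate(clinics: list[ClinicValidation]) -> float:
    """Weight each validated cycle by cycles_performed / cycles_validated."""
    weighted_discrepant = 0.0
    weighted_total = 0.0
    for clinic in clinics:
        weight = clinic.cycles_performed / clinic.cycles_validated
        weighted_discrepant += weight * clinic.discrepancies
        weighted_total += weight * clinic.cycles_validated  # equals cycles_performed
    return weighted_discrepant / weighted_total


clinics = [
    ClinicValidation("small", cycles_performed=100, cycles_validated=50, discrepancies=5),
    ClinicValidation("large", cycles_performed=1900, cycles_validated=100, discrepancies=2),
]

pooled = sum(c.discrepancies for c in clinics) / sum(c.cycles_validated for c in clinics)
print(f"unweighted (pooled sample): {pooled:.1%}")
print(f"weighted by clinic volume: {weighted_discrepancy_rate(clinics):.1%}")
```

In this made-up example, the small clinic's 10% rate dominates the unweighted pooled sample (4.7%), but weighting by clinic volume lets the large clinic's 2% rate pull the estimate down to 2.4%, which is the effect the footnote describes.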