# Data Validation

Validation meetings with assisted reproductive technology (ART) clinics were conducted from April through June 2022.

For data validation, 35 of the 449 reporting clinics were randomly selected after taking into consideration the number of ART cycles performed at each clinic, some cycle and clinic characteristics, and whether the clinic had been selected before. During each validation meeting, ART data reported by the clinic to CDC were compared with information documented in medical records.

For each clinic, the fully validated sample included up to 40 cycles resulting in pregnancy and up to 20 cycles not resulting in pregnancy. Up to 10 cycles using donor eggs or embryos were included in the fully validated sample at each clinic. In addition, for each patient whose cycles were fully validated, the number of ART cycles performed during the year was verified by comparing the total number of cycles reported with the total number of cycles in the medical record. If unreported ART cycles were identified in selected medical records, up to 10 of these cycles were also selected for partial validation.
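The per-clinic caps described above can be illustrated with a minimal sketch. It assumes simple random sampling within each outcome group and a hypothetical record layout (a list of dicts with a `pregnant` flag); the function name, field names, and seed are illustrative and not taken from the validation protocol. The donor-cycle cap is left out because the text does not say how it interacts with the two outcome groups.

```python
import random


def select_validation_sample(cycles, max_pregnant=40, max_not_pregnant=20, seed=0):
    """Randomly pick cycles for full validation, capped per outcome group.

    `cycles` is a list of dicts with a boolean "pregnant" key (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    pregnant = [c for c in cycles if c["pregnant"]]
    not_pregnant = [c for c in cycles if not c["pregnant"]]
    # "Up to" means a clinic with fewer cycles than the cap contributes all of them.
    chosen = rng.sample(pregnant, min(max_pregnant, len(pregnant)))
    chosen += rng.sample(not_pregnant, min(max_not_pregnant, len(not_pregnant)))
    return chosen


# Hypothetical clinic: 120 cycles, 50 of which resulted in pregnancy.
clinic_cycles = [{"cycle_id": i, "pregnant": i < 50} for i in range(120)]
sample = select_validation_sample(clinic_cycles)
print(len(sample))  # 60 = 40 pregnancy cycles + 20 non-pregnancy cycles
```

With 35 clinics and at most 60 fully validated cycles each, this kind of cap allows no more than 2,100 cycles in total.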

To assess discrepancy rates, 2,070 ART cycles across the 35 randomly selected clinics were randomly selected for full validation, along with 294 fertility preservation banking cycles selected for partial validation. Validation results from clinics that were not randomly selected (targeted validation) were excluded from the calculation of discrepancy rates because they cannot be generalized to all reporting clinics. Discrepancy rates for the validated items of interest are presented in a table later in this section.

## How to Interpret Confidence Intervals for Discrepancy Rates

### What is a confidence interval?

Simply put, a confidence interval expresses the margin of error around an estimate. Margins of error are often reported in voter polls to indicate the range within which the true value is likely to fall. For example, a poll might find that 30% of voters favor a particular candidate, with a margin of error of plus or minus 3.5 percentage points, meaning that the true level of support is likely between 26.5% and 33.5%.

### Why do we need to consider confidence intervals if we already know the exact discrepancy rates for each clinic?

No discrepancy rate or statistic is absolute. Suppose that, during validation, a sample of 100 cycles was reviewed, and a discrepancy rate of 15% was determined for a particular data item, with a 95% confidence interval of 10% to 20%. The 15% discrepancy rate is our estimate, based on the sample of 100 cycles, of the chance that a discrepancy occurred for that data item among all reported cycles. However, this estimate may not match the true discrepancy rate that we would calculate if we were to validate every cycle reported during the year.

The 95% confidence interval tells us that we are 95% confident that the true discrepancy rate is between 10% and 20%. In other words, if we were to repeat the process of selecting a sample of 100 cycles many times, calculating the discrepancy rate and 95% confidence interval for each sample, we would expect about 95% of the calculated confidence intervals to capture the true discrepancy rate.
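The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: it assumes a true discrepancy rate of 15%, simple random samples of 100 cycles, and the Wilson score interval, which is one common method for proportions and is not necessarily the method used for the weighted estimates in this report. The function names, the seed, and the number of repetitions are all hypothetical choices.

```python
import math
import random

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(discrepancies, n, z=Z_95):
    """Wilson score interval for a proportion (an assumed, illustrative method)."""
    p_hat = discrepancies / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def simulate_coverage(true_rate=0.15, n=100, repeats=2000, seed=1):
    """Fraction of repeated samples whose interval contains the true rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Each sampled cycle has a discrepancy with probability true_rate.
        discrepancies = sum(rng.random() < true_rate for _ in range(n))
        low, high = wilson_interval(discrepancies, n)
        hits += low <= true_rate <= high
    return hits / repeats


coverage = simulate_coverage()
print(f"Share of intervals capturing the true rate: {coverage:.1%}")
```

Because the number of discrepancies in a sample is a whole number, the simulated share is expected to land near 95% rather than exactly on it.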

| Data Field Name | Discrepancy Rate<sup>a</sup> (Confidence Interval<sup>b</sup>) | Comments |
|---|---|---|
| Patient date of birth | 0.3% (0.1, 1.0) | |
| Cycle intention | 0.6% (0.2, 1.8) | |
| Cycle start date | 3.8% (1.4, 10.1) | For 61% of discrepancies, the cycle start date reported by the clinic was later than the cycle start date found in medical records, and 77% of discrepancies were within 7 days of the reported date. |
| Date of egg retrieval | 0.2% (0.1, 0.8) | |
| Number of embryos transferred | 0.2% (0.1, 0.6) | |
| Outcome of ART treatment (pregnant or not pregnant) | 0.1% (0.0, 0.2) | |
| Pregnancy outcome (such as miscarriage, live birth, or stillbirth) | 0.4% (0.1, 1.4) | |
| Date of pregnancy outcome | 1.0% (0.4, 2.5) | For 68% of discrepancies, the date of pregnancy outcome was reported by the clinic but was not found in medical records, or vice versa. When the date of pregnancy outcome was found in both sources, about 58% of discrepancies were within 7 days of the reported date. |
| Number of infants born | 0.3% (0.1, 1.5) | |
| Cycle count | 2.0% (0.8, 4.8) | For 70% of discrepancies, the number of cycles found in medical records was higher than the number of cycles reported by the clinic. For 91% of discrepancies, there was one cycle difference between the number of cycles reported by the clinic and the number of cycles found in medical records. |
| **Patient Diagnosis: Reason for ART** | | |
| Tubal factor | 0.9% (0.5, 1.7) | |
| Ovulatory dysfunction | 2.4% (1.3, 4.2) | Ovulatory dysfunction was underreported. For 73% of discrepancies, ovulatory dysfunction was found in medical records but was not reported by the clinic. |
| Diminished ovarian reserve | 1.5% (0.6, 3.3) | Diminished ovarian reserve was overreported. For 55% of discrepancies, diminished ovarian reserve was reported by the clinic but was not found in medical records. |
| Endometriosis | 0.4% (0.1, 1.0) | |
| Uterine factor | 0.6% (0.2, 1.9) | |
| Male factor | 1.3% (0.6, 3.1) | Male factor was underreported. For 67% of discrepancies, male factor was found in medical records but was not reported by the clinic. |
| Other factor | 4.9% (2.5, 9.7) | Other factor was underreported. For 65% of discrepancies, other factor was found in medical records but was not reported by the clinic. |
| Unknown factor | 1.0% (0.4, 3.1) | Unknown factor was overreported. For 87% of discrepancies, unknown factor was reported by the clinic but was not found in medical records. |

ART = assisted reproductive technology.

<sup>a</sup> Discrepancy rates estimate the proportion of all ART cycles with differences for a particular data item. Discrepancy rate calculations weight the data from validated cycles to reflect the overall number of cycles performed at each clinic. Thus, findings from larger clinics were weighted more heavily than those from smaller clinics.

<sup>b</sup> This table shows a range, called the 95% confidence interval, that conveys the reliability of the discrepancy rate. A general explanation of confidence intervals is provided earlier in this section.
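The weighting described in footnote a can be illustrated with a short sketch. The report does not give the exact formula, so this assumes the simplest scheme consistent with the footnote: each validated cycle is weighted by the clinic's total reported cycles divided by its number of validated cycles. The field names and the two example clinics are hypothetical, and the sketch ignores any adjustment for how the clinics themselves were selected.

```python
def weighted_discrepancy_rate(clinics):
    """Weighted share of cycles with a discrepancy for one data item.

    Each clinic dict holds (hypothetical field names):
      total_cycles      cycles the clinic reported for the year
      validated_cycles  cycles reviewed during validation
      discrepancies     validated cycles where the item disagreed
    """
    weighted_discrepancies = 0.0
    weighted_cycles = 0.0
    for clinic in clinics:
        # Assumed weight: each validated cycle stands in for this many reported cycles.
        weight = clinic["total_cycles"] / clinic["validated_cycles"]
        weighted_discrepancies += weight * clinic["discrepancies"]
        weighted_cycles += weight * clinic["validated_cycles"]
    return weighted_discrepancies / weighted_cycles


# Two made-up clinics: one large with few discrepancies, one small with more.
clinics = [
    {"total_cycles": 2000, "validated_cycles": 60, "discrepancies": 3},
    {"total_cycles": 200, "validated_cycles": 50, "discrepancies": 5},
]
rate = weighted_discrepancy_rate(clinics)
unweighted = (3 + 5) / (60 + 50)
print(f"weighted: {rate:.1%}, unweighted: {unweighted:.1%}")
# prints "weighted: 5.5%, unweighted: 7.3%"
```

In this made-up example, the larger clinic's lower discrepancy rate pulls the weighted estimate below the unweighted one, which is the effect the footnote describes.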