# Data Validation

Validation meetings with assisted reproductive technology (ART) clinics were held from June through July 2023.

For data validation, 35 of the 453 reporting clinics were randomly selected, taking into account the number of ART cycles performed at each clinic, some cycle and clinic characteristics, and whether the clinic had been selected before. During each validation meeting, the ART data that the clinic had reported to CDC were compared with information documented in medical records.
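
The section does not say how cycle volume, clinic characteristics, and prior selection were combined into selection probabilities. Purely as an illustration, the sketch below draws clinics by weighted random sampling without replacement (the Efraimidis-Spirakis method), using each clinic's ART cycle count as its weight and a hypothetical `prior_selection_factor` that halves the weight of previously selected clinics. The `Clinic` class, the weighting, and all values are assumptions, not CDC's procedure; other cycle and clinic characteristics are ignored.

```python
import random
from dataclasses import dataclass


@dataclass
class Clinic:
    # Hypothetical record; field names are illustrative, not CDC's schema.
    clinic_id: str
    art_cycles: int        # ART cycles performed by the clinic in the reporting year
    selected_before: bool  # whether the clinic was validated in an earlier year


def select_clinics(clinics, k, rng, prior_selection_factor=0.5):
    """Draw k clinics by weighted random sampling without replacement.

    Efraimidis-Spirakis in log form: each clinic gets the key log(u) / w,
    computed as -Exp(1) / w, and the k largest keys form the sample.
    The weight w (cycle count, reduced for previously selected clinics)
    is a hypothetical choice made for illustration.
    """
    keyed = []
    for clinic in clinics:
        weight = clinic.art_cycles
        if clinic.selected_before:
            weight *= prior_selection_factor  # hypothetical down-weighting
        if weight <= 0:
            continue  # a clinic with zero weight can never be drawn
        keyed.append((-rng.expovariate(1.0) / weight, clinic))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [clinic for _, clinic in keyed[:k]]


# Usage with made-up data: 453 clinics, 35 selected.
data_rng = random.Random(2023)
clinics = [
    Clinic(f"C{i:03d}", data_rng.randint(10, 3000), data_rng.random() < 0.3)
    for i in range(453)
]
sample = select_clinics(clinics, 35, random.Random(1))
```

Clinics with more cycles tend to get keys closer to zero and are therefore more likely to be drawn, but every clinic with a positive weight has some chance of selection.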

For each clinic, the fully validated sample included up to 40 cycles resulting in pregnancy and up to 20 cycles not resulting in pregnancy; no more than 10 of the fully validated cycles at each clinic used donor eggs or embryos. In addition, for each patient whose cycles were fully validated, the total number of ART cycles reported for the year was compared with the total number of cycles documented in the medical record. If this comparison identified unreported ART cycles, up to 10 of these cycles were also selected for partial validation.
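
The sampling rules above can be sketched as follows. This is a hypothetical illustration: the `Cycle` record, the function names, and the order in which the caps are applied (cycles are shuffled, then accepted while their pregnancy group and, for donor cycles, the donor cap still have room) are assumptions; the section states only the caps themselves.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Cycle:
    # Hypothetical cycle record used only for this illustration.
    cycle_id: str
    patient_id: str
    pregnant: bool  # cycle resulted in pregnancy
    donor: bool     # cycle used donor eggs or embryos


def select_full_sample(cycles, rng, max_pregnant=40, max_not_pregnant=20, max_donor=10):
    """Randomly pick cycles for full validation while respecting the caps.

    Assumed procedure: shuffle all cycles, then accept each one only if its
    pregnancy group is below its cap and, for donor cycles, the donor cap
    has not been reached.
    """
    pool = list(cycles)
    rng.shuffle(pool)
    counts = Counter()
    sample = []
    for cycle in pool:
        group = "pregnant" if cycle.pregnant else "not_pregnant"
        cap = max_pregnant if cycle.pregnant else max_not_pregnant
        if counts[group] >= cap:
            continue
        if cycle.donor and counts["donor"] >= max_donor:
            continue
        sample.append(cycle)
        counts[group] += 1
        if cycle.donor:
            counts["donor"] += 1
    return sample


def count_unreported(reported_counts, record_counts):
    """Per patient, how many more cycles the medical record shows than were reported."""
    return {
        patient: record_counts[patient] - reported_counts.get(patient, 0)
        for patient in record_counts
        if record_counts[patient] > reported_counts.get(patient, 0)
    }


def select_partial(unreported_cycle_ids, rng, max_partial=10):
    """Randomly pick up to max_partial unreported cycles for partial validation."""
    ids = list(unreported_cycle_ids)
    return rng.sample(ids, min(max_partial, len(ids)))
```

Because donor cycles also count toward their pregnancy group, the donor cap limits how many of the (at most 60) fully validated cycles can use donor eggs or embryos rather than adding extra cycles.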

To estimate discrepancy rates, 2,043 ART cycles at the 35 selected clinics were randomly chosen for full validation, and 290 fertility preservation banking cycles were selected for partial validation. Discrepancy rates for the validated data items are presented in the table later in this section.
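
A minimal sketch of the field-by-field comparison, assuming each validated cycle is available as a pair of dictionaries (clinic-reported values and values abstracted from the medical record) keyed by hypothetical field names. For yes/no items such as diagnoses, it also tallies the direction of each discrepancy, the distinction the table draws for male factor (underreported) and other factor (overreported). The rates it yields are unweighted; the rates in the table are weighted, as footnote a explains.

```python
def compare_fields(pairs, fields):
    """Tally discrepancies per field between clinic-reported and medical-record values.

    pairs: iterable of (reported, record) dicts describing the same cycle.
    For yes/no (bool) fields, "under" counts values present in the record but
    not reported, and "over" counts values reported but absent from the record.
    """
    stats = {f: {"n": 0, "discrepancies": 0, "under": 0, "over": 0} for f in fields}
    for reported, record in pairs:
        for f in fields:
            s = stats[f]
            s["n"] += 1
            r, m = reported.get(f), record.get(f)
            if r == m:
                continue
            s["discrepancies"] += 1
            if isinstance(r, bool) and isinstance(m, bool):
                s["under" if m else "over"] += 1
    return stats


def unweighted_rates(stats):
    """Discrepancies divided by cycles compared, per field (no clinic weighting)."""
    return {f: s["discrepancies"] / s["n"] for f, s in stats.items() if s["n"]}
```

A field reported as `False` but recorded as `True` counts as underreported, matching the table's description of male factor.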

## How to Interpret Confidence Intervals for Discrepancy Rates

### What is a confidence interval?

Simply speaking, a confidence interval is a way to express the margin of error, a statistic often reported (e.g., in voter polls) to indicate the range within which the true value is likely to fall. For example, a poll might find that 30% of voters favor a particular candidate, with a margin of error of plus or minus 3.5 percentage points, meaning that the candidate's true support is likely between 26.5% and 33.5%.
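
The poll example can be worked through in a few lines. The interval is the estimate plus or minus the margin. The sample-size function is an added illustration that assumes a simple random sample and the normal-approximation 95% margin `z * sqrt(p * (1 - p) / n)` with `z = 1.96`; the poll's actual sample size is not given, so the implied figure is hypothetical.

```python
import math


def interval_from_margin(estimate, margin):
    """Lower and upper bounds for a value reported as estimate plus or minus margin."""
    return estimate - margin, estimate + margin


def implied_sample_size(p, margin, z=1.96):
    """Smallest n whose normal-approximation margin z * sqrt(p * (1 - p) / n) is at most margin.

    Assumes a simple random sample.
    """
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


low, high = interval_from_margin(30.0, 3.5)  # percentages
n = implied_sample_size(0.30, 0.035)         # proportions
```

Under these assumptions, a margin of plus or minus 3.5 points around 30% corresponds to roughly 659 respondents; the smaller the sample, the wider the margin, which is why a sample of 100 validated cycles yields a fairly wide interval.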

### Why do we need to consider confidence intervals if we already know the exact discrepancy rates for each clinic?

No discrepancy rate or statistic is exact; each is an estimate based on a sample. Suppose that, during validation, a sample of 100 cycles was reviewed, and a discrepancy rate of 15% was determined for a particular data item, with a 95% confidence interval of 10% to 20%. The 15% discrepancy rate tells us that, based on our sample of 100 cycles, we estimate that a discrepancy occurred in the selected data field in 15% of all reported cycles. However, this estimate may not match the true discrepancy rate, which we could calculate only by validating every cycle reported during the year.

The 95% confidence interval tells us that we are 95% confident that the true discrepancy rate lies between 10% and 20%. In other words, if we were to repeat the process of selecting a sample of 100 cycles many times, calculating the discrepancy rate and 95% confidence interval for each sample, we would expect about 95% of those intervals to contain the true discrepancy rate.
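
The repeated-sampling interpretation can be checked by simulation. This sketch uses the Wilson score interval, one standard 95% interval for a proportion; the section does not say which interval method CDC uses. It draws many hypothetical samples of 100 cycles with a true discrepancy rate of 15% and counts how often the interval contains 15%. For 15 discrepancies in 100 cycles, the Wilson interval is roughly 9.3% to 23.3%, so the 10% to 20% range in the example above should be read as illustrative.

```python
import math
import random


def wilson_interval(discrepancies, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p_hat = discrepancies / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def simulate_coverage(true_rate, n, repetitions, rng):
    """Fraction of simulated samples whose interval contains true_rate."""
    hits = 0
    for _ in range(repetitions):
        # Each of the n sampled cycles has a discrepancy with probability true_rate.
        discrepancies = sum(rng.random() < true_rate for _ in range(n))
        low, high = wilson_interval(discrepancies, n)
        hits += low <= true_rate <= high
    return hits / repetitions
```

Calling `simulate_coverage(0.15, 100, 5000, random.Random(42))` returns a value near, but slightly below, 0.95 (about 0.93 in this setting). With count data, the actual coverage of a nominal 95% interval moves up and down a little around 95% depending on the true rate and sample size.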

##### Table. Discrepancy Rates by Data Fields Selected for Validation

| Data field | Discrepancy rate<sup>a</sup> | 95% confidence interval<sup>b</sup> (%) | Note |
|---|---|---|---|
| Patient date of birth | 0.3% | (0.1, 1.0) | |
| Cycle intention | 0.5% | (0.2, 1.3) | |
| Cycle start date | 0.2% | (0.1, 0.6) | |
| Date of egg retrieval | 0.4% | (0.1, 1.1) | |
| Number of embryos transferred | 0.1% | (0.0, 0.2) | |
| Outcome of ART treatment (pregnant or not pregnant) | 0.1% | (0.0, 0.2) | |
| Pregnancy outcome (such as miscarriage, live birth, or stillbirth) | 0.3% | (0.1, 1.2) | |
| Date of pregnancy outcome | 0.6% | (0.2, 1.9) | |
| Number of infants born | 0.2% | (0.1, 1.0) | |
| Cycle count | 0.4% | (0.2, 1.1) | |
| **Patient diagnosis (reason for ART)** | | | |
| Tubal factor | 0.2% | (0.1, 0.7) | |
| Ovulatory dysfunction | 0.5% | (0.2, 1.4) | |
| Diminished ovarian reserve | 0.5% | (0.2, 1.1) | |
| Endometriosis | 0.1% | (0.0, 0.2) | |
| Uterine factor | 0.2% | (0.1, 0.5) | |
| Male factor | 1.6% | (0.5, 5.1) | Male factor was underreported. For 70% of discrepancies, male factor was found in medical records but was not reported by the clinic. |
| Other factor | 1.6% | (0.8, 3.5) | Other factor was overreported. For 63% of discrepancies, other factor was reported by the clinic but was not found in medical records. |
| Unknown factor | 0.4% | (0.2, 1.2) | |
ART = assisted reproductive technology.

<sup>a</sup> Discrepancy rates estimate the proportion of all ART cycles in which the reported value for a particular data item differs from the medical record. Discrepancy rate calculations weight the data from validated cycles to reflect the overall number of cycles performed at each clinic. Thus, findings from larger clinics were weighted more heavily than those from smaller clinics.

<sup>b</sup> This table shows a range, called the 95% confidence interval, that conveys the precision of the discrepancy rate. A general explanation of confidence intervals is provided earlier in this section.
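
The weighting in footnote a can be sketched with a simple estimator in which each validated cycle stands in for (total cycles at its clinic) divided by (cycles validated at that clinic). This is an assumed, simplified form rather than CDC's exact method: the `ClinicSample` class and the example numbers are hypothetical, and the sketch ignores how the clinics themselves were sampled, which this section does not describe.

```python
from dataclasses import dataclass


@dataclass
class ClinicSample:
    # Hypothetical per-clinic summary for one data field.
    total_cycles: int   # all ART cycles performed at the clinic
    validated: int      # cycles validated at the clinic
    discrepancies: int  # validated cycles with a discrepancy in the field


def weighted_discrepancy_rate(samples):
    """Estimate the share of all cycles with a discrepancy.

    Each validated cycle represents total_cycles / validated cycles at its
    clinic, so clinics that perform more cycles count for more.
    """
    weighted = sum(s.total_cycles * s.discrepancies / s.validated for s in samples)
    represented = sum(s.total_cycles for s in samples)
    return weighted / represented


def unweighted_discrepancy_rate(samples):
    """Pooled rate that treats every validated cycle equally."""
    return sum(s.discrepancies for s in samples) / sum(s.validated for s in samples)
```

For example, with a large clinic (2,000 cycles, 60 validated, no discrepancies) and a small clinic (100 cycles, 60 validated, 6 discrepancies), the pooled unweighted rate is 6 / 120 = 5%, while the weighted rate is 10 / 2,100, about 0.48%, because the large clinic's sample represents far more cycles.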