3.13 Data Collection and Management
Accurate data collection and management, including storage and analysis, are key components of any programme conducting congenital anomalies surveillance, and different instruments and methods of data gathering can be used for this purpose. Well-designed data systems improve data management, permit statistical analyses and data sharing among different surveillance programmes, and support linking of congenital anomalies data with other available information for surveillance, research and prevention purposes.
It is important that the collection and analysis of data for the surveillance of congenital anomalies is done in a systematic way by trained surveillance personnel. It is also important that data are accurate and of high quality before analysis is performed. If done well, data analysis will provide accurate, timely and complete information on the occurrence of congenital anomalies.
There are three main attributes of data quality: completeness, accuracy and timeliness.
- Completeness refers to the extent to which data are all-inclusive and comprehensive. For example, data are complete when all cases at a given source in a specific time frame have been identified and all required data have been abstracted. Hospital audits and linkage of cases to vital records or to specialized diagnostic centres can help evaluate the completeness of case ascertainment.
- Accuracy refers to the extent to which data are exact, correct and valid. Approaches to help ensure data accuracy include: re-abstraction of information, validity audits (e.g. identification of missed diagnoses or coding issues), clinical reviews (e.g. verification of codes, tests and procedures), and verification of data entry (e.g. customized programmes for range checks, automated fields, rejection of data that are known to be inaccurate, routinely running data queries to identify duplicate entries, and identifying problems with variables).
- Timeliness refers to the extent to which data are collected and analysed in a timely manner. It is measured by the time that elapses between: the date of diagnosis and the date of abstraction; the date of abstraction and the date the information is sent to the office; and the date of arrival in the office and the date of entry into the system.
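The three timeliness intervals described above can be computed directly from the dates recorded for each case. The following is a minimal sketch, assuming each record carries five dates; the function name, parameter names and example dates are hypothetical and are not part of any prescribed abstraction form.

```python
from datetime import date


def timeliness_intervals(diagnosed, abstracted, sent_to_office,
                         arrived_at_office, entered_in_system):
    """Return the three elapsed intervals, in days, for one case record."""
    return {
        "diagnosis_to_abstraction": (abstracted - diagnosed).days,
        "abstraction_to_sending": (sent_to_office - abstracted).days,
        "arrival_to_entry": (entered_in_system - arrived_at_office).days,
    }


# Hypothetical dates for a single case record.
intervals = timeliness_intervals(
    diagnosed=date(2024, 1, 10),
    abstracted=date(2024, 1, 24),
    sent_to_office=date(2024, 2, 1),
    arrived_at_office=date(2024, 2, 5),
    entered_in_system=date(2024, 2, 12),
)
for name, days in intervals.items():
    print(f"{name}: {days} days")
```

A programme could summarize these intervals across all records (for example, with a median per reporting site) to see where delays tend to accumulate.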
It is important to have protocols in place to ensure that data-collection procedures are carried out properly and systematically. Protocols usually include reviews of the information in the data sources, to verify that data are being recorded in a standardized way. Also, if feasible, a process for reviewing a sample of the medical records will help ensure that the information in the abstraction forms reflects the information in the medical record.
Poor-quality data can lead to erroneous conclusions about the occurrence of a congenital anomaly among a population and could have a substantial effect on the decision-making process of public health authorities.
The following are examples of factors that could affect data quality:
- missing values (e.g. empty data fields in the abstraction form);
- duplicate entry of cases;
- errors in the diagnosis, description or coding of congenital anomalies; and
- bias related to lack or excess of representation, for example:
  - if data include only very severe cases,
  - if data include only cases from urban settings,
  - if data include only private sector data sources, and
  - if data include cases from outside of the catchment area.
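Two of the factors listed above, missing values and duplicate entries, lend themselves to routine automated queries. The sketch below is illustrative only: the field names, the choice of required fields and the combination of fields used to flag a possible duplicate are all assumptions, and a real programme would define them in its own protocol.

```python
from collections import Counter

# Hypothetical required fields on an abstraction form.
REQUIRED_FIELDS = ("case_id", "date_of_birth", "diagnosis_code")
# Hypothetical fields that together suggest two entries describe the same case.
DUPLICATE_KEY = ("mother_id", "date_of_birth", "diagnosis_code")


def find_missing_values(records, required=REQUIRED_FIELDS):
    """Map each record index to the required fields that are empty or absent."""
    problems = {}
    for index, record in enumerate(records):
        missing = [f for f in required if record.get(f) in (None, "")]
        if missing:
            problems[index] = missing
    return problems


def find_duplicates(records, key_fields=DUPLICATE_KEY):
    """Return the key combinations that occur in more than one record."""
    counts = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical records; the values are placeholders, not real data.
records = [
    {"case_id": "001", "mother_id": "M1",
     "date_of_birth": "2024-03-01", "diagnosis_code": "Q05.9"},
    {"case_id": "002", "mother_id": "M2",
     "date_of_birth": "2024-03-02", "diagnosis_code": ""},
    {"case_id": "003", "mother_id": "M1",
     "date_of_birth": "2024-03-01", "diagnosis_code": "Q05.9"},
]

missing = find_missing_values(records)
duplicates = find_duplicates(records)
print("Records with missing values:", missing)
print("Possible duplicate entries:", duplicates)
```

Flagged records would still need manual review: a matching key may indicate a true duplicate or, for instance, a multiple birth.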
Programmes interested in more detailed information on data management can find suggestions in the guidelines developed by the NBDPN in the USA (14).
In surveillance of congenital anomalies, the term “incidence” is not commonly used to describe their occurrence. “Incidence” refers to all new cases of congenital anomalies. Because spontaneous abortions cannot be counted accurately, not all new cases can be ascertained, and the suggested measure of occurrence of congenital anomalies is therefore “live birth prevalence”, “birth prevalence” or “total prevalence”.
In a population-based surveillance programme, the prevalence of congenital anomalies is calculated by aggregating the number of unduplicated existing cases (i.e. live births and fetal deaths or terminations) as the numerator, and the total number of live births among the source population as the denominator, for a specific catchment area and time period. For hospital-based surveillance, the prevalence of congenital anomalies is calculated by aggregating the number of unduplicated hospital cases as the numerator, and the total number of hospital live births as the denominator for a specific hospital. Hospital-based prevalence can include one or more hospitals.
Note: it is important to remember that hospital-based prevalence estimates can be biased, in that they give the prevalence of a condition only for the participating hospital. Prevalence estimates based on hospital data are not true estimates of the prevalence of a condition among a population.
When measuring the prevalence of congenital anomalies, it is important to note what is being counted in the numerator and in the denominator.
Usually, the prevalence of congenital anomalies is calculated and presented as prevalence per 10 000 live births. This prevalence can be calculated for all congenital anomalies, for a specific individual anomaly, or for groups of anomalies. The following expression is used to calculate the birth prevalence of congenital anomalies, with the assumption that both live births and fetal deaths are being captured:
Birth prevalence = a/b × 10 000
- a: Number of live births and fetal deaths (stillbirths) with a specific congenital anomaly (e.g. spina bifida) counted among the source population in a given year.
- b: Number of live births and fetal deaths (stillbirths) among the same source population during the same year.
The numerator includes live births and known fetal deaths (stillbirths) with congenital anomalies and, if these data are available, pregnancy terminations with congenital anomalies. The denominator comprises only live births and, if these data are available, fetal deaths (stillbirths), because it is practically impossible to assess the total number of pregnancy losses. Because the number of pregnancy losses is relatively small compared with the number of live births, their exclusion has little effect on the prevalence estimate. Spontaneous abortions (also called miscarriages) are not included in the numerator or in the denominator, again because their total number cannot be assessed in practice.
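The calculation of birth prevalence from a numerator and a denominator can be sketched in a few lines. This is a minimal illustration, assuming the case and birth counts have already been unduplicated and restricted to the same source population and year; the function name and the example counts are hypothetical.

```python
def birth_prevalence(cases, births, per=10_000):
    """Return prevalence as cases per `per` births (a / b x 10 000 by default).

    cases  -- a: live births and fetal deaths with the anomaly of interest
    births -- b: live births and fetal deaths in the same population and year
    """
    if births <= 0:
        raise ValueError("the denominator must be a positive number of births")
    if cases < 0:
        raise ValueError("the numerator cannot be negative")
    # Multiply before dividing to limit floating-point rounding.
    return cases * per / births


# Hypothetical counts: 12 cases among 24 000 births in one year.
example_prevalence = birth_prevalence(cases=12, births=24_000)
print(f"{example_prevalence} per 10 000 births")
```

Whatever the implementation, it helps to report alongside the figure exactly which outcomes were counted in the numerator and in the denominator.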
Case counts and crude prevalence are common measures of burden that are often presented with respect to time, geographic area, demographic characteristics, or various combinations (e.g. age-by-race-by-sex). When variations in prevalence are identified, they are described and analysed. Many factors could affect the prevalence of a health event: population changes due to migration, improved diagnostic procedures, enhanced reporting techniques, and changes in the surveillance system or methods. It is important to consider these factors when interpreting the results.
Description of changes over time is an important way of detecting trends. A comparison of the number of case reports collected during a particular time period may help identify differences in the number of cases for a current time period compared with time periods in previous years. These differences can help to determine seasonal patterns. The number of cases can vary by geographic location, and analysis by place can help identify where an increase in cases is occurring. In the case of rare congenital anomalies, the size of the geographic unit to be considered is important in order to provide stable estimates. The analysis of demographic characteristics provides information on the characteristics of those individuals with particular congenital anomalies. The most frequently used demographic variables for analysis are age, sex, and race and ethnicity.
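To illustrate why case counts alone can mislead when comparing time periods, the sketch below computes crude prevalence per stratum (here, per year). All counts are invented for illustration, and the function name is hypothetical; the same approach applies to strata defined by place or by demographic characteristics.

```python
def prevalence_by_stratum(cases_by_stratum, births_by_stratum, per=10_000):
    """Return crude prevalence per `per` births for each stratum."""
    result = {}
    for stratum, births in births_by_stratum.items():
        cases = cases_by_stratum.get(stratum, 0)  # no reported cases means zero
        result[stratum] = cases * per / births
    return result


# Hypothetical yearly counts for one anomaly in one catchment area.
cases_by_year = {2021: 18, 2022: 21, 2023: 15}
births_by_year = {2021: 30_000, 2022: 30_000, 2023: 25_000}

yearly = prevalence_by_stratum(cases_by_year, births_by_year)
for year, value in yearly.items():
    print(f"{year}: {cases_by_year[year]} cases, {value} per 10 000 births")
```

In this invented example the number of cases falls between 2021 and 2023, yet the prevalence is the same in both years because the number of births also fell. With rare anomalies, small strata give unstable estimates, so strata may need to be combined before interpretation.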
Table 3.4 presents an example of calculating the prevalence of congenital anomalies that highlights the importance of knowing the denominator. Knowing only the number of cases (numerator data), without information about the denominator, can result in a misinterpretation of the true burden of a congenital anomaly.
Table 3.4. Example of calculating prevalence and the importance of the denominator
|Example|Numerator (total number of cases of congenital anomalies)|Denominator|Prevalence|Cases per 10 000 live births|
|---|---|---|---|---|
|A|100|100 000 (total live births per year in region)|0.001|10 per 10 000|
|B|100|10 000 (total live births per year in eight hospitals)|0.01|100 per 10 000|
|C|100|1000 (total live births per year in one referral hospital)|0.1|1000 per 10 000|
Example A. A country decides to start a congenital anomalies surveillance programme in one region where the total number of live births per year is estimated to be 100 000. The surveillance programme will be population based and will include all fetuses or neonates identified with congenital anomalies in the region. After one year, the programme identifies 100 fetuses or neonates with congenital anomalies. The prevalence of congenital anomalies for that region will be 0.001 (10 cases per 10 000 live births).
Example B. A country decides to start a congenital anomalies surveillance programme in all maternity hospitals in one region, and eight hospitals will participate. Only fetuses or neonates with congenital anomalies born in one of the eight participating hospitals will be counted. The total number of births per year in the eight hospitals is estimated to be 10 000. After one year, the programme identifies 100 fetuses or neonates with congenital anomalies. The prevalence of congenital anomalies for those hospitals will be 0.01 (100 cases per 10 000 live births).
Example C. A country decides to start a congenital anomalies surveillance programme in a referral hospital in one region. This hospital is where prenatally identified fetuses with congenital anomalies are usually referred for delivery. The hospital typically has 1000 births per year. After one year, the hospital identifies 100 fetuses or neonates with congenital anomalies. The prevalence of congenital anomalies for that particular hospital is 0.1 (1000 cases per 10 000 live births).
Without knowing the denominator for each example, the prevalence estimate could be misinterpreted. The prevalence estimate for Example C might suggest that this country has a high prevalence of congenital anomalies, when in reality the estimate reflects a small denominator at a referral hospital, where affected pregnancies are concentrated. The prevalence estimates for Examples B and C represent the prevalence for eight hospitals and one referral hospital, respectively. These would not be considered true population prevalence estimates. The prevalence estimate for Example A is based on the total number of live births for a population and thus yields the most accurate prevalence estimate.
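The arithmetic behind the three examples can be reproduced with a short script, which makes the effect of the denominator explicit: the numerator is 100 in every case, and only the denominator changes. The figures are those given in Table 3.4; the variable names are illustrative.

```python
# Numerator and denominator for each example in Table 3.4.
examples = {
    "A": {"cases": 100, "births": 100_000},  # whole region, population based
    "B": {"cases": 100, "births": 10_000},   # eight maternity hospitals
    "C": {"cases": 100, "births": 1_000},    # one referral hospital
}

results = {}
for label, counts in examples.items():
    proportion = counts["cases"] / counts["births"]
    per_10000 = counts["cases"] * 10_000 / counts["births"]
    results[label] = (proportion, per_10000)
    print(f"Example {label}: prevalence {proportion}, "
          f"{per_10000:.0f} cases per 10 000 live births")
```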