7.4 General Good Data Strategies and Practices
Some simple practices in data definition and collection can prevent errors and improve the consistency and quality of surveillance. Crucially, these practices should be developed at the planning stage, before data are collected, and should be incorporated into staff training.
- Explicitly define all data elements. Each data element (even apparently obvious ones such as sex, birth weight, residence) should be defined explicitly, including where it will be found and how it will be coded.
- Explicitly give instructions for challenging data elements. Real life is messy, so give instructions for data that are not in the chart (unknown data) or fields left blank on data forms (missing data). The goal is consistency (everybody sharing the same approach to similar issues) and efficiency (no guessing needed).
- Store raw data rather than calculated variables. For many key elements, raw data are best because they are easy to store and they preserve the granularity of the data. For example, instead of recording body mass index, a programme should collect height and weight data. Body mass index can be calculated from height and weight, but not the converse. For some complex data, such as an echocardiogram, this is not feasible, so a report will suffice. However, a programme should strongly consider storing photographs or copies of radiographs to assist with centralized case review; for instance, in cases of limb deficiencies, complex phenotypes, or potential syndromes.
- Do not categorize continuous variables. Examples include birth weight, height, weight, gestational age and maternal age. Although in many cases these data will eventually be grouped into categories during data analysis, collecting the actual value is much more valuable in the long run (e.g. providing flexibility for new analyses), does not take additional work, and allows for better error checks.
- If using categorical variables, code at data collection. In some cases, coding at data collection is reasonable and might save time. Examples include gender (e.g. 1 = male, 2 = female, etc.) or race/ethnicity (if a fairly comprehensive list can be generated, with an option for “other – specify”). It is good practice to be consistent with coding, using similar codes for similar questions (e.g. 1 = no, 2 = yes, etc.). If using a data collection form (paper-based or electronic), it is good practice to show the code with the label (e.g. 1 = no, 2 = yes, etc.).
- Minimize open-ended, free text in data entry fields. Free text is difficult to analyse and requires expert review to transform into analysable data. However, the recommendation is to minimize free text – not to avoid it completely – because at times it is necessary and in fact critical to preserve the information content. Examples include verbatim description of the birth defect or phenotype, and comment sections of certain areas on the abstraction forms. Birth defects are complex conditions, and the data abstractor must have an opportunity to describe the complexity or uncertainty so that experts at the central/coordination level can review and resolve appropriately.
- Include a thoughtful set of potential confounders. In addition to basic descriptive data (e.g. demographics, birth defect description), consider including information that can be used to adjust, stratify and compare across groups, such as maternal age, smoking and prenatal care. To make a rational and efficient choice (every piece of data has a cost), review the goals of the programme and the information that one wishes to analyse and report to ensure that these additional data can be collected, that the quality is “fit for use”, and that the data will be used.
- Develop a standard operating procedure (SOP) manual for the programme. All the processes of a surveillance programme should be incorporated into an SOP manual, including all issues discussed in this section. This manual should be clear and up to date. Developing and maintaining such a manual is a significant commitment, but if used well, it is crucially important for several reasons:
  - First, it forces planners to map out in detail the programme’s processes, and helps identify roadblocks and potential solutions.
  - Second, such a manual is a key training resource that can provide clear instructions for new staff as well as for refresher training, thus ensuring consistent processes in a programme.
  - Third, an SOP manual helps hold staff accountable since both the processes and the responsibilities are clearly described.
  - Fourth, a detailed SOP manual provides transparency and increases trust at all levels of the programme and among stakeholders.
  - Finally, and more broadly, an SOP manual becomes an aid in creating a visual/written and detailed process map (to be discussed later), as well as setting explicit standards for data, guidelines for processes and ongoing evaluation.
- Nurture teams with training, feedback, communication and recognition. The human element is critical in any public health surveillance process. The best programmes use teams to meet their goals, and the teams include representatives from all levels and processes of the programme, from front-line staff (e.g. nurse abstractors) to data managers to analysts to administrators. This approach fosters communication and a shared understanding of goals, processes and issues, and provides a valuable resource for identifying solutions to data quality problems. Front-line staff are especially critical, as they are responsible for collection of the primary data. Collecting high-quality primary data is crucial and will be the focus of a later section.
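
Several of the data definition practices listed above (showing codes with their labels, storing raw values, and deferring categorization to the analysis stage) can be illustrated with a short sketch. The Python example below is illustrative only and is not part of any prescribed system: the field names, the `Record` structure, the code 9 for "unknown", and the 2500 g default cut-off are assumptions chosen for the example. A programme would define its own elements and codes in its SOP manual.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical code list: 1 and 2 follow the "1 = no, 2 = yes" example in the
# text; 9 = unknown is an assumed code for data that are not in the chart.
YES_NO_CODES = {1: "no", 2: "yes", 9: "unknown"}


@dataclass
class Record:
    """One abstracted record that stores raw values, not calculated ones."""
    height_cm: Optional[float]       # None means the field was left blank
    weight_kg: Optional[float]
    birth_weight_g: Optional[int]    # actual value, not a category
    maternal_smoking: Optional[int]  # coded with YES_NO_CODES


def label(code, codes):
    """Return the code shown together with its label, e.g. '2 = yes'."""
    if code is None:
        return "missing"
    if code not in codes:
        raise ValueError(f"undefined code: {code!r}")
    return f"{code} = {codes[code]}"


def bmi(rec):
    """Calculate body mass index (kg/m^2) from raw height and weight."""
    if rec.height_cm is None or rec.weight_kg is None:
        return None  # cannot be derived when a raw value is missing
    if rec.height_cm <= 0:
        raise ValueError("height must be positive")
    height_m = rec.height_cm / 100
    return rec.weight_kg / height_m ** 2


def birth_weight_category(grams, cutoff_g=2500):
    """Group birth weight at analysis time; the cut-off stays adjustable."""
    if grams is None:
        return "missing"
    return "below cutoff" if grams < cutoff_g else "at or above cutoff"


if __name__ == "__main__":
    rec = Record(height_cm=160.0, weight_kg=64.0,
                 birth_weight_g=2400, maternal_smoking=2)
    print(label(rec.maternal_smoking, YES_NO_CODES))
    print(bmi(rec))
    print(birth_weight_category(rec.birth_weight_g))
    print(birth_weight_category(rec.birth_weight_g, cutoff_g=1500))
```

Because the record keeps the actual birth weight and the raw height and weight, the same data can be regrouped with a different cut-off or used to derive a new variable later without returning to the source charts.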