Dietary Supplement Imaging in the National Health and Nutrition Examination Survey
Project Status: Completed
Point of Contact: Kathryn S. Porter
Keywords: Data Collection, Imaging, Optical Character Recognition
Project Description: Half of adults in the United States take dietary supplements. Dietary supplements are widely available to U.S. consumers, and monitoring their use over time is an important component of the National Nutrition Monitoring System. The National Health and Nutrition Examination Survey (NHANES) is the primary source of dietary supplement use data for the nation. Collecting this type of information is complex and costly. NHANES participants are asked to show their dietary supplement containers to NHANES interviewers. Interviewers manually record the supplement name, manufacturer, and address, and collect information on duration of use, frequency of use in the past month or past 30 days, and the amount taken when used. This information is also transferred to the mobile examination center (MEC) and the phone center for the two dietary recalls. Product label information is then obtained in a separate effort by NHANES program staff. A database has been created with supplement label information collected from over 12,000 products.
The proposed project will explore the use of digital imaging technology to capture supplement container information in a set of photographic images. In addition, technology has advanced to the point where optical character recognition (OCR) of small, non-flat documents can correctly identify 92.4% of characters (Biondich et al., 2002; Sears, 2011). This advance allows the supplement name, manufacturer, address, and nutrition label printed on the supplement container to be captured electronically, eliminating the need for manual entry. It would also greatly reduce the effort NHANES program staff spend obtaining the label from the manufacturer, distributor, or internet.
- Potential impact of project if successful:
Digital imaging with optical character recognition would replace manual data entry by NHANES interviewers, reduce data errors, and shorten the interview at the point of data collection.
- Scalability – applicability to wider audiences within CDC:
This technology could be adapted to other CDC surveys and surveillance activities that collect data from containers (supplements, prescription drugs, etc.) and may be useful in other CDC settings and in outbreak investigations. If the project is successful, the application can be pilot tested in NHANES in 2015.
- Methodology – how your project will be carried out:
Many technologies are available for taking pictures of product labels. Through exploration of built-in and external cameras, an application will be developed that will accurately capture the product label image, attach the product image(s) to the correct participant, and store the labels for further processing by OCR technology. After the recognition step, the information from the label will be stored in a database, developed by our data collection contractor, that saves the information into the relevant fields. Multiple label images will need to be stitched together into a single image, and the OCR process will then be applied to the stitched image. Supplement names converted to text by OCR technology will also be transferred to the 24-hour dietary recall interviews conducted in the MEC and over the phone.
Additionally, we would like to explore software that can determine at the time of collection whether the images are legible and clear. The software would help interviewers when taking pictures to decide which pictures are accurate and clear enough to be saved.
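One common heuristic for this kind of at-capture legibility check is the variance of the Laplacian: blurred images have weak edges, so the Laplacian response varies little. The sketch below is illustrative only and is not the software the project evaluated. The function names, the plain list-of-rows grayscale input, and the threshold of 100.0 are all assumptions; a real threshold would have to be calibrated on actual supplement label images.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Return the variance of a 4-neighbour Laplacian over a grayscale image.

    `gray` is assumed to be a list of equal-length rows of pixel
    intensities (0-255). Low variance suggests few sharp edges (blur).
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses = []
    # Skip the one-pixel border so every pixel has four neighbours.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def is_sharp_enough(gray, threshold=100.0):
    """Hypothetical accept/reject rule; the threshold is an assumed value."""
    return laplacian_variance(gray) >= threshold


# A uniform grey patch has no edges; a checkerboard has strong ones.
flat = [[128] * 5 for _ in range(5)]
checkerboard = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
print(is_sharp_enough(flat), is_sharp_enough(checkerboard))
```

In an interviewer-facing application, a check like this could run immediately after each picture is taken and prompt a retake when the score falls below the calibrated threshold.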
- Measure of success – your expected outcome:
The NHANES program wants to capture interpretable digital images that can be successfully translated to text via OCR. Because other studies have found that a 92.4% correct translation rate is achievable with OCR technology, we will aim for a similar percentage of the label being successfully translated. When a label is on a non-flat surface, its text cannot be adequately captured in a single image. Therefore, we will also assess how many images are needed to capture all of the information on the label.
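A percentage like the one above can be computed in more than one way, and the source does not say which definition the cited studies used. The sketch below shows one common option, character-level accuracy based on edit distance against a manually keyed reference transcription. The function names and the sample label strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions that turn `reference` into `hypothesis`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Share of reference characters recovered correctly, floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical example: OCR misreads one zero as the letter "O".
keyed_by_hand = "Vitamin D3 1000 IU"
ocr_output = "Vitamin D3 1O00 IU"
print(f"{character_accuracy(keyed_by_hand, ocr_output):.1%}")
```

Averaging this score over a sample of labels would give a figure comparable in spirit to the published rate, provided the reference transcriptions are themselves accurate.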
We would also like to evaluate (a) how image quality changes with labels or bottles that are very small, (b) how image quality changes with differing interviewer capabilities, and (c) how easily images can be transferred to the MEC and phone center for review during the dietary interview.