Mapping households using a cost-effective data collection system for a multiple use platform, western Kenya

Project Name: Mapping households using a cost-effective data collection system for a multiple use platform, western Kenya

Project Status: Proposed

Point of Contact: Meghna Desai

Center: CGH

Keywords: data collection, cloud storage, Rapid Application and Server Deployment, cellular data transmission.

Project Description: The CDC field station in Kenya supports a Demographic Surveillance Platform in Siaya County in western Kenya that requires mapping of individual households, roads, schools and health facilities in a population of nearly 1 million people. This work is labor and transport intensive because the existing system uses PDAs that require daily charging and daily downloading of data onto laptops, which means that teams have to come to a central location every morning and evening. In the proposed project we will explore a new method to map households using a novel, cost-effective, and time-efficient data collection system that allows mapping teams to be more independent of central support and to remain in the field for 2 weeks without returning to a central location. Basic location information will be collected on Android tablets, transmitted to a cloud-based server using cellular data services, and downloaded at the central office, approximately 100 km away. Data collectors will remain in the field, recharging their tablets with high-capacity lithium-ion battery rechargers capable of 8 to 10 recharges.

This system will be rapidly developed using the Open Data Kit (ODK) system. It will be extremely fault tolerant, since the data are written to a cloud-based server with multiple levels of backup; secure, since the data are transmitted using SSL; and economical, since field transportation and IT costs are minimized.
This project will explore the use of an open source rapid application development system (ODK) that includes automated deployment of a cloud server (ODK Aggregate) on Google App Engine. The data collection hardware will be an Android tablet with a 7-inch screen, a quad-core CPU, integrated GPS, and a SIM card, which will be used for data transmission. The setting of this project is Siaya County, Nyanza Province, Kenya, an area with a high burden of infectious diseases. The area is approximately 100 km from the KEMRI and CDC field station near Kisumu. The field life of the tablet battery will be more than 1 day, but typically less than 2 days. Electricity will not be commonly available in this environment. The location makes daily transport to and from the study area time consuming and expensive. Data collectors will be given high-capacity lithium-ion battery rechargers that can be expected to recharge their tablets fully 8 to 10 times. Every two weeks, the data collectors will be met and supplied with a freshly charged battery recharger, and they will return their spent rechargers at that time. A manual backup of the data on the tablets will be performed during the same visit. The study area has spotty cellular network coverage, often only 2G, which will test the system's ability to store data and send them when the network becomes available.
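The battery figures above can be checked against the two-week resupply interval with simple arithmetic. The sketch below is only a rough estimate. It assumes that a tablet enters the field fully charged, that each recharge restores a full charge, and that battery life stays within the 1 to 2 day range quoted above. The function name is illustrative.

```python
RESUPPLY_INTERVAL_DAYS = 14  # rechargers are swapped every two weeks


def field_days(days_per_charge: float, recharges: int) -> float:
    """Days of use from one full internal charge plus `recharges` refills.

    Assumes every recharge restores a full charge (an idealization).
    """
    return days_per_charge * (1 + recharges)


# Sweep the ranges quoted in the text: 1 to 2 days per charge, 8 to 10 recharges.
for days_per_charge in (1.0, 1.5, 2.0):
    for recharges in (8, 10):
        total = field_days(days_per_charge, recharges)
        verdict = "covers" if total >= RESUPPLY_INTERVAL_DAYS else "falls short of"
        print(
            f"{days_per_charge} days/charge, {recharges} recharges: "
            f"{total:.1f} days, {verdict} the {RESUPPLY_INTERVAL_DAYS}-day interval"
        )
```

Under these assumptions the two-week interval is covered when a charge lasts about 1.6 days or more with 8 recharges, or about 1.3 days or more with 10 recharges.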

If successful, this new paradigm for data collection will greatly reduce costs at many levels. First, transportation costs, always a large part of any field project, will be reduced dramatically, since daily trips will be replaced by trips every two weeks (or at worst, weekly). The replacement expenses are the battery chargers and the cost of cellular data transmission. The high-capacity battery chargers will cost approximately $100 each. Data transmission costs for the mapping information will be less than $5 per month per data collector. Next, computer hardware and computer support costs will be greatly reduced, since a server will not be purchased (which also eliminates the need for an IT technician to configure the server). The cost of the cloud server and associated storage will be less than $30 per month. At $30 per month, it would take 100 months (more than 8 years) for the cumulative cost of the cloud system to reach that of a very low-end $3,000 server. Data transmission for these systems is secure under the default settings and can be made even more secure by enabling additional options. The data backup systems are much more robust than what can be supplied at the field station, and the cloud servers operate much more reliably than a local server (which can be a single point of failure in an environment with unstable electrical power). We intend to use the Android Device Manager, which will allow us to remotely monitor, lock, and erase the data from the tablets.
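The cost comparison between the cloud server and a purchased server can be reproduced with a short calculation. The sketch below uses only the figures quoted in this description ($3,000 for a low-end server, up to $30 per month for the cloud server, about $100 per charger, up to $5 per month per data collector for data). The function names are illustrative. Because the monthly figures are upper bounds, the break-even time is a lower bound and the per-collector cost is an upper bound.

```python
SERVER_COST_USD = 3000           # very low-end server, as quoted
CLOUD_COST_PER_MONTH_USD = 30    # upper bound for cloud server and storage
CHARGER_COST_USD = 100           # approximate cost of one battery charger
DATA_COST_PER_MONTH_USD = 5      # upper bound per data collector


def break_even_months(server_cost: float, monthly_cloud_cost: float) -> float:
    """Months until cumulative cloud spending equals the server purchase price."""
    return server_cost / monthly_cloud_cost


def collector_cost(months: int) -> float:
    """Charger plus cellular data cost for one data collector over `months`."""
    return CHARGER_COST_USD + DATA_COST_PER_MONTH_USD * months


months = break_even_months(SERVER_COST_USD, CLOUD_COST_PER_MONTH_USD)
print(f"Break-even: {months:.0f} months, about {months / 12:.1f} years")
print(f"One collector, 12 months: ${collector_cost(12):.0f}")
```

With these inputs the break-even point is 100 months, about 8.3 years, and one data collector costs at most about $160 in charger and data fees over a year.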

The data will also be rapidly available. We intend to produce weekly reports that include data collected up to the previous evening; however, real-time monitoring of data and data collectors will also be part of this system. The ODK system comes with online, real-time dashboards and data visualization tools (charts, maps, etc.).

This approach to rapid application development and cloud server deployment appears scalable to a variety of CDC audiences that need to deploy data collection operations rapidly and securely. Importantly, the system can push new or modified data collection forms out to individuals in the field. Cellular data transmission allows data to be collected and transmitted in challenging, low-coverage cellular network environments. Because the data are available in the cloud, support for data analysis can be provided off site from anywhere in the world. Tablets with quad-core CPUs and integrated GPS obtain a highly accurate GPS reading much faster than even the dual-core devices of the previous year. Yet an Android tablet with a 7-inch screen, a quad-core CPU, integrated GPS, a SIM slot, and open source ODK software costs less ($300, Best Buy) than a mid-level Garmin GPS unit that has nowhere near this utility.
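Collecting data under patchy coverage relies on a store-and-forward pattern: records are saved locally first and removed from the local queue only after an upload is confirmed. The sketch below illustrates that general pattern only. It is not ODK code, and the class, the `send` callback, the record fields, and the coordinate values are all hypothetical.

```python
from collections import deque
from typing import Callable


class StoreAndForwardQueue:
    """Holds records locally and uploads them when the network allows."""

    def __init__(self, send: Callable[[dict], bool]):
        self._send = send          # returns True only on confirmed upload
        self._pending = deque()    # records not yet confirmed by the server

    def save(self, record: dict) -> None:
        """Store a record locally; nothing is transmitted yet."""
        self._pending.append(record)

    def flush(self) -> int:
        """Try to upload pending records in order; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break              # no coverage: keep the record for later
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)


# Simulated network: a flag stands in for cellular coverage.
network = {"up": False}
uploaded = []


def fake_send(record: dict) -> bool:
    if network["up"]:
        uploaded.append(record)
        return True
    return False


queue = StoreAndForwardQueue(fake_send)
queue.save({"household_id": "H001", "lat": 0.0, "lon": 34.0})  # made-up values
queue.save({"household_id": "H002", "lat": 0.0, "lon": 34.0})  # made-up values

sent_offline = queue.flush()   # no coverage: both records stay queued
network["up"] = True
sent_online = queue.flush()    # coverage returns: both records are uploaded
```

Removing a record only after the send call reports success is what prevents data loss when a connection drops mid-session.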

This project will be successful if we are able to train data collectors to operate independently in the field for long periods of time and if their data collection devices are able to transmit their data without significant timeouts or data loss. We will be able to compare the manual data backups with the cloud storage data on a day-by-day, village-by-village, and data-collector-by-data-collector basis. Another measure of success will be the collection of more accurate data, at less expense, and with a decreased need for IT support of varying types (server configuration, high-level software development). We will be able to document each of these costs as part of this project.
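The comparison between manual backups and cloud data could be carried out along the lines sketched below: group the record identifiers from each source by day, village, and data collector, then list the identifiers that appear in the backup but not in the cloud. This is an illustrative sketch only. The field names (`record_id`, `date`, `village`, `collector`) and the sample records are hypothetical and do not come from the project's actual forms.

```python
from collections import defaultdict


def group_ids(records):
    """Map (date, village, collector) to the set of record IDs in that group."""
    groups = defaultdict(set)
    for r in records:
        groups[(r["date"], r["village"], r["collector"])].add(r["record_id"])
    return groups


def missing_from_cloud(backup, cloud):
    """Return, per group, the record IDs present in the backup but not the cloud."""
    cloud_groups = group_ids(cloud)
    report = {}
    for key, ids in group_ids(backup).items():
        missing = ids - cloud_groups.get(key, set())
        if missing:
            report[key] = sorted(missing)
    return report


# Made-up sample data for illustration.
backup = [
    {"record_id": "r1", "date": "2019-01-07", "village": "V1", "collector": "A"},
    {"record_id": "r2", "date": "2019-01-07", "village": "V1", "collector": "A"},
    {"record_id": "r3", "date": "2019-01-08", "village": "V2", "collector": "B"},
]
cloud = [
    {"record_id": "r1", "date": "2019-01-07", "village": "V1", "collector": "A"},
    {"record_id": "r3", "date": "2019-01-08", "village": "V2", "collector": "B"},
]

report = missing_from_cloud(backup, cloud)
for (date, village, collector), ids in report.items():
    print(f"{date} {village} {collector}: missing {ids}")
```

An empty report would indicate that every backed-up record also reached the cloud server, which is one way to quantify the data-loss criterion described above.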

For more information about this project, please contact the CHIIC at or Brian Lee at

Page last reviewed: February 15, 2019