Volume 10 — June 06, 2013
TOOLS AND TECHNIQUES
Using Geographic Information Systems to Compare Municipal, County, and Commercial Parks Data
Kelly R. Evenson, PhD, MS; Fang Wen, MS, MCS
Suggested citation for this article: Evenson KR, Wen F. Using Geographic Information Systems to Compare Municipal, County, and Commercial Parks Data. [Erratum appears in Prev Chronic Dis 2014;11. http://www.cdc.gov/pcd/issues/2014/12_0265e.htm.] Prev Chronic Dis 2013;10:120265. DOI: http://dx.doi.org/10.5888/pcd10.120265.
Parks are an integral part of a favorable built environment, and several studies have found a positive association between a favorable built environment and physical activity. Parks data are available to researchers from various sources; however, the accuracy of data sources in representing parks is unknown. This study compared secondary parks data obtained from a commercial vendor with data from municipal/county government records, all of which were verified by using Internet searches, telephone inquiries, or on-the-ground audits.
We studied large metropolitan areas in 3 states: North Carolina (1,837 sq mi), Maryland (1,351 sq mi), and New York (260 sq mi). We collected information on park land area (shapefiles) from municipal/county governments from 2009 through 2012 and from a commercial source in 2010.
Commercial parks data did not include 30.5% (116/380, 20.2 sq mi) of North Carolina, 42.9% (187/436, 21.3 sq mi) of Maryland, and 71.4% (629/881, 12.7 sq mi) of New York parks that we found and verified from municipal/county sources. Municipal/county data did not include 15.8% (60/380, 10.1 sq mi) of North Carolina parks, 27.5% (120/436, 74.8 sq mi) of Maryland parks, and 9.1% (80/881, 7.8 sq mi) of New York parks that we found and verified from commercial sources.
In this study, the combination of commercial and municipal/county data sources that were verified provided the most complete and accurate shapefile. The quality of secondary sources of parks data should be checked prior to use and, if needed, methods incorporated to improve the capture of parks.
Numerous studies have found a positive association between a favorable built environment and physical activity, such as walking or bicycling (1–4). Parks are an integral part of the built environment. They exist in many communities and often provide free places for physical activity (5). Researchers and public health practitioners have studied access to parks to help plan where new parks should be developed, to identify underserved locations, and to determine what facilities should be offered at the parks (6–9). The development of geographic information systems (GIS) has facilitated the study of spatial access and use of parks.
Researchers using GIS to study parks can obtain parks data from several sources: commercial vendors (9,10); local jurisdictions, such as municipal or county governments, from which park locations are assembled (7,11); and on-the-ground audits that include measuring park boundaries in the field (12–14). These sources vary in cost and in the time they require. To our knowledge, no study has compared the accuracy of commercial and municipal/county data sources in representing park geographic area and amenities. Our study compared parks data obtained from commercial sources with those from municipal and county government sources for 3 large metropolitan areas in 3 US states: North Carolina, Maryland, and New York. The findings highlight strengths and limitations of both data sources. We also explored the effect of parks being omitted from either data source.
We defined a park as a public place set aside for physical activity and enjoyment. This definition did not include cemeteries, mobile home parks, historic sites, professional stadiums, country clubs, zoos, private parks, private facilities (such as stand-alone baseball or tennis facilities), or stand-alone recreation centers.
The study areas corresponded to 3 of 6 US locations from the Multi-Ethnic Study of Atherosclerosis (MESA), a cardiovascular cohort study that enrolled 6,814 participants from 2000 through 2002 (15). The 3 study areas, as defined by the MESA Study, were expanded to capture areas where participants had moved since enrollment. In this paper, we refer to the study areas by state: North Carolina (Davidson, Davie, Guilford, Forsyth, Randolph, Rockingham, Stokes, Surry, and Yadkin counties [1,837 sq mi]); Maryland (79 zip code areas in Anne Arundel, Carroll, Harford, Howard, and Baltimore counties and Baltimore city [1,351 sq mi]); and New York (183 zip code areas in Bronx, Brooklyn, Manhattan, and Queens boroughs, and Westchester County [260 sq mi]).
From 2009 through 2012, we used municipal or county GIS shapefiles (GIS files that include the park name and an outline of each park drawn as a polygon) to locate parks; most of these files came from planning, parks, and recreation departments. In a few instances, we used Google Maps (http://maps.google.com/maps) to draw a park boundary when no other outline of the park was available. If only part of the polygon for a confirmed park was in the study area, we included it in our study. Parks with multiple polygons but the same name were manually merged and designated as 1 park. Parks were verified by using Internet searches, telephone inquiries, and, if necessary, on-the-ground audits.
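The name-based merge step can be sketched in Python. This is a minimal illustration, not the procedure the authors ran (they merged polygons manually); it assumes each feature arrives as a (name, polygon) pair with the polygon given as a list of (x, y) vertices, and the helper names `normalize_name` and `merge_by_name` are hypothetical. Names are case-folded and whitespace-collapsed before grouping so that trivial spelling variants land in the same park.

```python
from collections import defaultdict

Point = tuple[float, float]
Polygon = list[Point]


def normalize_name(name: str) -> str:
    """Case-fold and collapse whitespace so 'Lake  Park' matches 'lake park'."""
    return " ".join(name.casefold().split())


def merge_by_name(features: list[tuple[str, Polygon]]) -> dict[str, list[Polygon]]:
    """Group polygon parts that share a park name into one multipart park."""
    parks: dict[str, list[Polygon]] = defaultdict(list)
    for name, polygon in features:
        parks[normalize_name(name)].append(polygon)
    return dict(parks)


features = [
    ("Lake Park", [(0, 0), (10, 0), (10, 10)]),
    ("lake  park", [(20, 0), (30, 0), (30, 10)]),  # second part of the same park
    ("Oak Grove", [(50, 50), (60, 50), (60, 60)]),
]
merged = merge_by_name(features)
print({name: len(parts) for name, parts in merged.items()})
# prints {'lake park': 2, 'oak grove': 1}
```

Keying on name alone mirrors the rule described above but would wrongly merge 2 distinct parks that happen to share a common name (eg, in different municipalities); adding the jurisdiction to the key is one way to guard against that.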
To determine the amenities available at each park (eg, tennis courts, basketball hoops, swimming pools), we searched online, contacted municipal/county departments, or visited the park. This process also allowed us to verify that the park conformed to our park definition.
We obtained commercial data on parks for 2010 from Esri (Esri, Redlands, California). Esri metadata (a summary document containing information on the data set) indicated that parks and forests were identified at the national, state, and local levels, including county and regional parks, and referenced Tele Atlas MultiNet North America (Lebanon, New Hampshire; www.teleatlas.com). We verified the existence of parks and park facilities that Esri identified by using the same methods we used to verify municipal/county sources, primarily through Internet searches and telephone inquiries.
We used several tools in ArcGIS 10.0 (Esri, Redlands, California) to compare the park shapefiles obtained from the commercial sources with files obtained from municipal/county sources. For each of the 3 states, GIS files from both data sources were assembled and overlaid using the state plane coordinate system. Parks that partially overlapped were explored manually in ArcGIS by comparing the park name, shape, and percentage of the area overlapping to determine whether the parks were the same.
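The manual judgment described above combined park name, shape, and percentage of overlapping area. The Python sketch below encodes only the name and overlap criteria as a triage rule and leaves shape comparison to visual review. It assumes the overlap area has already been computed in the GIS (for example, with an intersect operation); the function names and all thresholds (50% overlap, 10% overlap, name-similarity ratio of 0.8) are hypothetical choices, not values used in the study. Name similarity uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two park names, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def classify_overlap(name_a: str, area_a: float, name_b: str, area_b: float,
                     overlap_area: float, min_share: float = 0.5,
                     max_share_different: float = 0.1,
                     min_name_sim: float = 0.8) -> str:
    """Return 'same', 'different', or 'review' for 2 overlapping polygons.

    All thresholds are illustrative assumptions, not values from the study.
    """
    # Share of the smaller polygon that is covered by the overlap (0 to 1).
    overlap_share = overlap_area / min(area_a, area_b)
    names_agree = name_similarity(name_a, name_b) >= min_name_sim
    if overlap_share >= min_share and names_agree:
        return "same"
    if overlap_share <= max_share_different and not names_agree:
        return "different"
    return "review"  # ambiguous: compare names and shapes by hand
```

Routing ambiguous pairs to "review" rather than forcing a decision keeps the human judgment the study relied on, while clearing the obvious matches and non-matches automatically.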
The area of each park polygon was calculated for both data sources by using the ArcGIS Calculate Geometry tool. With the 2 shapefiles overlaid in ArcGIS, the concordant park area (ie, the area where parks from the 2 data sources spatially matched) was extracted. Area in square miles was then calculated for both matched and mismatched park areas.
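For a simple polygon in projected (planar) coordinates, area follows from the shoelace formula. The Python sketch below assumes coordinates in feet, as is common for state plane systems, and converts square feet to square miles (1 sq mi = 5,280² sq ft); it ignores holes, and a multipart park would be the sum of its parts. The function names are hypothetical, and this is not a description of how ArcGIS computes area internally. The last helper reproduces the "percentage of study area in parks" row of Table 1 from a park area and a study-area size.

```python
def shoelace_area(polygon: list[tuple[float, float]]) -> float:
    """Planar area of a simple polygon (no holes) from its vertex ring."""
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


SQ_FT_PER_SQ_MI = 5280.0 ** 2  # assumes coordinates are in feet


def area_sq_mi(polygon_ft: list[tuple[float, float]]) -> float:
    """Polygon area converted from square feet to square miles."""
    return shoelace_area(polygon_ft) / SQ_FT_PER_SQ_MI


def percent_of_study_area(park_sq_mi: float, study_area_sq_mi: float) -> float:
    """Park area as a percentage of the study area, rounded to 0.1."""
    return round(100.0 * park_sq_mi / study_area_sq_mi, 1)


one_mile_square = [(0, 0), (5280, 0), (5280, 5280), (0, 5280)]
print(area_sq_mi(one_mile_square))        # 1.0
print(percent_of_study_area(42.7, 1837))  # 2.3 (North Carolina, municipal/county)
```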
To quantify the impact of missed parks (defined as parks reported in one data source but not the other), we calculated an indicator described in the Centers for Disease Control and Prevention’s (CDC’s) recommended strategies to enhance or create access to places for physical activity (16). The indicator for the extent of the public’s access to parks was defined as “the percentage of US census blocks with parks.” The indicator was calculated as the proportion of 2010 census blocks that have at least 1 park within the block or within 0.5 miles of the block boundary. This metric was calculated for both data sources separately and for both combined. As a second metric to quantify the effect of missed parks, the percentage of parks with each type of facility missed (eg, basketball court, swimming pool) was calculated for both data sources.
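The indicator reduces to a distance test between each census block and each park. The pure-Python sketch below counts a block as having access when some park polygon overlaps it or lies within 0.5 miles (2,640 ft) of its boundary. It assumes simple polygons without holes in a projected coordinate system measured in feet, and all function names are hypothetical; CDC's own computation and GIS buffer-and-overlay tools may differ in details such as how shared boundaries are treated.

```python
import math

Point = tuple[float, float]
Polygon = list[Point]
HALF_MILE_FT = 2640.0  # 0.5 mi, assuming projected coordinates in feet


def _edges(poly: Polygon) -> list[tuple[Point, Point]]:
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    # Proper crossings only; touching cases yield distance 0 below anyway.
    return (_cross(c, d, a) * _cross(c, d, b) < 0 and
            _cross(a, b, c) * _cross(a, b, d) < 0)


def _point_in_polygon(p: Point, poly: Polygon) -> bool:
    inside = False  # even-odd ray casting toward +x
    for (x1, y1), (x2, y2) in _edges(poly):
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                inside = not inside
    return inside


def polygon_distance(pa: Polygon, pb: Polygon) -> float:
    """Minimum distance between 2 simple polygons (0 if they overlap)."""
    if _point_in_polygon(pa[0], pb) or _point_in_polygon(pb[0], pa):
        return 0.0  # one polygon lies inside the other
    ea, eb = _edges(pa), _edges(pb)
    if any(_segments_cross(a, b, c, d) for a, b in ea for c, d in eb):
        return 0.0  # boundaries cross
    # Disjoint: the closest pair always involves a vertex of one polygon.
    return min(min(_point_segment_distance(p, c, d) for p in pa for c, d in eb),
               min(_point_segment_distance(p, a, b) for p in pb for a, b in ea))


def percent_blocks_with_park(blocks: list[Polygon], parks: list[Polygon],
                             buffer: float = HALF_MILE_FT) -> float:
    """Share of blocks with a park inside or within `buffer` of the boundary."""
    hits = sum(any(polygon_distance(blk, prk) <= buffer for prk in parks)
               for blk in blocks)
    return 100.0 * hits / len(blocks)


block_near = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
block_far = [(20000, 0), (21000, 0), (21000, 1000), (20000, 1000)]
park = [(3000, 0), (3500, 0), (3500, 500), (3000, 500)]  # 2,000 ft from block_near
print(percent_blocks_with_park([block_near, block_far], [park]))  # 50.0
```

This brute-force loop compares every block with every park; for tens of thousands of census blocks, a spatial index or the GIS's own buffer and overlay operations would be the practical choice.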
Overall, we verified the existence of 380 parks in the North Carolina study area, 436 parks in the Maryland study area, and 881 parks in the New York study area (Table 1). The commercial data source did not include the following percentages of parks found and verified in municipal/county sources: 30.5% (116/380, 20.2 sq mi) in North Carolina, 42.9% (187/436, 21.3 sq mi) in Maryland, and 71.4% (629/881, 12.7 sq mi) in New York. The municipal/county data sources did not include the following percentages of parks found and verified in the commercial source: 15.8% (60/380, 10.1 sq mi) in North Carolina, 27.5% (120/436, 74.8 sq mi) in Maryland, and 9.1% (80/881, 7.8 sq mi) in New York. Municipal/county data sources showed higher percentages of land area in parks for North Carolina and New York than did the commercial data source but a lower percentage for Maryland.
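The percentages above follow directly from the Table 1 counts. The short Python check below recomputes them; the dictionary layout and helper names are illustrative. The verified total is taken as all parks in the municipal/county file plus the parks found only in the commercial file, because the "parks in both data sources" row differs slightly between the 2 columns (eg, 204 vs 201 in North Carolina).

```python
# Counts transcribed from Table 1.
TABLE1 = {
    "North Carolina": {"municipal_total": 320, "municipal_only": 116, "commercial_only": 60},
    "Maryland": {"municipal_total": 316, "municipal_only": 187, "commercial_only": 120},
    "New York": {"municipal_total": 801, "municipal_only": 629, "commercial_only": 80},
}


def verified_total(row: dict[str, int]) -> int:
    # Every verified park is in the municipal/county file or only in the commercial file.
    return row["municipal_total"] + row["commercial_only"]


def pct(part: int, whole: int) -> float:
    return round(100.0 * part / whole, 1)


for state, row in TABLE1.items():
    total = verified_total(row)
    print(f"{state}: n = {total}; "
          f"missing from commercial {pct(row['municipal_only'], total)}%; "
          f"missing from municipal/county {pct(row['commercial_only'], total)}%")
```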
To examine the effect of parks missing from either data source, we explored how the CDC indicator (at least 1 park within a census block or within 0.5 miles of the block boundary) varied between commercial and municipal/county data. On the basis of verified (ie, via Internet searches, telephone inquiries, and audits) and combined data sources (ie, parks identified from either or both the commercial and municipal/county sources), the proportion of census blocks with park access was 35.2% in North Carolina, 64.1% in Maryland, and 97.9% in New York (Table 2). Compared with this verified, combined estimate, estimates from municipal/county data were more accurate than estimates from commercial data for North Carolina (absolute difference, 1.1 percentage points for municipal/county vs 6.2 for commercial) and New York (0.7 vs 28.6 percentage points) but less accurate for Maryland (7.6 vs 5.2 percentage points).
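The absolute differences are computed against the combined, verified estimate. A minimal Python sketch using the census-block counts from Table 2 (helper names are illustrative) reproduces them when each share is rounded to 1 decimal place before subtracting:

```python
# Census-block counts transcribed from Table 2:
# (blocks in area, municipal/county, commercial, combined)
TABLE2 = {
    "North Carolina": (37_492, 12_798, 10_855, 13_214),
    "Maryland": (38_356, 21_685, 22_610, 24_598),
    "New York": (32_819, 31_910, 22_741, 32_134),
}


def share(n_blocks: int, total: int) -> float:
    """Percentage of census blocks with park access, rounded to 0.1."""
    return round(100.0 * n_blocks / total, 1)


def gaps(total: int, municipal: int, commercial: int, combined: int) -> tuple[float, float]:
    """Absolute differences (percentage points) from the combined, verified estimate."""
    reference = share(combined, total)
    return (round(abs(reference - share(municipal, total)), 1),
            round(abs(reference - share(commercial, total)), 1))


for state, counts in TABLE2.items():
    municipal_gap, commercial_gap = gaps(*counts)
    print(f"{state}: municipal/county {municipal_gap}, commercial {commercial_gap}")
```

The order of rounding matters slightly: subtracting unrounded shares for North Carolina's commercial data (35.24% minus 28.95%) gives 6.3 rather than 6.2 percentage points.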
To examine the impact of missing parks in either data source, we also quantified the facilities that would be missed by relying on only 1 data source (Table 3). For example, relying only on municipal/county park data would miss 12 parks with baseball or softball fields in North Carolina, 30 in Maryland, and 13 in New York. Relying only on commercial park data would miss 34 parks with baseball or softball fields in North Carolina, 72 in Maryland, and 104 in New York.
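Tallies like those in Table 3 count each missed park once per facility type it contains, with the number of missed parks as the denominator. A hedged Python sketch, assuming each verified park has a hypothetical string ID and a set of facility types recorded during verification:

```python
from collections import Counter


def facilities_missed(reference_ids: set[str], source_ids: set[str],
                      facilities: dict[str, set[str]]) -> tuple[int, dict[str, tuple[int, float]]]:
    """Count missed parks having each facility type, reported as n (%)."""
    missed = reference_ids - source_ids  # verified parks absent from this source
    counts: Counter = Counter()
    for park_id in missed:
        counts.update(facilities.get(park_id, set()))  # each type once per park
    n = len(missed)
    return n, {f: (c, round(100.0 * c / n, 1)) for f, c in sorted(counts.items())}


verified = {"P1", "P2", "P3", "P4"}
commercial = {"P1", "P2"}
facility_sets = {"P3": {"basketball hoops", "tennis courts"}, "P4": {"basketball hoops"}}
n, table = facilities_missed(verified, commercial, facility_sets)
print(n, table)  # 2 {'basketball hoops': (2, 100.0), 'tennis courts': (1, 50.0)}
```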
When comparing parks data obtained from commercial and municipal/county sources, we found that both data sources omitted parks whose existence was verified through Internet searches, telephone inquiries, or on-the-ground audits. The most accurate park assessment was derived by combining verified commercial and municipal/county data together.
There are several advantages and disadvantages to both commercial data and municipal/county park data for research purposes. Although it may be necessary to purchase commercial data, such data may be easier to use and require less staff time. A disadvantage to commercial data sources is that they may include spaces that are not considered parks by the researchers’ definition.
Municipal/county parks data files were generally more complete than commercial data sources; however, acquiring them required significant staff time. The quality of municipal/county GIS data varied across geographic areas, and it was unclear how frequently data files were updated. Therefore, they may be temporally mismatched across multiple administrative boundaries. Users should be aware that national parks, state parks, and forest areas may not be included in municipal/county parks data.
Neither municipal/county nor commercial sources of parks data provided information on park facilities or park quality. Facilities can be identified, as in our study, through Internet searches, telephone calls, and, if needed, site visits; data on park quality can be collected through site visits or, as in New York City, through a park inspection program (8). Neither the municipal/county nor the commercial data sources included private neighborhood parks that may be accessible to the public. Whether these parks are of interest can be determined through an audit or site visit, although private neighborhood parks without road access may still be missed. Audits may miss parks that are unnamed (ie, lack signage), and conducting audits may require significant time and cost (14). Although the commercial park shapefile we used was static (a single 2010 release), we learned that park shapefiles in municipal/county sources are fluid: parks may be added, removed, or renamed, and facilities within parks can change over time. Park shapefiles and inventories of amenities should therefore be updated if a study spans an extended period.
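When municipal/county files change between releases, a study spanning several years may need to reconcile snapshots. The Python sketch below compares 2 hypothetical releases of a park inventory; it assumes each park carries a stable ID across releases, which real municipal/county files may not provide. Without stable IDs, a renamed park would appear as 1 removal plus 1 addition.

```python
# park_id -> (park name, facility types); the layout is a hypothetical example.
Snapshot = dict[str, tuple[str, frozenset[str]]]


def diff_inventories(old: Snapshot, new: Snapshot) -> dict[str, list]:
    """Compare 2 releases of a park inventory keyed by a stable park ID."""
    shared = sorted(old.keys() & new.keys())
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "renamed": [(pid, old[pid][0], new[pid][0]) for pid in shared
                    if old[pid][0] != new[pid][0]],
        # (park_id, facilities gained, facilities lost)
        "facilities_changed": [(pid, sorted(new[pid][1] - old[pid][1]),
                                sorted(old[pid][1] - new[pid][1]))
                               for pid in shared if old[pid][1] != new[pid][1]],
    }


old = {"P1": ("Oak Park", frozenset({"tennis courts"})), "P2": ("Elm Park", frozenset())}
new = {"P1": ("Oak Grove Park", frozenset({"tennis courts", "basketball hoops"})),
       "P3": ("Riverside Park", frozenset())}
changes = diff_inventories(old, new)
print(changes)
```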
Impact of the park data source
To explore the impact of the 2 park data sources, we used a CDC indicator: the percentage of census blocks that had a park within the block or within 0.5 miles of the block boundary (16) (Table 2). CDC reported a national median of 20% of census blocks with park access, ranging from 2% (Mississippi) to 47% (California); for its calculations, CDC used national, state, county, and local parks data from a 2007 commercial source. We calculated the indicator by using municipal/county and commercial parks data and found that the results varied between the 2 data sources and across locations. Compared with the combined and verified park data, the absolute difference ranged from 0.7 to 7.6 percentage points for municipal/county data and from 5.2 to 28.6 percentage points for commercial data. The difference was most pronounced for the commercial data for New York, because a large number of parks were missing. For North Carolina and New York, municipal/county data provided estimates closer to the combined and verified data than did commercial data. For Maryland, however, commercial data provided the closer estimate because of the larger spatial area of parks that were in the commercial data but not in the municipal/county data.
We also quantified the effect of parks missing from either data source by counting the facilities at the parks that were missed (Table 3). Missed parks contained a variety of facilities, and the omissions had the largest effect on the most commonly found active park facilities, such as baseball or softball fields and basketball hoops.
Our study had several limitations. First, we did not compare results from the 2 data sources used here (ie, park data from the commercial source and data from municipal/county sources) with other commercial data sources that may be available. Second, we were unable to compare results by urbanicity, and we recognize that the quality of parks data may differ between urban and rural areas. Third, in some instances, the park shapefiles from the 2 data sources did not exactly match. In these situations, we determined whether parks from the 2 sources were the same park or different parks by visually comparing their shapes, checking whether their names matched, and examining the percentage of park area that overlapped. This method was subjective, because we did not visit the parks to resolve the differences. Fourth, the metadata from both sources lacked information on the geospatial data that would have been useful, such as content, quality, positional accuracy, coverage, scale, and date of collection (17).
GIS-derived measures of parks allow practitioners and researchers to investigate park accessibility and associations of parks with physical activity by nearby residents. Studies of park accessibility and associations with physical activity would benefit from quantification of the degree of error in GIS data and ultimately the potential bias that such error introduces to surveillance measures and to environment–health associations (18). In assessing both commercial and municipal/county data sources, we found count errors (neither source listed all parks), attribute errors (commercial sources listed some parks that were not verified as such), and positional errors (parks listed in the 2 data sources did not always align). Using both data sources and verifying that parks existed was the most accurate way to develop the park shapefile in this study. However, it is still possible that parks were missed even though we used both sources.
These findings indicate that practitioners and researchers should check park shapefiles from commercial or municipal/county sources against other sources of information before using them. A comprehensive parks file for the entire United States, developed using standardized GIS protocols (17,19), could facilitate parks-related research. With more than 9,000 local parks and recreation departments and organizations that manage more than 108,000 public park facilities and 65,000 indoor facilities (20), the coordination of data across jurisdictions is complex. A regularly updated database housing this information could be useful for future research and for surveillance purposes.
This study was funded by the National Heart, Lung, and Blood Institute, National Institutes of Health (NIH) (No. 2R01 HL071759). Funding was also provided by the Robert Wood Johnson Foundation (RWJF) through its national program, Active Living Research (No. 52319). The grants were managed through the University of North Carolina's Center for Health Promotion and Disease Prevention (cooperative agreement No. U48-DP000059), a member of CDC's Prevention Research Centers Program. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH, RWJF, or CDC.
Corresponding Author: Kelly R. Evenson, Department of Epidemiology, Gillings School of Global Public Health, and Center for Health Promotion and Disease Prevention, University of North Carolina, Chapel Hill, 137 East Franklin St, Ste 306, Chapel Hill, NC 27514. Telephone: 919-966-4187. E-mail: email@example.com.
Author Affiliations: Fang Wen, Gillings School of Global Public Health, Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina.
1. Durand CP, Andalib M, Dunton GF, Wolch J, Pentz MA. A systematic review of built environment factors related to physical activity and obesity risk: implications for smart growth urban planning. Obes Rev 2011;12(5):e173–82.
2. Giles-Corti B, Kelty SF, Zubrick SR, Villanueva KP. Encouraging walking for transport and physical activity in children and adolescents: how important is the built environment? Sports Med 2009;39(12):995–1009.
3. Davison KK, Lawson CT. Do attributes in the physical environment influence children's physical activity? A review of the literature. Int J Behav Nutr Phys Act 2006;3(19):1–17.
4. Humpel N, Owen N, Leslie E. Environmental factors associated with adults' participation in physical activity: a review. Am J Prev Med 2002;22(3):188–99.
5. Kaczynski AT, Henderson KA. Parks and recreation settings and active living: a review of associations with physical activity function and intensity. J Phys Act Health 2008;5(4):619–32.
6. Talen E. The social equality of urban service distribution: an exploration of park access in Pueblo, Colorado and Macon, Georgia. Urban Geogr 1997;18(6):521–41.
7. Diez Roux AV, Evenson K, McGinn A, Brown D, Moore L, Brines S, et al. Availability of recreational resources and physical activity in a sample of adults. Am J Public Health 2007;97(3):493–9.
8. Weiss CC, Purciel M, Bader M, Quinn JW, Lovasi G, Neckerman KM, et al. Reconsidering access: park facilities and neighborhood disamenities in New York City. J Urban Health 2011;88(2):297–310.
9. Zhang X, Lu H, Holt JB. Modeling spatial accessibility to parks: a national study. Int J Health Geogr 2011;10:31.
10. Wen M, Kowaleski-Jones L. The built environment and risk of obesity in the United States: racial-ethnic disparities. Health Place 2012;18(6):1314–22.
11. Giles-Corti B, Broomhall M, Knuiman M, Collins C, Douglas K, Ng K, et al. Increasing walking: how important is distance to, attractiveness, and size of public open space? Am J Prev Med 2005;28(2, Suppl 2):169–76.
12. Evenson KR, Sotres-Alvarez D, Herring AH, Messer L, Laraia BA, Rodriguez DA. Assessing urban and rural neighborhood characteristics using audit and GIS data: derivation and reliability of constructs. Int J Behav Nutr Phys Act 2009;6:44.
13. Day K, Boarnet M, Alfonzo M, Forsyth A. The Irvine-Minnesota inventory to measure built environments: development. Am J Prev Med 2006;30(2):144–52.
14. Brownson RC, Hoehner CM, Day K, Forsyth A, Sallis JF. Measuring the built environment for physical activity: state of the science. Am J Prev Med 2009;36(4 Suppl):S99–123.
15. Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, et al. Multi-ethnic Study of Atherosclerosis: objectives and design. Am J Epidemiol 2002;156(9):871–81.
16. Centers for Disease Control and Prevention. State indicator report on physical activity, 2010 national action guide. http://www.cdc.gov/physicalactivity/downloads/PA_State_Indicator_Report_2010.pdf. Accessed December 18, 2012.
17. Matthews SA, Moudon AV, Daniel M. Work group II: Using Geographic Information Systems for enhancing research relevant to policy on diet, physical activity, and weight. Am J Prev Med 2009;36(4, Suppl):S171–6.
18. Boone JE, Gordon-Larsen P, Stewart JD, Popkin BM. Validation of a GIS facilities database: quantification and implications of error. Ann Epidemiol 2008;18(5):371–7.
19. Story M, Giles-Corti B, Yaroch AL, Cummins S, Frank LD, Huang TT, et al. Work group IV: Future directions for measures of the food and physical activity environments. Am J Prev Med 2009;36(4, Suppl):S182–8.
20. Godbey G, Mowen A. The benefits of physical activity provided by park and recreation services: the scientific evidence. http://www.nrpa.org/uploadedFiles/nrpa.org/Publications_and_Research/Research/Papers/Godbey-Mowen-Summary.pdf. Accessed December 18, 2012.
Table 1. Comparison of Parks Data Obtained From Municipal/County Sources with Data Obtained from Commercial Sources in 3 Locations: North Carolina, Maryland, and New York, 2009–2012
| Park details | North Carolina (n = 380), municipal/county | North Carolina (n = 380), commercial | Maryland (n = 436), municipal/county | Maryland (n = 436), commercial | New York (n = 881), municipal/county | New York (n = 881), commercial |
|---|---|---|---|---|---|---|
| Number of parks | | | | | | |
| Number of parks, total | 320 | 261 | 316 | 246 | 801 | 251 |
| Parks in both data sources | 204 | 201 | 129 | 126 | 172 | 171 |
| Parks in municipal/county data but not in commercial data | 116 | NA | 187 | NA | 629 | NA |
| Parks in commercial data but not in municipal/county data | NA | 60 | NA | 120 | NA | 80 |
| Park area (sq mi) | | | | | | |
| Park area, total | 42.7 | 32.5 | 67.2 | 120.7 | 29.5 | 24.7 |
| Park area spatially overlaid | 22.5 | 22.5 | 45.9 | 45.9 | 16.9 | 16.9 |
| Park area in municipal/county data but not in commercial data | 20.2 | NA | 21.3 | NA | 12.7 | NA |
| Park area in commercial data but not in municipal/county data | NA | 10.1 | NA | 74.8 | NA | 7.8 |
| Percentage of study area in parks | 2.3 | 1.8 | 5.0 | 8.9 | 11.4 | 9.5 |
Table 2. Parks per Census Block, by Data Source and Study Area: North Carolina, Maryland, and New York, 2009–2012
| Study area | No. of census blocks in area | Municipal/county, n (%) | Commercial, n (%) | Combined, n (%) |
|---|---|---|---|---|
| North Carolina | 37,492 | 12,798 (34.1) | 10,855 (29.0) | 13,214 (35.2) |
| Maryland | 38,356 | 21,685 (56.5) | 22,610 (58.9) | 24,598 (64.1) |
| New York | 32,819 | 31,910 (97.2) | 22,741 (69.3) | 32,134 (97.9) |
Table 3. Park Facilities Missed if Relying on Only 1 Data Source, by Study Area: North Carolina, Maryland, and New York, 2009–2012
| Parks with each facility | Municipal/county only, North Carolina (n = 32), n (%) | Municipal/county only, Maryland (n = 65), n (%) | Municipal/county only, New York (n = 49), n (%) | Commercial only, North Carolina (n = 116), n (%) | Commercial only, Maryland (n = 187), n (%) | Commercial only, New York (n = 629), n (%) |
|---|---|---|---|---|---|---|
| Baseball or softball fields | 12 (37.5) | 30 (46.2) | 13 (26.0) | 34 (29.3) | 72 (38.5) | 104 (16.5) |
| Basketball hoops | 5 (15.6) | 26 (40.0) | 17 (34.0) | 25 (21.6) | 84 (44.9) | 383 (60.9) |
| Bocce ball courts | 0 | 0 | 1 (2.0) | 0 | 1 (0.5) | 14 (2.2) |
| Cricket fields | 0 | 0 | 0 | 0 | 0 | 2 (0.3) |
| General purpose fields | 2 (6.3) | 23 (35.4) | 5 (10.0) | 7 (6.0) | 60 (32.1) | 19 (3.0) |
| Golf holes | 1 (3.1) | 0 | 1 (2.0) | 7 (6.0) | 3 (1.6) | 4 (0.6) |
| Football fields | 0 | 1 (1.5) | 1 (2.0) | 3 (2.6) | 0 | 14 (2.2) |
| Skate parks | 0 | 1 (1.5) | 1 (2.0) | 4 (3.4) | 2 (1.1) | 5 (0.8) |
| Soccer fields | 1 (3.1) | 0 | 0 | 11 (9.5) | 2 (1.1) | 13 (2.1) |
| Swimming pools | 3 (9.4) | 2 (3.1) | 2 (4.0) | 7 (6.0) | 9 (4.8) | 23 (3.7) |
| Tennis courts | 7 (21.9) | 14 (21.5) | 4 (8.0) | 17 (14.7) | 32 (17.1) | 33 (5.2) |
| Tracks | 0 | 0 | 1 (2.0) | 0 | 0 | 12 (1.9) |
| Volleyball courts | 3 (9.4) | 4 (6.2) | 0 | 9 (7.8) | 9 (4.8) | 32 (5.1) |
| Outdoors or indoors | | | | | | |
| Racquetball, handball, or squash courts | 1 (3.1) | 0 | 5 (10.0) | 0 | 1 (0.5) | 387 (61.5) |
| General purpose fields | 0 | 0 | 1 (2.0) | 0 | 2 (1.1) | 0 |
| Swimming pools | 1 (3.1) | 0 | 0 | 0 | 0 | 7 (1.1) |
The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions.