Characterizing Adults Receiving Primary Medical Care in New York City: Implications for Using Electronic Health Records for Chronic Disease Surveillance

Introduction: Electronic health records (EHRs) from primary care providers can be used for chronic disease surveillance; however, EHR-based prevalence estimates may be biased toward people who seek care. This study sought to describe the characteristics of an in-care population and compare them with those of a not-in-care population to inform interpretation of EHR data.
Methods: We used data from the 2013–2014 New York City Health and Nutrition Examination Survey (NYC HANES), considered the gold standard for estimating disease prevalence, and the 2013 Community Health Survey, and classified participants as in care or not in care on the basis of their report of seeing a health care provider in the previous year. We used χ² tests to compare the distribution of demographic characteristics, health care coverage and access, and chronic conditions between the 2 populations.
Results: According to the Community Health Survey, approximately 4.1 million (71.7%) adults aged 20 or older had seen a health care provider in the previous year; according to NYC HANES, approximately 4.7 million (75.1%) had. In both surveys, the in-care population was more likely than the not-in-care population to be older, female, non-Hispanic, and insured. The in-care population from the NYC HANES also had a higher prevalence of diabetes (16.7% vs 6.9%; P < .001), hypercholesterolemia (35.7% vs 22.3%; P < .001), and hypertension (35.5% vs 26.4%; P < .001) than the not-in-care population.
Conclusion: Systematic differences between in-care and not-in-care populations warrant caution in using primary care data to generalize to the population at large. Future efforts to use primary care data for chronic disease surveillance need to consider the intended purpose of data collected in these systems as well as the characteristics of the population using primary care.


Introduction
Widespread adoption of electronic health records (EHRs) in primary care practices has begun to transform the practice of medicine, with implications for the quality, continuity, and efficiency of care for patients and clinicians. Aside from their clinical utility, the richness of data in EHRs offers an opportunity to advance chronic disease surveillance by aggregating data (1). A major advantage of EHRs over other data sources for this purpose is that they can provide real-time data and clinically measured outcomes, which can complement data collected through traditional chronic disease surveillance methods, such as registries, surveys, and hospital discharge and medical claims databases (1). In the United States, national EHR incentive programs have catalyzed the transition from paper to electronic records and have generated a substantial volume of clinical data for public health research (2,3).
The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions.
By 2014, 83% of office-based primary care practices in the United States had adopted an EHR (4).
Although the uptake of EHRs in primary care practices presents a unique opportunity to leverage EHR data for chronic disease surveillance, the generalizability of these data for estimating disease prevalence is a concern because some groups are more likely than others to seek primary care. Predictors of primary care use include female sex, higher educational attainment, older age, lower self-rated health status, a greater number of health problems, urban residence, US birth, and, among the foreign born, longer residence in the new country (5-9). Populations that may be underrepresented in EHR data include healthy people who do not perceive a need for preventive care and people who are unable to access care (eg, the uninsured, undocumented immigrants) (10). Because data on the not-in-care population are missing nonrandomly, their absence may bias estimates of disease prevalence and risk factors (11). Some EHR-based surveillance studies avoid this bias by generalizing their estimates only to the population in care (12,13), but others do not (14-19).
The objective of this study was to quantify hypothesized differences in demographic characteristics and in the prevalence of risk factors and chronic diseases between in-care and not-in-care populations by using data from 2 population-based surveys of New York City residents. The findings will help jurisdictions, including our own, decide whether to generalize their EHR-based prevalence estimates to the general public, which includes both in-care and not-in-care populations, or only to the in-care population.

Methods
Sample
We used data from the 2013 New York City Community Health Survey (CHS) and the 2013-2014 New York City Health and Nutrition Examination Survey (NYC HANES). The CHS is an annual telephone survey conducted by the New York City Department of Health and Mental Hygiene (DOHMH); it is modeled after the Behavioral Risk Factor Surveillance System and targets noninstitutionalized adults (aged ≥18 y) living in New York City who have a landline or cellular telephone (20). NYC HANES is a community-based examination survey modeled after the National Health and Nutrition Examination Survey; it was first conducted by DOHMH in 2004 and was conducted again in 2013-2014 jointly by the City University of New York School of Public Health and DOHMH (21). Participants in NYC HANES were randomly selected noninstitutionalized adults (aged ≥20 y). For the CHS, we restricted data to participants aged 20 years or older with complete data on sex, age group, and neighborhood poverty; these restrictions excluded 576 respondents (6.5%) from the original sample, leaving a sample of 8,131. No restrictions were necessary for NYC HANES (n = 1,524).

Measures
The in-care population was defined as people who had seen a health care provider in the previous year. In the CHS, participants were classified as in care if they answered yes to both of the following questions: "Do you have one person or more than one person you think of as your personal doctor or health care provider?" and "Have you seen your personal doctor or health care provider in the last 12 months?" In NYC HANES, participants were classified as in care if they reported 1 or more visits in response to the question "During the past 12 months, how many times have you seen a doctor or other health care professional?" and answered yes to the question "Were any of these visits in the past 12 months at a doctor's office or clinic for a checkup, advice about a health problem, or basic care?" For our first sensitivity analysis, we created a variable identifying NYC HANES participants who had last seen a health care provider 1 to 3 years previously, defined as a response of "more than 1 year, but not more than 3 years ago" to the question "About how long has it been since you last saw or talked to a doctor or other health care professional about your health?"
Independent variables of interest were demographics, health care coverage and access, health indicators, and chronic conditions. Demographic variables were age, sex, race/ethnicity, marital status, neighborhood poverty, employment status, education, birth in the United States (the 50 states and the District of Columbia), years in the United States (if foreign born), and interview language. Neighborhood poverty was calculated as the percentage of the population in the participant's zip code living below 100% of the federal poverty level according to the American Community Survey (ACS) 2008-2012 and was categorized as less than 10% (low neighborhood poverty), 10% to 19%, 20% to 29%, and 30% to 100% (very high neighborhood poverty).
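To make the classification rules concrete, the in-care definitions and neighborhood poverty categories described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the survey teams' processing code: the function and argument names are hypothetical, answers are assumed to be coded as booleans or integer visit counts with `None` for refused or "don't know" responses, and the skip patterns noted in the comments are assumptions.

```python
from typing import Optional


def chs_in_care(has_personal_provider: Optional[bool],
                saw_provider_last_12m: Optional[bool]) -> Optional[bool]:
    """CHS rule: in care only if both questions are answered "yes".

    None stands for a refused or "don't know" answer. We assume (hypothetically)
    that the follow-up question is skipped when the first answer is "no".
    """
    if has_personal_provider is None:
        return None                      # unclassifiable: dropped from analysis
    if not has_personal_provider:
        return False                     # no personal provider -> not in care
    if saw_provider_last_12m is None:
        return None
    return saw_provider_last_12m


def hanes_in_care(visits_last_12m: Optional[int],
                  office_visit_for_care: Optional[bool]) -> Optional[bool]:
    """NYC HANES rule: >=1 visit in 12 months AND at least one office/clinic
    visit for a checkup, advice about a health problem, or basic care."""
    if visits_last_12m is None:
        return None
    if visits_last_12m < 1:
        return False                     # assumed: follow-up skipped after 0 visits
    if office_visit_for_care is None:
        return None
    return office_visit_for_care


def poverty_category(pct_below_fpl: float) -> str:
    """Zip-code % of residents below 100% FPL -> study category.

    Half-open bins (eg, 19.5% falls in "10%-19%") are an assumption about how
    non-integer percentages were binned.
    """
    if not 0 <= pct_below_fpl <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct_below_fpl < 10:
        return "<10% (low)"
    if pct_below_fpl < 20:
        return "10%-19%"
    if pct_below_fpl < 30:
        return "20%-29%"
    return "30%-100% (very high)"


if __name__ == "__main__":
    print(chs_in_care(True, True))       # True
    print(hanes_in_care(0, None))        # False
    print(poverty_category(24.3))        # 20%-29%
```

Returning `None` rather than `False` for missing answers keeps unclassifiable respondents separate, mirroring the exclusion of nonrespondents described in the table notes.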
Variables for health care coverage and access were having any health insurance, having Medicaid coverage (vs non-Medicaid coverage), and not obtaining needed medical care in the previous 12 months. Health indicators included self-rated health status, body mass index (BMI), smoking status, receipt of an influenza vaccine in the previous 12 months, and receipt of mental health treatment (medication or counseling) in the previous 12 months. BMI was based on self-reported height and weight in the CHS and on height and weight measured at the NYC HANES interview. We categorized BMI (kg/m²) as underweight (<18.5), normal (18.5-24.9), overweight (25.0-29.9), obese (30.0-39.9), or extremely obese (≥40.0). A current smoker was defined as someone who had smoked 100 or more cigarettes in his or her lifetime and who answered "every day" or "some days" to a question about current smoking frequency. Chronic condition variables were based on self-reported history, that is, whether a health care provider had ever told the participant that he or she had depression, diabetes, hypertension, or hypercholesterolemia. For hypercholesterolemia, data were restricted to women aged 45 or older and men aged 35 or older to mirror the age- and sex-targeted recommendations of the US Preventive Services Task Force for routine hypercholesterolemia testing. Nonspecific psychological distress was defined as a Kessler 6 (K6) score of 7 to 24 (22). Because biomarkers and medication data were available in NYC HANES, we also considered additional ("gold standard") prevalence definitions for diabetes (hemoglobin A1c ≥6.5%, or ever told of diabetes and currently taking diabetes medication), hypertension (blood pressure ≥140/90 mm Hg, or ever told of hypertension and currently taking hypertension medication), and hypercholesterolemia (total cholesterol ≥240 mg/dL, or ever told of hypercholesterolemia and currently taking cholesterol medication).
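The BMI categories, the hypercholesterolemia age and sex restriction, the K6 cutoff, and the NYC HANES gold-standard definitions above can likewise be expressed as simple rules. The sketch below is illustrative only: `ExamRecord` and its fields are hypothetical stand-ins for the survey's actual variables, BMI bins are treated as half-open intervals, and "≥140/90 mm Hg" is read as systolic ≥140 or diastolic ≥90 mm Hg, the usual convention but an assumption here.

```python
from dataclasses import dataclass


def bmi_category(weight_kg: float, height_m: float) -> str:
    """Study BMI categories (kg/m^2), treated here as half-open intervals."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    if bmi < 40.0:
        return "obese"
    return "extremely obese"


def cholesterol_history_eligible(sex: str, age: int) -> bool:
    """Hypercholesterolemia history analyzed only for women >=45 and men >=35."""
    return (sex == "female" and age >= 45) or (sex == "male" and age >= 35)


def nonspecific_distress(k6_score: int) -> bool:
    """K6 totals range from 0 to 24; 7-24 counts as nonspecific distress."""
    if not 0 <= k6_score <= 24:
        raise ValueError("K6 score must be between 0 and 24")
    return k6_score >= 7


@dataclass
class ExamRecord:  # hypothetical field names, not NYC HANES variable names
    hba1c_pct: float
    systolic_mm_hg: float
    diastolic_mm_hg: float
    total_chol_mg_dl: float
    told_diabetes_on_meds: bool
    told_hypertension_on_meds: bool
    told_high_chol_on_meds: bool


def gold_standard_conditions(r: ExamRecord) -> dict[str, bool]:
    """Biomarker- or medication-based definitions available in NYC HANES."""
    return {
        "diabetes": r.hba1c_pct >= 6.5 or r.told_diabetes_on_meds,
        # ">=140/90 mm Hg" read as systolic >=140 OR diastolic >=90 (assumption)
        "hypertension": (r.systolic_mm_hg >= 140 or r.diastolic_mm_hg >= 90
                         or r.told_hypertension_on_meds),
        "hypercholesterolemia": (r.total_chol_mg_dl >= 240
                                 or r.told_high_chol_on_meds),
    }


if __name__ == "__main__":
    print(bmi_category(95, 1.75))        # obese (BMI about 31.0)
    print(gold_standard_conditions(
        ExamRecord(5.9, 142, 85, 210, False, False, False)))
```

Writing each definition as an "either biomarker or treated diagnosis" rule shows why the gold-standard estimates can exceed self-reported history: untreated or undiagnosed cases with abnormal measurements are also counted.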

Statistical analysis
We conducted bivariate analyses using Rao-Scott χ² tests to compare indicators in the CHS and NYC HANES by in-care status. We also conducted 2 sensitivity analyses. The first compared NYC HANES participants classified as in care (ie, had seen a health care provider in the previous year) with participants who had last seen a health care provider 1 to 3 years previously, to determine whether findings for the in-care population could be generalized to people with less recent health care contact. Having health insurance is a major determinant of seeking care, and the number of Americans with health insurance increased under the Affordable Care Act (ACA) (23). To assess how maximal uptake of health insurance under the ACA, and subsequent care seeking, might affect the characteristics of the in-care population, we conducted a second sensitivity analysis. This analysis used NYC HANES data and χ² tests to compare the demographics and health indicators of the in-care population with those of the uninsured not-in-care population. We also computed prevalence differences for health indicators between the in-care population and the in-care population combined with the uninsured not-in-care population.
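The prevalence difference in the second sensitivity analysis compares a pooled estimate (in-care plus uninsured not-in-care) with the in-care estimate. A crude, unadjusted version of that calculation is sketched below. It ignores age adjustment and the survey design, so it is not expected to reproduce the study's published figures; the function names and the prevalences in the usage example are hypothetical. The population shares in the example (75.1% in care, 9.2% uninsured and not in care) are the NYC HANES figures reported in this article.

```python
def pooled_prevalence(p_in_care: float, share_in_care: float,
                      p_uninsured: float, share_uninsured: float) -> float:
    """Prevalence (%) after adding the uninsured not-in-care group to the
    in-care group, weighting each group by its share of the adult population."""
    total = share_in_care + share_uninsured
    return (p_in_care * share_in_care + p_uninsured * share_uninsured) / total


def prevalence_difference(p_in_care: float, share_in_care: float,
                          p_uninsured: float, share_uninsured: float) -> float:
    """Pooled minus in-care prevalence, in percentage points.

    Algebraically equal to share_uninsured / total * (p_uninsured - p_in_care).
    """
    pooled = pooled_prevalence(p_in_care, share_in_care,
                               p_uninsured, share_uninsured)
    return pooled - p_in_care


if __name__ == "__main__":
    # Shares from the text; the two prevalences (40.0%, 20.0%) are hypothetical.
    diff = prevalence_difference(40.0, 75.1, 20.0, 9.2)
    print(f"{diff:.2f} percentage points")   # -2.18 percentage points
```

Because the difference equals the uninsured group's share of the pooled population (roughly 11% here) times the gap between the two groups' prevalences, adding this group can shift an estimate by only about a tenth of that gap.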
All analyses were performed in SAS-callable SUDAAN (SAS version 9.2, SAS Institute; SUDAAN version 11.0.1, RTI International) to account for the complex sampling designs. Estimates were weighted to the New York City population on the basis of the ACS (2012 for the CHS and 2013 for NYC HANES) and age-adjusted to the US 2000 standard population. Significance was set at a 2-sided α of .05.
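Direct age adjustment, used for all estimates here, weights age-specific prevalences by a standard population's age distribution. The Python sketch below illustrates the point estimates only; SUDAAN also accounts for stratification and clustering when computing variances, which this sketch does not attempt. The record layout is hypothetical, and the age groups and standard-population weights in the example are placeholders, not the actual US 2000 standard population values.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Tuple

# (age group, survey weight, outcome coded 0/1)
Record = Tuple[str, float, int]


def weighted_prevalence_by_age(records: Iterable[Record]) -> dict[str, float]:
    """Survey-weighted prevalence (%) in each age group.

    Point estimates only: design-based standard errors are not computed here.
    """
    num: dict[str, float] = defaultdict(float)
    den: dict[str, float] = defaultdict(float)
    for group, weight, outcome in records:
        num[group] += weight * outcome
        den[group] += weight
    return {g: 100.0 * num[g] / den[g] for g in den}


def age_adjust(prev_by_age: Mapping[str, float],
               standard: Mapping[str, float]) -> float:
    """Direct age adjustment: average the age-specific prevalences using the
    standard population's age distribution as weights."""
    if set(prev_by_age) != set(standard):
        raise ValueError("age groups in the data and the standard must match")
    total = sum(standard.values())
    return sum(prev_by_age[g] * standard[g] / total for g in standard)


if __name__ == "__main__":
    records = [("20-39", 1500.0, 0), ("20-39", 500.0, 1),
               ("40-59", 1000.0, 1), ("40-59", 1000.0, 0),
               ("60+", 800.0, 1), ("60+", 200.0, 0)]
    # Placeholder weights for illustration only, NOT the US 2000 standard.
    standard = {"20-39": 40.0, "40-59": 35.0, "60+": 25.0}
    prev = weighted_prevalence_by_age(records)
    print(prev)                        # {'20-39': 25.0, '40-59': 50.0, '60+': 80.0}
    print(age_adjust(prev, standard))  # 47.5
```

Age adjustment matters for this comparison because the in-care population is older; without it, part of its higher chronic disease prevalence would simply reflect age.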

Results
CHS
According to the CHS, 71.7% (4,137,212) of adult New York City residents had seen a health care provider in the previous year. The in-care and not-in-care populations differed significantly in all demographic characteristics and in all insurance coverage and health care access variables except Medicaid coverage (Table 1). In-care participants were more likely than not-in-care participants to be older (30.2% vs 10.6% aged ≥60 y), female (59.4% vs 42.0%), white non-Hispanic (39.5% vs 31.0%), married (44.3% vs 40.8%), and born in the United States (51.7% vs 37.9%); to have lived in the United States longer if foreign born (76.8% vs 69.5% for ≥10 y); and to have been interviewed in English (80.6% vs 65.6%). They were also more likely to live in neighborhoods with the lowest level of poverty (23.0% vs 17.2%), to be unemployed or not in the labor force (41.8% vs 35.8%), to be college graduates (36.4% vs 28.1%), and to be insured (90.9% vs 52.5%), and they were less likely to have gone without needed medical care (8.9% vs 17.0%).
The in-care and not-in-care populations also differed significantly in health indicators and chronic conditions (Table 2). The in-care population was more likely to report excellent (19.2% vs 17.8%) or very good (26.9% vs 22.9%) health and to be obese (20.6% vs 18.3%) or extremely obese (3.5% vs 2.9%). In-care participants were less likely to be current smokers (14.9% vs 21.2%) and more likely to have received an influenza vaccine (47.3% vs 23.1%) and mental health treatment (14.2% vs 8.0%) in the previous 12 months. A significantly larger proportion of in-care participants had a history of diabetes (12.5% vs 5.4%), hypertension (31.6% vs 21.8%), hypercholesterolemia (39.6% vs 23.8%), and depression (16.4% vs 13.4%). In-care participants were less likely to have mild, moderate, or severe nonspecific psychological distress (K6 score 7-24; 20.3% vs 25.6%).

NYC HANES
According to the 2013-2014 NYC HANES, 75.1% (4,701,244) of adult New York City residents had seen a health care provider in the previous year. The in-care and not-in-care populations differed significantly in age, sex, race/ethnicity, marital status, employment status, health insurance status, and Medicaid coverage (Table 1). As in the CHS, the NYC HANES in-care population was more likely than the not-in-care population to be older (27.1% vs 11.2% aged ≥60 y), female (57.2% vs 40.7%), married (44.1% vs 40.8%), unemployed or not in the labor force (41.0% vs 35.6%), and insured (89.2% vs 66.8%); in contrast to the CHS, it was also more likely to have Medicaid (28.0% vs 17.8%).
The 2 populations also differed significantly in health indicators (Table 2). The in-care population was less likely to report excellent health (14.1% vs 22.4%); more likely to have received an influenza vaccine (47.6% vs 23.3%) and mental health treatment (19.2% vs 11.4%) in the previous 12 months; and more likely to have a history of diabetes (12.6% vs 4.8%), hypertension (32.5% vs 16.2%), or hypercholesterolemia (43.1% vs 20.7%). The populations did not differ significantly in BMI, smoking status, depression, or nonspecific psychological distress, although the distributions of these variables in NYC HANES were similar to those in the CHS. Under the gold-standard definitions based on biomarkers and medication use, the in-care population also had a higher prevalence of diabetes (16.7% vs 6.9%), hypertension (35.5% vs 26.4%), and hypercholesterolemia (35.7% vs 22.3%).
Compared with NYC HANES participants in care within the previous year, participants whose last health care contact was 1 to 3 years previously differed significantly in demographics (Table 1), health indicators, and chronic conditions (Table 2). The population last in care 1 to 3 years previously was more likely to be younger, male, non-Hispanic, unmarried, employed, college graduates, born in the United States, and uninsured and to live in neighborhoods with lower levels of poverty. Their health indicators and chronic conditions were generally distributed similarly to those of the population not in care within the previous year.
In NYC HANES, 9.2% of the population reported being uninsured and not having seen any health care provider in the previous year. The demographic characteristics and health status of this group differed significantly from those of the in-care population. Compared with the in-care population, the uninsured not-in-care population was more likely to be younger than 60 years, male, white non-Hispanic, and living in poorer neighborhoods; had a lower prevalence of obesity, hypertension, hypercholesterolemia, and diabetes; and was less likely to have received an influenza vaccination in the previous 12 months. When the uninsured not-in-care population was combined with the in-care population, the prevalence estimates for most health indicators differed from those of the in-care population alone by no more than 1.0 percentage point; the exceptions were influenza vaccination (−3.3 percentage points) and hypercholesterolemia (−9.4 percentage points) (Table 3).

Discussion
We identified substantial differences between the in-care and not-in-care populations of New York City. In both surveys, the in-care population was disproportionately older, female, non-Hispanic, married, out of the labor force, more educated, insured, and not living in poor neighborhoods. The in-care population also had a higher prevalence of chronic diseases and obesity but was less likely to smoke than the not-in-care population.
These findings support our hypothesis that the in-care and not-in-care populations in New York City differ systematically in demographics and that the in-care population is sicker. Our findings on differences in age, sex, marital status, and smoking status are consistent with those of studies conducted outside New York City (7,8,24). Although some EHR-based surveillance studies have generalized their estimates beyond the in-care population (14-19), our results suggest that, at least in New York City and perhaps in other jurisdictions, the in-care population is the most appropriate population to which to generalize EHR estimates, given the differences between the in-care and not-in-care populations.
When we compared findings from the CHS and NYC HANES, some variables differed significantly between populations in one survey but not in the other; however, the direction and magnitude of the differences were similar for most of these variables. Some of these discrepancies were probably attributable to the difference in sample size between the 2 surveys (n = 1,524 in NYC HANES vs n = 8,131 in the CHS), but there may also be real differences in sample characteristics resulting from differences in sampling frames (random-digit-dialed vs address-based), incentives and barriers to participation (financial compensation and specimen collection in NYC HANES), interview mode (telephone vs in-person), or the wording of the questions used to classify the in-care population.
Our first sensitivity analysis revealed differences between people who had seen a health care provider within the previous year and those whose last health care contact was 1 to 3 years previously. The latter were generally more similar to the not-in-care population than to the population in care within the previous year, except that they were more likely to be non-Hispanic and born in the United States. This difference matters not only for defining the optimal population to which EHR estimates are generalized but also for defining patient inclusion criteria for the EHR cohort. These findings support the principle that the look-back period used to select patients (ie, the time since their most recent visit) should match the definition of the population to which prevalence estimates are generalized.
Our second sensitivity analysis assessed how maximal insurance uptake and health care utilization under the ACA might change the in-care population in New York City. The uninsured not-in-care population had a lower prevalence of chronic disease; if this relatively small group were to become insured and seek care, we would expect only a minimal decline in prevalence estimates for the in-care population of New York City. However, these results should be interpreted with caution, because there are many possible reasons why eligible uninsured people may not seek insurance under the ACA and why insured people may not seek care. Furthermore, these findings may vary by jurisdiction, depending on whether the jurisdiction has expanded Medicaid.
PREVENTING CHRONIC DISEASE VOLUME 13, E56 PUBLIC HEALTH RESEARCH, PRACTICE, AND POLICY APRIL 2016
A major strength of this study was our use of data from 2 surveys that represented New York City's diverse adult population. Many of the same questions were asked in both surveys, allowing us to see how different survey methodologies may have influenced our results. Furthermore, because self-reported data are subject to bias (eg, recall, social desirability), the physical measurements and biomarkers collected in NYC HANES allowed us to objectively characterize BMI and chronic health conditions in the in-care and not-in-care populations, increasing confidence in our findings. Although our study focused on the generalizability of EHR-based prevalence estimates, our data also offer important insights into urban health status and unmet need for primary care among people with chronic conditions. Nevertheless, our study has some limitations. The in-care populations examined in this study might have included people who receive primary care in nontypical settings (eg, from specialists), and our findings may be specific to New York City or to the United States.
The differences between the in-care and not-in-care adult populations of New York City observed in this study confirmed our preliminary decision to limit generalization of prevalence estimates generated by the NYC Macroscope to the in-care population of New York City. (The NYC Macroscope is a surveillance system that uses EHRs to track chronic conditions managed by primary care practices [www.nyc.gov/html/doh/html/data/nycmacroscope.shtml].) Accordingly, we are using the age, sex, and neighborhood poverty distribution of the CHS in-care population to weight NYC Macroscope estimates. We validated 2013 NYC Macroscope prevalence estimates of smoking, obesity, depression, and influenza vaccination, as well as data on the prevalence, treatment, and control of diabetes, hypertension, and hypercholesterolemia, against in-care population estimates from the CHS and NYC HANES (25).
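The text does not specify how the CHS in-care distribution is applied to NYC Macroscope estimates. One common approach is post-stratification, sketched below under that assumption: each EHR patient is assigned to an age × sex × neighborhood-poverty cell and weighted by the ratio of the cell's target share to its share of the EHR sample. This is not the NYC Macroscope's code; the function names, cell labels, and values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Mapping, Sequence


def poststratification_weights(cells: Sequence[Hashable],
                               target_shares: Mapping[Hashable, float]) -> list[float]:
    """One weight per patient so the weighted cell distribution matches the target.

    weight = (target share of the cell) / (sample share of the cell).
    Target cells with no patients cannot be represented; the weighted estimate
    below then renormalizes over the cells that are present.
    """
    n = len(cells)
    counts = Counter(cells)
    target_total = sum(target_shares.values())
    weights = []
    for cell in cells:
        if cell not in target_shares:
            raise KeyError(f"no target share for cell {cell!r}")
        weights.append((target_shares[cell] / target_total) / (counts[cell] / n))
    return weights


def weighted_prevalence(outcomes: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted prevalence (%) of a 0/1 outcome."""
    return 100.0 * sum(y * w for y, w in zip(outcomes, weights)) / sum(weights)


if __name__ == "__main__":
    # Cells are (age group, sex, neighborhood poverty); all values hypothetical.
    cells = [("20-39", "F", "<10%"), ("20-39", "F", "<10%"), ("60+", "M", ">=30%")]
    outcomes = [0, 1, 1]
    target = {("20-39", "F", "<10%"): 0.5, ("60+", "M", ">=30%"): 0.5}
    w = poststratification_weights(cells, target)
    print([round(x, 3) for x in w])                    # [0.75, 0.75, 1.5]
    print(round(weighted_prevalence(outcomes, w), 1))  # 75.0
```

In this toy example the unweighted prevalence is about 66.7%, and reweighting the underrepresented cell toward its target share moves the estimate to 75.0%, which is the kind of correction weighting to the in-care population is meant to provide.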
This study found significant differences between the in-care and not-in-care populations in New York City. Surveillance systems that use EHRs from primary care practices for monitoring chronic diseases should consider the intended purpose of the data collected and the systematic differences between in-care and not-in-care populations in the generalization of results.
Table 1 notes
d Defined as a response of "more than 1 year, but not more than 3 years ago" to the question "About how long has it been since you last saw or talked to a doctor or other health care professional about your health?"
e NYC HANES in care from 1 to 3 years previously vs NYC HANES in care within previous year.
f Sample sizes for the Community Health Survey in this row do not add to 8,131 (total sample size) because 44 participants did not respond (refused or responded with "don't know") to the in-care questions and were dropped from the analyses. For the NYC HANES, 3 participants did not respond, so sample sizes in this row do not add to 1,524 (total sample size).
g Proportion relative to the population in care within 3 years.
h Non-Hispanic Pacific Islanders are categorized as "Asian non-Hispanic" in the Community Health Survey, and "other non-Hispanic" in NYC HANES.
i Estimate should be interpreted with caution. Estimate's relative standard error (a measure of estimate precision) is greater than 30%, the 95% confidence interval half-width is greater than 10, or the sample size is too small, making the estimate potentially unreliable.
Table 2 notes
d, f, g, i Same as Table 1 notes d, f, g, and i.
e NYC HANES in care from 1 to 3 years ago vs NYC HANES in care within previous year.
h For CHS, based on self-reported height and weight; for NYC HANES, based on height and weight measurements taken at interview. Categorized (kg/m²) as underweight (<18.5), normal (18.5-24.9), overweight (25.0-29.9), obese (30.0-39.9), or extremely obese (≥40.0).
j Current smoker defined as having smoked ≥100 cigarettes in his or her lifetime and a response of "every day" or "some days" to a question about the current smoking frequency.
k Data restricted to women aged ≥45 years and men aged ≥35 years.
Table 3 notes
b Column percentages may not add up to 100% because of rounding.
c Based on height and weight measurements taken at the NYC HANES interview. Categorized (kg/m²) as underweight (<18.5), normal (18.5-24.9), overweight (25.0-29.9), obese (30.0-39.9), or extremely obese (≥40.0).
d Current smoker defined as having smoked ≥100 cigarettes in his or her lifetime and a response of "every day" or "some days" to a question about the current smoking frequency.
e Data restricted to women aged ≥45 years and men aged ≥35 years.