Geographic Variation in Obesity at the State Level in the All of Us Research Program

Introduction. National obesity prevention strategies may benefit from precision health approaches involving diverse participants in population health studies. We used cohort data from the National Institutes of Health All of Us Research Program (All of Us) Researcher Workbench to estimate population-level obesity prevalence. Methods. To estimate state-level obesity prevalence, we used data from physical measurements made during All of Us enrollment visits and data from participant electronic health records (EHRs) where available. Prevalence estimates were calculated for 2 categories of body mass index (BMI) (kg/m²): obesity (BMI ≥30) and severe obesity (BMI ≥35). We calculated and mapped prevalence by state, excluding states with fewer than 100 All of Us participants. Results. Data on height and weight were available for 244,504 All of Us participants from 33 states, and corresponding EHR data were available for 88,840 of these participants. Median BMI (IQR) was 28.4 (24.4–33.7) from physical measurements data and 28.5 (24.5–33.6) from EHR data, where available. Overall obesity prevalence based on physical measurements data was 41.5% (95% CI, 41.3%–41.7%); prevalence of severe obesity was 20.7% (95% CI, 20.6%–20.9%), with large geographic variations observed across states. Prevalence estimates from states with greater numbers of All of Us participants were more similar to national population-based estimates than those from states with fewer participants. Conclusion. All of Us participants had a high prevalence of obesity, with state-level geographic variation mirroring national trends. The diversity among All of Us participants may support future investigations on obesity prevention and treatment in diverse populations.


Introduction
Efforts to address the growing obesity epidemic in the US may benefit from precision health approaches that use integrated data on environments, social determinants of health, health behaviors, clinical conditions, and genomic factors that contribute to risks in individuals and in diverse populations (1,2). Few cohorts have sufficient size or diversity of data types and populations needed to investigate the multiple potential contributors to obesity in the US. The National Institutes of Health's (NIH's) All of Us Research Program (All of Us) is designed to integrate multiple data types for research, with the goal of including data from 1 million people collected longitudinally over 10 years. Baseline assessments include in-person study visits, during which physical measurements are taken by trained study staff, including height and weight measurements (3). Clinical data, including height and weight measurements used during clinical encounters, are also collected from electronic health records (EHRs) of All of Us participants who consent to provide these data. Reports of behavioral, environmental, social, and demographic characteristics are collected through surveys administered to a diverse participant population. Biologic samples, including blood samples obtained via venipuncture, are obtained for biomarker and genomic studies.
Population-based data, such as results of the Behavioral Risk Factor Surveillance System (BRFSS), are available to study obesity in the US. BRFSS is a large, nationally representative, telephone-based survey of more than 400,000 participants conducted annually by state health departments to collect information on self-reported risk behaviors, chronic health conditions, and use of prevention services (4). Additionally, the National Health and Nutrition Examination Survey (NHANES) captures data annually on a nationally representative sample of approximately 5,000 participants and includes data from survey interviews, in-person physical measurements, and laboratory tests (5). BRFSS and NHANES have relative strengths and limitations for conducting obesity research at the population level. The large size of BRFSS enables monitoring of population-based obesity prevalence at the state level. However, measures of weight and height in BRFSS are obtained by self-report and may be subject to underestimating obesity because of self-reporting bias (1). The smaller NHANES study data are collected in an examination unit and thus provide objective physical measurements rather than self-reported data. However, NHANES data are designed to provide estimates that are nationally representative, but not representative of smaller geographic areas (1). To address these concerns, Ward et al generated state-level projections of obesity prevalence in BRFSS data that correct for self-reporting bias by using the distribution of obesity in NHANES as a correction factor (1). Data from All of Us may contribute additional value to these existing population-based resources because of the large size and nationwide distribution of the All of Us cohort for which objective measurements and biomarker data are available through in-person measurement, along with longitudinal data collected through EHRs that are not available in other cohort studies of this scale. 
Additionally, participant diversity within All of Us may provide insight into factors relevant to obesity risk in various social and geographic contexts and population strata in the US. More than 80% of All of Us participants belong to population groups that have been historically underrepresented in biomedical research, including people who are aged 65 or older, are Black or Hispanic, have low income (annual income below the federal poverty level), have less than a high school diploma or equivalent, have diverse sexual orientations and gender identities, or live in rural areas (3).
To facilitate research, the All of Us Researcher Workbench was developed to provide access to integrated data types in the program. All of Us cohort data types include physical measurements, EHR data, surveys, and biospecimens. Data on height and weight from EHR and physical measurements data sources have not previously been reported. The goal of our study was to demonstrate the utility of the All of Us Researcher Workbench for examining obesity prevalence across the US in the All of Us cohort. First, our study validated the data on height and weight by examining the concordance of the 2 data sources for height and weight (physical measurements and EHR) in All of Us. Second, the study estimated state-level obesity prevalence among All of Us participants by sex in physical measurements data, and compared and contrasted our estimates with BRFSS data previously reported at the state level by Ward et al (1).

Methods
All of Us demonstration projects. We conducted our study from May 2018 through December 2020 (data collected in this range are date stamped March 8, 2021). The goals, recruitment methods, study sites, and scientific rationale for All of Us have been described previously (6). Demonstration projects were designed to describe the All of Us cohort and reproduce previous studies for validation purposes. Our study was proposed by members of the All of Us investigator consortium and reviewed and overseen by the program's science committee. Our analysis of deidentified data was classified as research not involving human subjects by the All of Us institutional review board. The initial release of data and tools used in our study was published recently (7). Results reported are in compliance with the All of Us Data and Statistics Dissemination Policy disallowing disclosure of results in group counts under 20.
PREVENTING CHRONIC DISEASE VOLUME 18, E104 PUBLIC HEALTH RESEARCH, PRACTICE, AND POLICY DECEMBER 2021 The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions.
All of Us Researcher Workbench. Our study used data available through the All of Us Researcher Workbench, a cloud-based platform where approved researchers can access and analyze All of Us data (6). The details of the surveys and methods of data collection are available in the Survey Explorer found in the All of Us Research Hub (https://www.researchallofus.org), a website designed to support researchers (8). Three currently available data types (survey, physical measurements, and EHR) are mapped to the common data model of the Observational Medical Outcomes Partnership, version 5.2, maintained by the Observational Health Data Sciences and Informatics collaborative (https://www.ohdsi.org/data-standardization). To protect participant privacy, a series of data transformations were applied. These included data suppression of codes with a high risk of identification, such as military status; generalization of categories, including age, sex at birth, gender identity, sexual orientation, and race or ethnicity; and date shifting by a random (less than 1 year) number of days, implemented consistently across each participant record. Documentation on privacy implementation and creation of a curated data repository (CDR) is available in the All of Us Registered Tier CDR Data Dictionary (9). The Researcher Workbench currently offers tools with a user interface built for selecting groups of participants, creating data sets for analysis, and workspaces (Jupyter Notebooks; https://www.jupyter.org) to analyze data. The Notebooks enable the use of saved data sets and direct query by using R (R Project for Statistical Computing) and Python 3 (Python) programming languages. This demonstration project used the All of Us curated data set (CDR version fc-aou-cdr-prod.R2020Q4R2) on a secure server on March 8, 2021, by using a Researcher Workbench interface, version 4, which includes data released by the program in December 2020.
Study population. Enrollment in All of Us began in May 2018, and the program currently enrolls participants aged 18 or older from a network of recruitment sites in more than 41 states. Enrollment will continue until at least 1 million participants are enrolled (3). All of Us is designed to recruit people who are underrepresented in biomedical research with the goal of enrolling its cohort from populations that are more than 75% underrepresented in terms of demographics, geographic location, and other characteristics, with at least 45% of participants coming from racial and ethnic groups that are underrepresented in research (3). Information on the sites from which participants are recruited has been described (6). Briefly, recruitment sites for All of Us were selected via an NIH submission and review process. The number of recruitment sites is evolving, and as of this writing, All of Us participants were enrolled at regional medical centers (93.6%), federally qualified health centers (3.2%), Veterans Health Administration sites (1.6%), and "direct volunteer" sites that can provide access for people who are not patients in a health care organization (a designated health clinic, blood bank, laboratory, or other facility) (1.6%). Participants in All of Us enroll digitally and provide informed consent to participate in the program through the website (https://www.joinallofus.org), via a smartphone application, or through one of the participating recruitment sites. After a person 1) consents to participate, 2) provides authorization to share EHR data, or 3) completes the initial baseline survey of demographic information, the participant becomes eligible for in-person visits to have physical measurements and biospecimens collected at one of the All of Us recruitment sites.
Data collection from in-person physical measurements and EHRs. Study protocols at each recruitment site were followed to measure objective height and weight during in-person visits. Height is measured via stadiometer and recorded in centimeters to the nearest millimeter. Weight is recorded in kilograms to the nearest 0.1 kg. Clinical data on height, weight, and calculated body mass index (BMI) (weight in kg/height in m²) that were collected and recorded in participant EHRs during in-person clinical visits for routine patient care were extracted and transformed into the Observational Medical Outcomes Partnership common format at each enrollment site (7). For our analysis, height and weight values from physical measurements visits and EHR data were used to calculate BMI from both data sources. Our analyses also included survey data on demographics (sex and gender identity, education, race or ethnicity, age, and geographic location as US state of residence).
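As an illustration of the BMI calculation described above, the following minimal Python sketch converts a height recorded in centimeters and a weight recorded in kilograms into BMI (kg/m²). The function name and the example values are hypothetical and are not taken from the All of Us data or from the study's own R code.

```python
def bmi_from_measurements(height_cm: float, weight_kg: float) -> float:
    """Return BMI in kg/m^2 from height in centimeters and weight in kilograms."""
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height and weight must be positive")
    height_m = height_cm / 100.0  # stadiometer readings are recorded in cm
    return weight_kg / (height_m ** 2)


if __name__ == "__main__":
    # Hypothetical participant: 170.0 cm tall, weighing 90.0 kg.
    print(round(bmi_from_measurements(170.0, 90.0), 1))
```

The same function could be applied to either data source, because both record height and weight rather than relying only on a precomputed BMI.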
Statistical methods. We used the methods of Ward et al (1) to examine 2 categories of obesity consistent with Centers for Disease Control and Prevention (CDC) definitions of overall obesity (BMI ≥30) and severe obesity (BMI ≥35). We calculated obesity categories separately from physical measurements data only and EHR data only. Among those with both sources of data, we examined the correlation between the two with Pearson correlation coefficients. We examined baseline characteristics of participants from physical measurements data only and those who also contributed EHR data. We compared the characteristics of those with and without EHR data with χ² tests of significance for categorical variables. Sufficient data were available to display state-level obesity estimates by using All of Us BMI measurements calculated from physical measurements data, but at this writing, a sufficient sample of EHR data currently collected by All of Us was unavailable to display state-level estimates. We calculated the prevalence of obesity and severe obesity nationwide and for each state, overall and separately for men and women. We also calculated the prevalence of obesity or severe obesity in a complete-case analysis for All of Us participants with known data on age, binary sex assigned at birth (male or female), race or ethnicity (Black, White, Hispanic, and other groups), height and weight, and education levels (<high school diploma or equivalent, high school diploma to some college, college graduate). Data from participants with nonbinary gender identities were not reported in these results because of small sample sizes. After deletion of participants with incomplete data, we compared the prevalence of obesity and severe obesity to reported BRFSS projections adjusted for self-report bias by Ward et al (1). We used ArcGIS version 10.7.1 (Esri) to map physical measurements of BMI data by state.
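The prevalence calculation can be sketched in Python as follows. The text does not state which confidence interval method was used, so the normal-approximation (Wald) interval below is an assumption, as are the function names. The case count in the example is back-calculated from the reported 41.5% prevalence and 244,504 participants purely for illustration and is not a figure reported in the study.

```python
import math

OBESITY_CUTOFF = 30.0         # CDC definition of obesity: BMI >= 30
SEVERE_OBESITY_CUTOFF = 35.0  # severe obesity: BMI >= 35


def is_obese(bmi: float) -> bool:
    return bmi >= OBESITY_CUTOFF


def is_severely_obese(bmi: float) -> bool:
    return bmi >= SEVERE_OBESITY_CUTOFF


def prevalence_with_wald_ci(n_cases: int, n_total: int, z: float = 1.96):
    """Return (prevalence, lower, upper) using a Wald interval (assumed method)."""
    if n_total <= 0 or not 0 <= n_cases <= n_total:
        raise ValueError("need 0 <= n_cases <= n_total and n_total > 0")
    p = n_cases / n_total
    se = math.sqrt(p * (1.0 - p) / n_total)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


if __name__ == "__main__":
    # Illustrative count only: roughly 41.5% of 244,504 participants.
    p, lower, upper = prevalence_with_wald_ci(101_469, 244_504)
    print(f"{p:.1%} (95% CI, {lower:.1%}-{upper:.1%})")
```

With a cohort of this size the interval is narrow, which is consistent with the tight intervals reported for the overall estimates; state-level intervals would be wider because each state contributes far fewer participants.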
Exclusions and missing data. We excluded 9 states (Idaho, Iowa, Kentucky, Maine, Montana, New Hampshire, Ohio, Oklahoma, and Virginia) and the District of Columbia because they had fewer than 100 participants with height and weight data from physical measurements. We excluded individuals for whom information on the state of residence was not available or was suppressed to protect privacy. We included a total of 33 states (Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Florida, Georgia, Illinois, Indiana, Kansas, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Nevada, Nebraska, New Jersey, New Mexico, New York, North Carolina, North Dakota, Oregon, Pennsylvania, South Carolina, Tennessee, Texas, Utah, Washington, and Wisconsin) in the analysis. Other states and territories either had no All of Us participants or had participant information suppressed because of low participant numbers.
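A minimal Python sketch of the state-level exclusion rule described above: participants with a missing or suppressed state are skipped, and states with fewer than 100 participants are dropped before prevalence is reported. The record layout (pairs of state and BMI) and the function name are hypothetical and are not the Researcher Workbench schema.

```python
from collections import defaultdict

MIN_PARTICIPANTS = 100  # states with fewer participants are excluded


def state_prevalence(records, min_n=MIN_PARTICIPANTS):
    """Summarize obesity prevalence by state.

    records: iterable of (state, bmi) pairs; state is None when the state of
    residence is missing or suppressed (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [n, n_obese, n_severe]
    for state, bmi in records:
        if state is None:
            continue  # no usable state of residence
        tally = counts[state]
        tally[0] += 1
        if bmi >= 30:
            tally[1] += 1
        if bmi >= 35:
            tally[2] += 1
    return {
        state: {"n": n, "obesity": obese / n, "severe_obesity": severe / n}
        for state, (n, obese, severe) in counts.items()
        if n >= min_n
    }
```

Severe obesity is counted as a subset of obesity here, matching the two overlapping BMI thresholds (≥30 and ≥35) used in the study.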
To select height measurements from EHR data, we used Standard Concept Names (body height, body height measured, or body height stated), or the Source Concept Names (height, body height, or body height stated). To select weight measurements, we used Standard Concept Names (body weight, dry body weight measured, body weight measured, or body weight stated) or Source Concept Names (weight, body weight, or body weight stated) (10). As quality control of EHR height and weight measurements, we used the methods of Koebnick et al (11) describing a large young adult multiethnic cohort to guide our weight, height, and BMI exclusion criteria. We excluded inpatient and emergency department visits, visits during pregnancy, body weight below 30 lb or above 1,000 lb, height below 4 ft or above 7 ft 2 in, and BMI <5 kg/m² or ≥100 kg/m² (11). These exclusion and inclusion criteria were used for both physical measurements and measurements taken from EHR data sources (11). For people with multiple physical measurements of height or weight, the average measurement was taken (to reduce measurement error) and the most recent physical measurement date was used. Only EHR height and weight measurements taken within 1 year of physical measurements were used. For participants with multiple EHR height and weight measurements, we took the EHR height/weight measurement on the date closest to the physical measurements visit date. We also excluded 166 participants whose weight numbers from physical measurements consistently deviated from their EHR weight numbers, suggesting a probable documentation error in physical measurements weight units. Data were analyzed in the All of Us Researcher Workbench Jupyter notebook by using R software version 4.0.2.
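The plausibility limits and the rule for choosing an EHR measurement can be sketched in Python as follows. This is an illustrative sketch, not the study's R code: the function names are hypothetical, the limits stated in pounds and feet are converted to metric units, and "within 1 year" is assumed to mean within 365 days.

```python
from datetime import date

LB_TO_KG = 0.45359237
IN_TO_CM = 2.54

MIN_WEIGHT_KG, MAX_WEIGHT_KG = 30 * LB_TO_KG, 1000 * LB_TO_KG  # 30 lb to 1,000 lb
MIN_HEIGHT_CM, MAX_HEIGHT_CM = 48 * IN_TO_CM, 86 * IN_TO_CM    # 4 ft to 7 ft 2 in


def passes_qc(height_cm: float, weight_kg: float) -> bool:
    """Apply the weight, height, and BMI plausibility limits."""
    if not MIN_WEIGHT_KG <= weight_kg <= MAX_WEIGHT_KG:
        return False
    if not MIN_HEIGHT_CM <= height_cm <= MAX_HEIGHT_CM:
        return False
    bmi = weight_kg / (height_cm / 100.0) ** 2
    return 5.0 <= bmi < 100.0  # exclude BMI <5 or >=100 kg/m^2


def closest_ehr(pm_date: date, ehr_records, max_days: int = 365):
    """Return the (date, value) EHR record nearest to the physical
    measurements visit, or None if none falls within max_days (assumed 365)."""
    eligible = [
        (abs((rec_date - pm_date).days), rec_date, value)
        for rec_date, value in ehr_records
        if abs((rec_date - pm_date).days) <= max_days
    ]
    if not eligible:
        return None
    _, rec_date, value = min(eligible, key=lambda item: item[0])
    return rec_date, value
```

Visit-type exclusions (inpatient, emergency department, pregnancy) depend on fields not shown here and are omitted from the sketch.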

Results
Physical measurements data were available for 244,504 participants (Table 1). The mean age of participants with physical measurements data in this study was 51.1 years. EHR data were available from 88,840 (36.3%) study participants with physical measurements data, with a mean age of 53.9 years. The overall median and IQR of BMI using physical measurements data was 28.4 (24.4–33.7). The overall median and IQR of BMI using EHR data was 28.5 (24.5–33.6). Participants who contributed only physical measurements data and not EHR data were more likely to be male and underrepresented in biomedical research, including 26.3% who were non-Hispanic Black, 23.2% who were Hispanic, 13.6% who did not have a high school diploma or equivalent, and 50.4% who had a high school diploma or equivalent (Table 1). EHR data were less frequently available from study participants from states in the South and West/Pacific than other regions.
Obesity and severe obesity by geographic location and participant demographic characteristics. Because of sample size limitations, we calculated the prevalence of obesity and severe obesity nationwide and for each contributing state, overall and stratified by binary sex assigned at birth. The prevalence estimates for obesity (BMI ≥30) and severe obesity (BMI ≥35) using All of Us physical measurements data were 41.5% (95% CI, 41.3%–41.7%) and 20.7% (95% CI, 20.6%–20.9%), respectively, with large variations across states (Table 2) (Figure 1) (Figure 2). Five states (Alabama, Connecticut, Mississippi, South Carolina, Tennessee) had overall obesity prevalence estimates greater than 50% (Table 2). Data from Connecticut were primarily collected from federally qualified health centers that serve as the All of Us hub in that state. Eight states (Alabama, Arkansas, Connecticut, Indiana, Mississippi, South Carolina, Tennessee, Texas) had severe obesity prevalence of 25% or greater (Table 2). Women in All of Us had a higher prevalence of obesity than men in every state except 7: Kansas, Minnesota, Missouri, Nebraska, Nevada, New Jersey, and North Dakota (Table 2). Women had a higher prevalence of severe obesity than men in all states except Nevada, New Jersey, and Oregon. The prevalence of obesity and severe obesity differed by race and ethnicity, education, and age, qualitatively reflecting nationwide patterns seen in BRFSS data (Table 3) (1).
Correlation between physical measurements and EHR data. BMI data from physical measurements were highly correlated with BMI data from EHRs, with a Pearson correlation coefficient of 0.973 (95% CI, 0.972–0.973). Where data were available, the height and weight measurements from physical measurements and EHR data were similar and highly correlated, with Pearson correlation coefficients >0.93 for height and >0.98 for weight in all subgroups and overall.

Geographic variation and comparison to existing BRFSS data projections. We compared state-level prevalence of obesity and severe obesity in the All of Us cohort with projections reported by Ward et al for the year 2020 calculated by using BRFSS survey data corrected for self-reporting bias (1). Seven states exhibited a 10% or greater absolute difference from state-level BRFSS projections for obesity prevalence (Connecticut, Louisiana, Maryland, Missouri, North Carolina, Oregon, Washington) and 4 states for severe obesity (Connecticut, Louisiana, Oregon, South Carolina). Larger variation in state-level prevalence estimates was significantly associated with smaller state sample sizes, with a Pearson correlation coefficient of −0.50 (95% CI, −0.72 to −0.19).
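For readers who want to see how a confidence interval for a Pearson correlation coefficient can be obtained, the Python sketch below uses the Fisher z-transformation. The text does not state how the reported interval was computed, so this method is an assumption, as is the use of n = 33 (the number of states included in the analysis). Under those assumptions the sketch gives an interval of about −0.72 to −0.19 for r = −0.50.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, z_crit: float = 1.96):
    """Approximate 95% CI for r via the Fisher z-transformation (assumed method)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    lower, upper = fisher_ci(-0.50, 33)  # n = 33 states is an assumption
    print(f"{lower:.2f} to {upper:.2f}")
```

Because the interval is computed on the transformed scale and then mapped back, it is not symmetric around r.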

Discussion
Our examination of the All of Us cohort data found a high prevalence of obesity (41.5%) and severe obesity (20.7%), which was consistent with national projections for the year 2020, as calculated from BRFSS data by Ward et al, projecting obesity and severe obesity prevalence at 42.0% and 19.4% respectively (1). We found geographic differences in obesity and severe obesity in the All of Us cohort, with the highest prevalence of each condition in states in the Southeast. Binary sex, race and ethnicity, and education patterns in obesity and severe obesity were also observed to be similar to national data (1). Our state-level results are congruent with those of Ward et al, with estimates differing by 10% in places that contributed small sample sizes to the analysis (1). Data from in-person physical measurements and clinical data from EHRs were tightly correlated, providing a measure of concordance validating these data sources.
Our findings suggest features of the All of Us cohort that may be relevant to promoting health equity and precision health in obesity research. Compared with White groups, non-Hispanic Black and Hispanic groups had a higher prevalence of obesity and severe obesity (12). Our analysis shows that the All of Us cohort is diverse with respect to race and ethnicity, age, education, and categories of body weight, which may enhance studies examining risks and associations in multiple population subgroups with social exposures that vary by geographic location and other factors. This is important because inclusion of diverse populations and a focus on addressing social inequities are key components of a precision population health framework for obesity prevention to provide tailored population health and prevention strategies (13,14).
www.cdc.gov/pcd/issues/2021/21_0094.htm • Centers for Disease Control and Prevention
Our study benefits from standardized measurements of weight and height from in-person physical measurements as opposed to self-reported measurements. Data from EHRs and physical measurements were closely correlated, providing some measure of construct validity in data collection. Future studies using the All of Us cohort may benefit from linkages with clinical EHR data, survey data, and biomeasures to better understand genomic, clinical, environmental, and social contributors to obesity and related conditions.
Our study had several limitations related to this early stage of analysis of All of Us data. Several states had few participants (<100), and we found great variability in the sample recruited by each site, which leads to variability in the precision of the state-level prevalence estimate. Additionally, All of Us is not designed as a representative geographic sample of the US (3). For example, states such as Connecticut predominantly recruited participants from federally qualified health centers, who may have different participant demographic characteristics than the state at large. Thus, prevalence estimates from All of Us are not expected to track estimates from surveys designed to produce representative population-based statistics. Our analysis did find similar obesity prevalence estimates in this analysis of All of Us data compared with studies designed to produce population-based estimates at the state level, which suggests that the All of Us cohort may provide the diversity in health status and geography needed to support investigations of risk factors that will advance the prevention and treatment of obesity. However, some racial and ethnic groups were not represented in sufficient numbers for large-scale analyses. All of Us continues to build community partnerships to increase accessibility to the program for diverse groups. Related to data availability, as of December 2020, the All of Us Researcher Workbench had EHR data for height and weight from 36% of the cohort included in our analysis. An important limitation is that EHR data differed by demographics and geographic region. Additional efforts will be important to ensure equitable availability of clinical data for subgroups who do not contribute data because of structural inequities, preferences, and other factors.
Our study's ability to reproduce nationwide statistics was largely due to availability of physical measurements data obtained through All of Us. The ability to reproduce data may not be applicable to EHR data, which are not validated through medical record review, patient interviews, or similar processes that formed the basis of a sensitivity analysis for our study. Validation of EHR-based data elements and patient phenotypes may be important for future studies.
In summary, our demonstration project using existing methods for estimating obesity in the All of Us cohort shows parallels in obesity estimates found in data using national probability samples. Our analysis suggests 3 important points: 1) that All of Us has captured significant diversity within the US along the lines of binary sex, race and ethnicity, and age; 2) that the data show good internal consistency between physical measurements and EHR data and good external validity when compared with a second population-level study; and 3) that the data may have sufficient geographic spread to be useful for population-health studies and individual-level studies of contributors to obesity (1). As All of Us continues enrollment and adds data types (including genomics and biomarkers), additional studies are warranted to continue to monitor the applicability of the data to diverse populations.
Abbreviation: BMI, body mass index.
a Physical measurement data among states with at least 100 participants.
b BMI = weight in kg ÷ height in m².
c Participants for whom data are not reported separately because of small sample sizes. These include those responding none of these, Asian, more than one race or ethnicity, or another single race or ethnicity.