A Spatio-Demographic Perspective on the Role of Social Determinants of Health and Chronic Disease in Determining a Population’s Vulnerability to COVID-19


Introduction
During the COVID-19 pandemic, health and social inequities placed racial and ethnic minority groups at increased risk of severe illness. Our objective was to investigate this health disparity by analyzing the relationship between potential social determinants of health (SDOH), COVID-19, and chronic disease in the spatial context of San Diego County, California.

Methods
We identified potential SDOH from a Pearson correlation analysis between socioeconomic variables and COVID-19 case rates during 5 pandemic stages, from March 31, 2020, to April 3, 2021. We used ridge regression to model chronic disease hospitalization and death rates by using the selected socioeconomic variables. Through the lens of COVID-19 and chronic disease, we identified vulnerable communities by using spatial methods, including Global Moran I spatial autocorrelation, local bivariate relationship analysis, and geographically weighted regression.

Results
In the Pearson correlation analysis, we identified 26 socioeconomic variables as potential SDOH because of their significance (P ≤ .05) in relation to COVID-19 case rates. Of the analyzed chronic disease rates, ridge regression most accurately modeled rates of diabetes age-adjusted death (R² = 0.903) and age-adjusted hospitalization for hypertensive disease (hypertension, hypertensive heart disease, hypertensive chronic kidney disease, and hypertensive encephalopathy) (R² = 0.952). COVID-19 and chronic disease rates exhibited positive spatial autocorrelation (0.304 ≤ I ≤ 0.561; 3.092 ≤ z ≤ 6.548; .001 ≤ P ≤ .002), thereby justifying spatial models to highlight communities that are vulnerable to COVID-19.

Conclusion
Novel spatial analysis methods reveal relationships between SDOH, COVID-19, and chronic disease that are intuitive and easily communicated to public health decision makers and practitioners. Observable disparity patterns between urban and rural areas and between affluent and low-income communities establish the need for spatially differentiated COVID-19 response approaches to achieve health equity.

Introduction
As the novel coronavirus spread throughout the US in early 2020, reports of health disparity challenged claims that COVID-19 was society's "great equalizer" (1,2). As of September 2021, non-Hispanic Black Americans, non-Hispanic American Indians, and Hispanic Americans experienced higher rates of COVID-19 infection (1.1, 1.7, 1.9 times higher, respectively), hospitalization (2.8, 3.5, 2.8 times higher, respectively), and death (2.0, 2.4, 2.3 times higher, respectively) than non-Hispanic White Americans (3). This observed health disparity stems from widespread structural discrimination and its effects on people of color.
Social determinants of health (SDOH) are socio-environmental conditions that dictate how people live and age, whereas differences in these conditions define socioeconomic status (SES) (4). Low SES is directly linked to poor health outcomes for communicable and noncommunicable diseases alike (5,6). In a study of COVID-19 outcomes in a New York City hospital, Black and Hispanic patients were more likely than White patients to present with comorbidities, such as cardiovascular disease or diabetes, that were strongly associated with mortality (7). Dr Anthony Fauci, the immunologist leading the US COVID-19 response, said that the comorbidities that negatively affect COVID-19 outcomes "relate to the social determinants of health dating back to disadvantageous conditions that some people of color find themselves in from birth" (8). Existing research confirms the associations between the disproportionate impact of COVID-19 and chronic disease in socially disadvantaged communities (6,9,10). The compounding effect of low SES, comorbidities, and COVID-19 demands immediate action to support communities vulnerable to COVID-19.
Our goal was to classify the relationships between COVID-19, chronic disease, and socioeconomic variables to promote localized public health policies. We used a spatially explicit modeling approach to meet our 2 study objectives: 1) to determine which socioeconomic variables, correlated with COVID-19 and chronic disease rates, are potential SDOH, and 2) to determine whether spatial modeling of chronic disease rates can identify communities most vulnerable to COVID-19.

Study area
Our research area was San Diego County, a culturally diverse area well suited to investigation of the various effects of socioeconomic factors and chronic disease on population vulnerability to COVID-19. The county is located in southwestern California along the US-Mexico border. Its western portion is largely urban and densely populated, and its eastern portion lightly populated and rural. The county is divided into 41 subregional areas (SRAs), a geographic division frequently used to report COVID-19 and other health-related data.

Data collection
We obtained data sets from the San Diego County Open Data Portal (11), aggregated to SRAs, containing 2017 rates for hospitalization, emergency department discharge, and death per 100,000 residents for coronary heart disease (CHD), diabetes, hypertensive diseases (hypertension, hypertensive heart disease, hypertensive chronic kidney disease, and hypertensive encephalopathy), mental illness, and pulmonary disease. We included mental illness in our study because of the toll that COVID-19 has had on mental health (12) and because of the association between mental illness, other chronic diseases, and low SES (13,14).
Socioeconomic data related to age, race and ethnicity, language, housing, income, education, and employment were retrieved from the San Diego Association of Governments (SANDAG) Data Surfer (15) and the US Census Bureau's application programming interface (16). Data were then normalized by SRA population size or number of households. Along with socioeconomic variables, we included 4 health care access variables: health care clinics per SRA population, health care clinics per SRA square mile, hospitals per SRA population, and hospitals per SRA square mile. We calculated values for these health care access variables by using GIS analysis in ArcGIS Pro (Esri) and spatial data from SANDAG.
The County of San Diego Health and Human Services Agency provided COVID-19 rates (17) and aggregated most of the rates to SRA. However, confirmed case rates had zip code aggregations. We converted these confirmed case rates (per 100,000 residents) to the SRA extent with a 2019 population-based crosswalk from SANDAG that used dasymetric techniques to determine the proportion of residents in each zip code that live within the boundaries of an SRA. A similar crosswalk was used to aggregate the US Census Bureau socioeconomic data from census tract to SRA.
During Stage 1, the March 19, 2020, California stay-at-home order along with local restrictions enacted from March 29 through April 4, 2020 (eg, regarding face coverings, cruise ships) kept COVID-19 rates low and stable (19). Stage 2 covered San Diego County's first wave of increased COVID-19 rates, which followed the reopening of many of the county's businesses, between June 13 and June 25, 2020 (the indoor operation of some business sectors reclosed on July 3, 2020) (19). Stage 3 was a period of relative stability in response to additional public health restrictions that followed the first wave. Stage 4 was characterized by a second wave of dramatic rate surges, possibly related to gatherings for the 2020 Presidential election and winter holidays. A regional stay-at-home order began on December 6, 2020, and continued through January 25, 2021 (19). Stage 5 was marked by steadily decreasing rates as the holiday season ended and county residents were vaccinated. By March 5, 2021, 1 million vaccines had been administered (19). Throughout all stages, COVID-19 confirmed case rates were highest in SRAs located in the southern portion of the county (Figure 2). Although the pandemic continues, we stopped our analysis at the end of Stage 5 to analyze and interpret existing data.
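The population-based conversion of zip code rates to SRA rates described in this section can be illustrated with a minimal sketch. It assumes the crosswalk is available as a table of population shares and that zip code case counts are allocated to SRAs in proportion to those shares; the function name, the data layout, and all numbers are hypothetical and are not the SANDAG crosswalk itself.

```python
from collections import defaultdict

def zip_rates_to_sra(zip_cases, zip_pop, crosswalk):
    """Convert zip code case counts to SRA rates per 100,000 residents.

    crosswalk maps (zip_code, sra) -> share of the zip code's residents
    living inside that SRA (shares for one zip code sum to 1).
    """
    sra_cases = defaultdict(float)
    sra_pop = defaultdict(float)
    for (zip_code, sra), share in crosswalk.items():
        # Assumption: cases are spread over SRAs in proportion to population.
        sra_cases[sra] += zip_cases[zip_code] * share
        sra_pop[sra] += zip_pop[zip_code] * share
    return {sra: 100_000 * sra_cases[sra] / sra_pop[sra] for sra in sra_pop}

# Hypothetical values: zip code Z2 straddles SRAs A and B.
zip_cases = {"Z1": 50, "Z2": 200}
zip_pop = {"Z1": 10_000, "Z2": 20_000}
crosswalk = {("Z1", "A"): 1.0, ("Z2", "A"): 0.25, ("Z2", "B"): 0.75}
sra_rates = zip_rates_to_sra(zip_cases, zip_pop, crosswalk)
```

Proportional allocation treats cases as evenly distributed among a zip code's residents, which is one source of the conversion error that the study acknowledges among its limitations.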

Statistical methods
To address our first objective, determining which socioeconomic variables correlated with COVID-19 and chronic disease rates were potential SDOH, we used the SciPy Python package to calculate Pearson correlation coefficients between each socioeconomic variable and the average daily confirmed COVID-19 case rate in each of the 5 pandemic stages. Socioeconomic variables were chosen for further analysis if their Pearson correlation P values were less than or equal to .05 for all stages, with 2 exceptions for variables with P values equal to .07 during 1 or 2 of the stages. The Pearson correlation coefficient is commonly used in medical research to test the strength of linear relationships between 2 variables (20). Next, we identified potentially meaningful relationships between COVID-19 and chronic disease comorbidities through a data-driven review of their Pearson correlation coefficients (18). We considered COVID-19 in the contexts of confirmed cases (total, and by race or ethnicity), total hospitalizations, and total deaths across the pandemic stages. For consistency, we selected a minimum of 1 rate, age-adjusted hospitalizations, for each of the chronic diseases.
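The screening rule described above (retain a variable only if its correlation with case rates is significant in every stage) might be sketched as follows. The data are synthetic, the function and variable names are hypothetical, and the sketch omits the 2 exceptions made for P values of .07; it is an illustration of the rule, not the study's code.

```python
import numpy as np
from scipy.stats import pearsonr

def screen_variables(X, stage_rates, names, alpha=0.05):
    """Keep variables whose Pearson P value is <= alpha in every stage.

    X: (n_areas, n_variables) socioeconomic values.
    stage_rates: (n_areas, n_stages) average daily case rates per stage.
    """
    selected = []
    for j, name in enumerate(names):
        p_values = []
        for s in range(stage_rates.shape[1]):
            _, p = pearsonr(X[:, j], stage_rates[:, s])
            p_values.append(p)
        if max(p_values) <= alpha:
            selected.append(name)
    return selected

# Synthetic stand-in for 41 SRAs and 5 pandemic stages.
rng = np.random.default_rng(0)
n_areas = 41
related = rng.normal(size=n_areas)      # drives the simulated case rates
unrelated = rng.normal(size=n_areas)    # independent of the case rates
X = np.column_stack([related, unrelated])
stage_rates = np.column_stack(
    [3.0 * related + rng.normal(scale=0.5, size=n_areas) for _ in range(5)]
)
selected = screen_variables(X, stage_rates, ["related", "unrelated"])
```

Requiring significance in all stages is a conservative filter: a variable that is unrelated to case rates passes only if it clears the threshold by chance 5 times in a row.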
To assess our potential SDOH, we conducted ridge regression analysis using the Python package scikit-learn to evaluate how well the selected socioeconomic variables depicted the actual distribution of COVID-19 and chronic disease rates. Ridge regression applies a tuning parameter (α) and assigns coefficients to the explanatory variables to minimize the effects of the multicollinearity that is common among sociodemographic indicators (21). We chose the chronic disease rates with the most accurate ridge regression models for spatial analysis of COVID-19 case rates.
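A ridge regression fit of this kind could look like the sketch below, which uses scikit-learn on synthetic, deliberately collinear predictors. The text does not say how α was chosen, so selecting it by cross-validation with `RidgeCV` is an assumption, as are the standardization step and every variable name.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 41 areas, 6 predictors that all track one latent factor.
rng = np.random.default_rng(1)
n_areas, n_vars = 41, 6
latent = rng.normal(size=(n_areas, 1))
X = latent + 0.1 * rng.normal(size=(n_areas, n_vars))   # strong multicollinearity
y = 2.0 * latent[:, 0] + 0.2 * rng.normal(size=n_areas)  # simulated disease rate

# Assumption: alpha is picked by cross-validation over a log-spaced grid.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
model.fit(X, y)

ridge = model.named_steps["ridgecv"]
best_alpha = ridge.alpha_      # selected tuning parameter
coefs = ridge.coef_            # shrunken coefficients, one per predictor
r2 = model.score(X, y)         # in-sample coefficient of determination
```

Because the penalty spreads weight across correlated predictors, individual coefficients are best read by relative magnitude within one model run rather than as isolated effect sizes.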
For our second objective, to determine whether spatial modeling of chronic disease rates can identify communities most vulnerable to COVID-19, we used 3 spatial techniques to model COVID-19 case rates and find vulnerable communities. Spatial autocorrelation (Global Moran I) tests of COVID-19 confirmed case rates and chronic disease rates assessed the overall appropriateness of spatial modeling. Spatial autocorrelation indicates the similarity of data values across space for a single variable, gauging whether data are clustered, dispersed, or randomly distributed (22). With local bivariate analysis and geographically weighted regression (GWR) modeling, we investigated the relationships between chronic disease rates (independent) and COVID-19 case rates (dependent). Local bivariate analysis tests for significant relationships between 2 variables within a spatial neighborhood (23). GWR is a regression technique that considers spatial nonstationarity and variable local relationships in the prediction model (24,25). We used Esri's ArcGIS Pro 2.8 software (Esri) to conduct the study's spatial analysis. Finally, we synthesized the collective modeling and analysis results to propose links between COVID-19, chronic disease, and SDOH in the context of San Diego County.
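To make the Global Moran I statistic concrete, the sketch below computes it with NumPy for a toy arrangement of areas. The study ran this test in ArcGIS Pro; the permutation-based z score and pseudo P value shown here are a swapped-in alternative to that tool's inference, and the binary neighbor weights, function names, and data are all hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a 1-D array and an (n, n) spatial weights matrix."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z = values - values.mean()
    n = values.size
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)

def moran_permutation_test(values, weights, n_permutations=999, seed=0):
    """Return (I, z score, pseudo P value) from random relabeling of areas."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    simulated = np.array(
        [morans_i(rng.permutation(values), weights) for _ in range(n_permutations)]
    )
    z_score = (observed - simulated.mean()) / simulated.std(ddof=1)
    # One-sided test for positive autocorrelation (clustering).
    p_value = (np.sum(simulated >= observed) + 1) / (n_permutations + 1)
    return observed, z_score, p_value

# Hypothetical layout: 10 areas in a row, each neighboring the next.
n = 10
weights = np.zeros((n, n))
for i in range(n - 1):
    weights[i, i + 1] = weights[i + 1, i] = 1.0

clustered = np.arange(n, dtype=float)          # similar values sit side by side
alternating = np.array([0.0, 1.0] * (n // 2))  # neighbors always differ

i_clustered, z_clustered, p_clustered = moran_permutation_test(clustered, weights)
i_alternating = morans_i(alternating, weights)
```

Values of I near 1 indicate clustering, values near 0 a random arrangement, and negative values dispersion, which is how the clustered and alternating toy series differ.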

COVID-19 correlations with potential SDOH and chronic disease
From an initial data set of 79 socioeconomic variables, 26 variables were recognized as potential SDOH because of their significant linear relationships (P ≤ .05) to COVID-19 case rates during all 5 stages (Table 1). Two additional variables were included in the subset even though their P values slightly exceeded .05 in 1 or 2 stages: household income of $60,000 to $75,000 during Stages 1 (P = .07) and 2 (P = .07), and household income above $200,000 during Stage 5 (P = .07). Some of the variables in the socioeconomic variable subset exhibited multicollinearity, such as English and Spanish as home languages, White and Hispanic race or ethnicity, and various industries of employment.
In preparation for further evaluation of the socioeconomic variable subset, we reviewed Pearson correlation coefficients for 113 chronic disease rates and 85 COVID-19-related rates and identified important relationships between COVID-19 and comorbidities. The analyzed chronic disease rates (total, age-adjusted, by sex, by race or ethnicity, by age group) included hospitalizations, emergency department discharges, and deaths related to CHD, diabetes, hypertensive disease, mental illness, and pulmonary disease with sample sizes of 30 SRAs or more. Similarly, we considered rates of COVID-19 cases, hospitalizations, and deaths (total, age-adjusted, by sex, by race or ethnicity, by age group) in sample sizes of at least 30 SRAs. Ten of the most highly correlated rates, with at least 1 for each chronic disease, were selected for regression modeling: CHD age-adjusted hospitalization, diabetes age-adjusted hospitalization, diabetes age-adjusted death, diabetes hospitalization among patients aged 65 years or older, diabetes emergency department discharge among patients aged 65 years or older, age-adjusted hospitalization for people with hypertensive disease, hospitalization of Hispanic patients with hypertensive disease, mental illness age-adjusted hospitalization, pulmonary disease age-adjusted hospitalization, and pulmonary disease hospitalization of patients aged 65 years or older (Table 2).
In general, highly positive correlations were observed for chronic disease and COVID-19 rates. Key temporal patterns included: Decreasing correlation coefficients between COVID-19 case rates among
Although ridge regression's regularization process limits interpretation of the effect of specific socioeconomic variables on the model, coefficients of greater magnitude (positive or negative) relative to the model run can generally be viewed as important in determining rates of COVID-19 and chronic disease. Variables corresponding to English or Spanish as home language and Hispanic ethnicity were consistently assigned coefficients of relatively high magnitude (Table 3).

Spatial analysis of COVID-19 and chronic disease
The COVID-19 case rates in the 5 stages, diabetes deaths, and hypertensive disease hospitalizations exhibited significant positive spatial autocorrelation (Global Moran I), indicating that rates in geographically nearby SRAs tend to be similar. Of note, the strength of spatial autocorrelation for COVID-19 case rates decreased from pandemic Stage 1 (I = 0.561, z = 6.548, P ≤ .001) to Stage 2 (I = 0.485, z = 5.486, P ≤ .001) before stabilizing during Stages 3 through 5 (0.304 ≤ I ≤ 0.347, 3.511 ≤ z ≤ 3.934, P ≤ .001). Spatial autocorrelation was stronger for 2017 hypertensive disease hospitalization rates (I = 0.413, z = 4.912, P ≤ .001) than for 2017 diabetes death rates (I = 0.345, z = 3.092, P = .002). Subsequent spatial analysis determined the accuracy with which the rate of diabetes deaths or hypertensive disease hospitalizations could be independently used to model COVID-19 case rates, thereby avoiding the multicollinearity problems inherent in the selected socioeconomic variables.
Although diabetes death rates were well estimated by ridge regression by using the potential SDOH variables, data were suppressed for most of the lightly populated (rural) SRAs. Spatial analysis with the COVID-19 case rates produced interesting results, such as a linear bivariate relationship during all stages, but the reliability of our findings is challenged by the small sample size. Visualization of diabetes deaths and COVID-19 cases with layered quantile classes separated the urban portion of the county into 3 zones: high-high positive correlations to the south, low-low positive correlations in the center, and higher than expected COVID-19 cases in the north. Also, GWR standard residuals depict the emergence of a clear spatial pattern characterized by under-predictions along major transportation corridors to the south, over-predictions in the county's center, and under-predictions in the north.
Hypertensive disease hospitalization rates were available for all SRAs except Camp Pendleton, a military base in the northwest corner of the county. Visualization of the hypertensive disease hospitalization and COVID-19 case rates using layered quantile classification symbology showed a positive correlation, with several exceptions in northern SRAs, where northeast SRAs had higher hypertensive disease hospitalizations and northwest SRAs had higher COVID-19 cases (Figure 3A). The local bivariate analysis confirmed this observation with linear positive relationships that, in southern SRAs, shifted to concave relationships over time (Figure 3B). GWR standard residuals (prediction errors) divided the county into overpredicted SRAs to the east and underpredicted (or accurately predicted) SRAs to the west (Figure 3C). This demarcation roughly matches the county's rural-urban divide, although rural SRAs along the US-Mexico border were also underpredicted.

Discussion
Although the effect of socioeconomic factors on health equity is well established (5,8), spatial approaches are required to respond to known COVID-19 health disparities in regions of varied SES. We analyzed the relationships between socioeconomic variables, COVID-19, and chronic disease rates to identify a set of potential SDOH related to disproportionate disease spread. In a linear ridge regression model, variables across the categories of age, race and ethnicity, language, housing, income, education, and employment provide insight into the distribution of COVID-19. Reported health disparities related to race and ethnicity in San Diego County (27) are contextualized through the selection of related variables (eg, Hispanic ethnicity, Spanish home language) in the potential SDOH subset and their relative coefficient magnitudes during ridge regression. However, the highly related nature of the selected socioeconomic variables, such as high percentage of racial or ethnic minorities in lower-income neighborhoods (28), presents challenges to comprehensive spatial analysis.
As observed by others (7,29,30), people with preexisting chronic health conditions appear to be at increased risk of severe or fatal COVID-19 disease outcomes. As others have shown, in many cases those with an existing condition would not have died in the absence of a COVID-19 infection at the given time point (31). The strong correlations observed in our study are important in considerations related to limiting exposure for people with comorbidities, ensuring prompt vaccination to decrease biological susceptibility and providing prompt treatment if infected.
Because of the importance of comorbidities to COVID-19 outcomes and the observed correlations, we performed spatial modeling (GWR) of COVID-19 rates by using hypertensive disease hospitalization and diabetes death rates as explanatory variables. Not only can these comorbidity rates be well estimated by using the socioeconomic variables chosen to model COVID-19, but they also share similar spatial distributions to COVID-19, as determined through local bivariate analysis. Given these factors, the chronic disease rates should provide reasonable estimates of COVID-19 case rates. The GWR standard residuals indicate SRAs that have higher (underpredictions) or lower (overpredictions) COVID-19 case rates than expected by their comorbidity rates.
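The way standardized GWR residuals flag underpredicted areas can be illustrated with a small NumPy sketch. The study used the GWR tool in ArcGIS Pro; the version below is a simplified stand-in that assumes a single explanatory variable, a Gaussian kernel with a fixed bandwidth, and one common textbook formulation of the standardized residual. All names, coordinates, and rates are hypothetical.

```python
import numpy as np

def gwr_standardized_residuals(coords, x, y, bandwidth):
    """Fit a one-predictor GWR (Gaussian kernel) and return standardized residuals."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    design = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])
    hat = np.zeros((n, n))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)   # nearer areas weigh more
        xtw = design.T * w                            # X^T W_i, shape (2, n)
        # Row i of the hat matrix: x_i (X^T W_i X)^-1 X^T W_i
        hat[i] = design[i] @ np.linalg.solve(xtw @ design, xtw)
    residuals = y - hat @ y                           # observed minus predicted
    # Effective number of parameters: 2 tr(S) - tr(S^T S)
    enp = 2.0 * np.trace(hat) - np.trace(hat.T @ hat)
    sigma = np.sqrt(residuals @ residuals / (n - enp))
    q = np.sum((np.eye(n) - hat) ** 2, axis=1)        # diag of (I - S)(I - S)^T
    return residuals / (sigma * np.sqrt(q))

# Hypothetical 5 x 5 grid of areas with a linear disease-to-case relationship.
side = 5
rows, cols = np.divmod(np.arange(side * side), side)
coords = np.column_stack([cols, rows])
rng = np.random.default_rng(2)
disease_rate = np.linspace(0.0, 1.0, side * side)
case_rate = 1.0 + 2.0 * disease_rate + rng.normal(scale=0.05, size=side * side)
case_rate[12] += 3.0   # one area with far more cases than its disease rate implies

std_resid = gwr_standardized_residuals(coords, disease_rate, case_rate, bandwidth=2.0)
underpredicted = np.flatnonzero(std_resid > 2.0)   # candidate vulnerable areas
```

With residuals defined as observed minus predicted, large positive values correspond to the underpredictions discussed in the text, and the threshold of 2 used here is only an illustrative cutoff.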
We propose that, in certain contexts, the GWR standard residuals highlight communities that are either notably vulnerable (underpredictions) or resilient (overpredictions) to COVID-19. When the hypertensive disease hospitalization rate is used as the explanatory variable, differences between low- and high-population SRAs become apparent, delineating the county's rural-urban divide. When the diabetes death rate is the explanatory variable, urban subtleties reveal population vulnerabilities that can be further explained by socioeconomic variables and local area knowledge. However, because of suppressed values in the diabetes death rate data set, these findings require further investigation with additional data.
Through a spatial lens, the many interrelated factors that lead to vulnerability to COVID-19 can be better understood and clearly communicated to pandemic response decision makers and other involved planners. Spatially differentiated public health approaches are needed to overcome health disparity. The most effective policies for lightly populated communities will not work in densely populated areas. More importantly, culturally relevant and sensitive policies are needed to address COVID-19 in accordance with community demographics, preferences, and prevailing socioeconomic status. A disproportionately high number of COVID-19 cases in low-income communities might indicate low access to health care, poor communication of public health information, or unsustainable COVID-19 policies.
Our study had limitations, and data limitations posed major challenges. Health data are frequently aggregated to relatively large geographic units (ie, SRAs) and suppressed when rates are below a threshold, which ultimately resulted in a small number of large, varied areas to analyze. COVID-19 data scaled up from the zip code level are susceptible to errors related to the population-based conversion method and the modifiable areal unit problem. Findings from our research are applicable only at the level of analysis and cannot be scaled down to make inferences about smaller geographic areas or individuals. Furthermore, because the temporal periods for data about the chronic disease rates (2017, annual) and COVID-19 case rates (2020-2021, 54-85 days) are not the same, uncertainties about variable correlations and temporal dependencies remain. Additional uncertainty relates to health care access in terms of who can, or will, get tested for COVID-19 or seek hospitalization and emergency services for chronic disease.
Limitations also exist in the analysis techniques used for our research. Although ridge regression regularization accommodates multicollinearity, the specific relationships between explanatory and dependent variables become obscured. In addition, our data and results suggest spatial dependency; thus, nonspatial linear models, such as ridge regression, are not reliable because they assume independence of data observations. The algorithms for neighborhood selection and prediction during the local bivariate analysis and GWR might introduce error due to varied SRA sizes. We expect that access to fine-scale data, enabling analysis with more features, would increase the accuracy of our models and enhance the overall value of the research.
Our analysis demonstrates the value of novel spatially informed approaches to COVID-19 responses and epidemiologic policy. Investigation of potential SDOH provides better understanding of the underlying reasons for COVID-19 and chronic disease distribution patterns. Socioeconomic variable analysis can help decision makers develop relevant pandemic response measures. Location unites different health and socioeconomic variables in support of clear communication about COVID-19, population vulnerability, and public health decisions. Spatial analysis is needed to develop effective policy targeted to diverse communities, such as those found in San Diego County.
Future research is needed to determine causal relationships between potential SDOH, COVID-19, and chronic disease. Access to fine-scale data and additional demographic and health care access variables, either in San Diego County or elsewhere, would permit the detailed analysis required to establish causal relationships between potential SDOH and health data. Our findings provide a basis for hypothesis formation and a framework for ongoing spatial analysis. The heterogeneous nature of San Diego County is ideal for investigating how correlations differ across space and inspires ongoing research to address these differences. The promising spatial approaches discussed in this article benefit the continuing development of geographically diverse and socially equitable epidemiologic responses.