Alternative Methods for Grouping Race and Ethnicity to Monitor COVID-19 Outcomes and Vaccination Coverage

Population-based analyses of COVID-19 data by race and ethnicity can identify and monitor disparities in COVID-19 outcomes and vaccination coverage. CDC recommends that information about race and ethnicity be collected to identify disparities and ensure equitable access to protective measures such as vaccines; however, this information is often missing in COVID-19 data reported to CDC. Baseline data collection requirements of the Office of Management and Budget's Standards for the Classification of Federal Data on Race and Ethnicity (Statistical Policy Directive No. 15) include two ethnicity categories and a minimum of five race categories (1). Using available COVID-19 case and vaccination data, CDC compared the current method for grouping persons by race and ethnicity, which prioritizes ethnicity (in alignment with the policy directive), with two alternative methods (methods A and B) that used race information when ethnicity information was missing. Method A assumed non-Hispanic ethnicity when ethnicity data were unknown or missing and used the same population groupings (denominators) for rate calculations as the current method (Hispanic persons for the Hispanic group and race category, non-Hispanic persons for the different racial groups). Method B grouped persons into ethnicity and race categories that are not mutually exclusive, unlike the current method and method A. Denominators for rate calculations using method B were Hispanic persons for the Hispanic group and persons of Hispanic or non-Hispanic ethnicity for the different racial groups. Compared with the current method, the alternative methods resulted in higher counts of COVID-19 cases and fully vaccinated persons across race categories (American Indian or Alaska Native [AI/AN], Asian, Black or African American [Black], Native Hawaiian or Other Pacific Islander [NH/PI], and White persons).
When method B was used, the largest relative increase in cases (58.5%) was among AI/AN persons and the largest relative increase in the number of fully vaccinated persons was among NH/PI persons (51.6%). Compared with the current method, method A resulted in higher cumulative incidence and vaccination coverage rates for the five racial groups. Method B resulted in lower cumulative incidence rates for two groups (AI/AN and NH/PI persons) and lower cumulative vaccination coverage rates for AI/AN persons. The rate ratio for having a case of COVID-19 by racial and ethnic group compared with that for White persons varied by method but was <1 for Asian persons and >1 for other groups across all three methods. The likelihood of being fully vaccinated was highest among NH/PI persons across all three methods. This analysis demonstrates that alternative methods for analyzing race and ethnicity data when data are incomplete can lead to different conclusions about disparities. These methods have limitations, however, and warrant further examination of potential bias and consultation with experts to identify additional methods for analyzing and tracking disparities when race and ethnicity data are incomplete.

To improve monitoring of COVID-19-associated outcomes among racial and ethnic groups, CDC used three methods for grouping persons by race and ethnicity to analyze the following six indicators: 1) COVID-19 case counts, 2) cumulative incidence, 3) rate ratios for COVID-19 infection, 4) number of fully vaccinated persons, 5) cumulative vaccination coverage rates, and 6) rate ratios for being fully vaccinated. The method for grouping race and ethnicity used by CDC (current method) begins by grouping persons with Hispanic ethnicity as Hispanic, regardless of race, then groups persons with reported race and non-Hispanic ethnicity as race category, non-Hispanic (which excludes persons with missing or unknown ethnicity and those with non-Hispanic ethnicity and missing or unknown race). The current method was compared with two alternative methods (methods A and B) that have been used previously (2,3). Method A first groups persons based on Hispanic ethnicity (as with the current method) and then groups persons with known race and non-Hispanic ethnicity or unknown or missing ethnicity as race category, non-Hispanic (persons with missing or unknown race and missing or unknown or non-Hispanic ethnicity are excluded). Method B groups all persons with Hispanic ethnicity as Hispanic, regardless of race, and persons with reported race and Hispanic, non-Hispanic, unknown, or missing ethnicity are grouped by race category; persons with missing or unknown race and missing or unknown or non-Hispanic ethnicity are excluded. Notably, with method B, the groups are not mutually exclusive (Box).
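The three grouping rules described above can be sketched in code. This is a minimal illustration only, not CDC's implementation: the function name `classify`, the string labels, and the treatment of any race value outside the five listed categories as "not analyzed" are assumptions made for the example.

```python
from typing import List, Optional

# The five race categories analyzed in the report (labels are illustrative).
RACES = {"AI/AN", "Asian", "Black", "NH/PI", "White"}


def classify(race: Optional[str], ethnicity: Optional[str], method: str) -> List[str]:
    """Return the group(s) a record is counted in under one grouping method.

    `race` is one of RACES, or anything else (None, "unknown", "other race",
    "more than one race") for records without an analyzed race category.
    `ethnicity` is "Hispanic", "non-Hispanic", or anything else for
    unknown/missing. An empty list means the record is excluded.
    """
    known_race = race if race in RACES else None
    hispanic = ethnicity == "Hispanic"
    non_hispanic = ethnicity == "non-Hispanic"

    if method == "current":
        # Ethnicity first; race is used only when ethnicity is non-Hispanic.
        if hispanic:
            return ["Hispanic"]
        if known_race and non_hispanic:
            return [f"{known_race}, non-Hispanic"]
        return []

    if method == "A":
        # As the current method, but unknown/missing ethnicity is
        # assumed to be non-Hispanic.
        if hispanic:
            return ["Hispanic"]
        if known_race:
            return [f"{known_race}, non-Hispanic"]
        return []

    if method == "B":
        # Groups are not mutually exclusive: a Hispanic person with a
        # reported race is counted in both the Hispanic and the race group.
        groups = []
        if hispanic:
            groups.append("Hispanic")
        if known_race:
            groups.append(known_race)
        return groups

    raise ValueError(f"unknown method: {method!r}")
```

For example, a record with race "Asian" and missing ethnicity is excluded under the current method, counted as "Asian, non-Hispanic" under method A, and counted as "Asian" under method B.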
Daily confirmed COVID-19 cases in the United States during January 1, 2020–May 31, 2021, were obtained from CDC's case-based surveillance system.* Daily data about COVID-19 vaccine doses administered in the United States during December 14, 2020–May 31, 2021, including full vaccination status, were collected by vaccination providers and reported to CDC by multiple sources.† In the case and vaccination data sent to CDC, race was reported as White, Black, AI/AN, Asian, NH/PI, more than one race, other race, unknown race, or missing race. Ethnicity was reported as Hispanic or Latino (Hispanic), non-Hispanic, unknown ethnicity, or missing ethnicity. COVID-19 incidence and vaccination coverage rates were calculated using the 2019 U.S. Census Bureau's annual resident population estimates.§ The current method and method A used the same population groupings (denominators) for rate calculations (Hispanic persons for the Hispanic group and race category, non-Hispanic persons for the different racial groups). Method B denominators were Hispanic persons for the Hispanic group and persons of Hispanic or non-Hispanic ethnicity for the different racial groups. Rate ratios were used to compare relative differences in COVID-19 incidence and full vaccination coverage rates between racial and ethnic groups.
The comparator for the current method and method A was White, non-Hispanic persons and for method B was White persons. This activity was reviewed by CDC and was conducted consistent with applicable federal law and CDC policy.¶
During January 1, 2020–May 31, 2021, U.S. states and four territories reported 26,724,149 COVID-19 cases to CDC. Among these reports, information on race, ethnicity, or both was missing from 26.7%, 35.2%, and 21.7% of reports received, respectively. During December 14, 2020–May 31, 2021, based on vaccine administration data reported to CDC, 126,692,891 fully COVID-19-vaccinated persons were reported in the United States; information on race, ethnicity, or both was missing from 23.1%, 31.7%, and 19.5% of these reports, respectively.
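The rate and rate ratio calculations described in the methods can be sketched as follows. This is an illustrative sketch, not the analysis code used for the report: the function names are hypothetical, and the counts and populations in the usage lines are made-up toy values, not figures from the report.

```python
from typing import Dict


def rate_per_100k(count: int, population: int) -> float:
    """Cumulative rate per 100,000 population."""
    return count * 100_000 / population


def rate_ratios(
    counts: Dict[str, int],
    populations: Dict[str, int],
    reference: str,
) -> Dict[str, float]:
    """Rate ratio of each group relative to the reference group.

    `counts` holds numerators (cases or fully vaccinated persons) per group;
    `populations` holds the matching denominators. The reference group is
    "White, non-Hispanic" for the current method and method A, and "White"
    for method B.
    """
    ref_rate = rate_per_100k(counts[reference], populations[reference])
    return {
        group: rate_per_100k(counts[group], populations[group]) / ref_rate
        for group in counts
    }


# Toy values for illustration only (not data from the report).
toy_counts = {"Hispanic": 300, "White, non-Hispanic": 500}
toy_populations = {"Hispanic": 10_000, "White, non-Hispanic": 25_000}
toy_ratios = rate_ratios(toy_counts, toy_populations, "White, non-Hispanic")
```

A rate ratio above 1 means the group's rate is higher than the reference group's rate; the reference group's own ratio is 1 by construction.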

Summary
What is already known about this topic?
Analyses of race and ethnicity in COVID-19 data to identify and monitor disparities are complicated by missing or unknown data.
What is added by this report?
Methods that use more race information when ethnicity information is missing resulted in higher estimated COVID-19 case counts, incidence, and vaccination coverage for most racial groups studied; however, these methods have limitations and warrant further examination of potential bias.
What are the implications for public health practice?
Ongoing work with experts is needed to identify methods for optimizing race and ethnicity data when data are incomplete. Multiple data sources are needed to monitor disparities and continued efforts are needed to strengthen the reporting of these data, consistent with CDC's Data Modernization Initiative.
Among persons of Hispanic ethnicity, the numbers of COVID-19 cases and persons fully vaccinated, and population incidence and vaccination coverage rates, were the same across the three methods for grouping race and ethnicity (Table 1). Methods A and B resulted in more COVID-19 cases and fully vaccinated persons assigned to a racial group compared with the current method because of the inclusion of persons with unknown or missing ethnicity information. Compared with the current method, method A resulted in case counts that were 16.6% to 37.2% higher across race groups, with the largest relative increase in the AI/AN, non-Hispanic group (37.2%). For method B, for which racial and ethnic groups were not mutually exclusive, the percentage increase in case counts compared with the current method ranged from 25.7% to 58.5% among the five race categories. The largest relative increase in case counts was in the AI/AN group (58.5%); case counts in White persons also increased (45.1%). The estimated population incidence of COVID-19 varied depending on the classification method used. Compared with the current method, method A resulted in higher cumulative COVID-19 incidences among the five racial groups, with the largest increase among AI/AN, non-Hispanic persons (37.2%). Method B resulted in increased cumulative incidence among Asian persons (21.8%), Black persons (19.6%), and White persons (14.3%), and slight decreases among AI/AN persons (7.9%) and NH/PI persons (1.0%).
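A short calculation may help explain how method B can raise a group's case count while lowering its incidence. Because a rate is a count divided by a denominator, the relative change in the denominator is implied by the relative changes in count and rate. The sketch below uses only the percentages quoted above; the function name is hypothetical, and the result is arithmetic implied by those percentages, not a figure stated in the report.

```python
def implied_denominator_ratio(count_change_pct: float, rate_change_pct: float) -> float:
    """Ratio of new to old denominator implied by changes in count and rate.

    Since rate = count / denominator,
    (new rate / old rate) = (new count / old count) / (new denom / old denom).
    """
    count_ratio = 1 + count_change_pct / 100
    rate_ratio = 1 + rate_change_pct / 100
    return count_ratio / rate_ratio


# AI/AN persons, method B vs. current method: cases +58.5%, incidence -7.9%.
aian_method_b = implied_denominator_ratio(58.5, -7.9)

# AI/AN, non-Hispanic persons, method A vs. current method: cases +37.2%,
# incidence +37.2%, consistent with method A reusing the same denominators.
aian_method_a = implied_denominator_ratio(37.2, 37.2)
```

Under these figures, the method B denominator for AI/AN persons is roughly 1.72 times the current method's denominator, so the population added by including Hispanic persons grows faster than the case count, and incidence falls.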
Compared with the current method, method A resulted in higher numbers of fully vaccinated persons across all racial groups, ranging from 17.8% (non-Hispanic Asian) to 37.3% (non-Hispanic NH/PI) higher. Method B resulted in 19.4% to 51.6% higher numbers of fully vaccinated persons across the racial groups, with the largest relative increase among NH/PI persons (51.6%). Full vaccination coverage also varied depending on the racial and ethnic classification method used. Compared with the current method, method A resulted in higher numbers of fully vaccinated persons per 100,000 for all racial groups, with the largest increase among non-Hispanic NH/PI persons (37.3%). Method B resulted in coverage increases among all racial groups except AI/AN persons, among whom a 23.7% decrease occurred.
When the current method was used, Hispanic and non-Hispanic NH/PI persons were twice as likely as non-Hispanic White persons to have COVID-19 (Table 2). When method A was used, the rate ratio was highest for non-Hispanic AI/AN (1.76) and non-Hispanic NH/PI (1.84) persons; when method B was used, the rate ratio relative to White persons was highest among Hispanic persons (1.72) and NH/PI persons (1.72). Among Asian persons, the rate ratio for COVID-19 was lower across all three methods (0.66–0.71). NH/PI persons had the highest likelihood of being fully vaccinated when the current method (1.70), method A (1.97), and method B (1.92) were used compared with each method's reference group.

Discussion
Estimation of COVID-19 incidence and vaccination coverage by race and ethnicity is complicated by missing data. Previous studies have proposed methods for classifying race and ethnicity to address such complexities as multirace responses, but these methods do not consider missing data in circumstances such as a public health emergency in which real-time monitoring and action are needed to identify and address disparities (4,5). The alternative methods used in this study (methods A and B) resulted in the analyses of more data by race, which increased estimates of COVID-19 case counts, incidence, and vaccination coverage among most racial groups. The current method, used by CDC, and method A resulted in mutually exclusive racial and ethnic groups. The denominators for rate calculations are either persons reported as Hispanic or persons reported as a race category and non-Hispanic, with an assumption in method A that persons for whom ethnicity data were missing are non-Hispanic. Method A is more commonly used when ethnicity is missing from a small percentage of records and other information in the record supports a non-Hispanic designation. When approximately one-third of records are missing ethnicity, as in this report (35% for case and 32% for vaccination coverage data), that assumption might attenuate or amplify disparities for certain groups. With method B, the race and ethnicity groups are not mutually exclusive. This complicates comparisons that use a reference group (often White persons), because the race and ethnicity categories overlap.
Table note: Persons who were reported as multiracial or other race with non-Hispanic, unknown, or missing ethnicity (13,859,910; 10.9%) were excluded from the analyses. Texas does not report vaccine counts by race and ethnic group and was excluded.
The findings in this report are subject to at least four limitations. First, because the analysis did not include persons who identified as multiple races or other race, conclusions cannot be drawn about the use of the alternative methods for grouping and analyzing these racial categories. Second, this report did not explore all possible analytic methods for grouping race and ethnicity. For example, imputation (i.e., replacing missing data with other values) has been examined as a potential method to improve estimates of COVID-19 racial and ethnic disparities (6).
Third, data shared with CDC might undercount COVID-19 cases and vaccination coverage and this undercount might differ by race or ethnicity. Finally, although progress has been made to incorporate the Office of Management and Budget standards (such as Statistical Policy Directive No. 15) into the collection and presentation of race and ethnicity data, some data collection efforts still do not fully use this guidance (7).
Although race and ethnicity are not the only measures for assessing health disparities, these measures have been integral to CDC's understanding of the health outcomes associated with COVID-19 (8–10). This analysis demonstrates that alternative methods for analyzing race and ethnicity data when data are incomplete can lead to different interpretations about disparities and highlights the importance of working with experts to identify methods for analyzing and tracking disparities when race and ethnicity data are incomplete. CDC uses multiple data sources to monitor disparities in COVID-19 outcomes and will continue to optimize the available data and work with jurisdictions to strengthen reporting of these data consistent with CDC's COVID-19 Response Health Equity Strategy** and Data Modernization Initiative.††