Installation Instructions and User Guide for ArcGIS Pro

*** Files needed for exercise: NC_HeartDisease0608.dbf and the North Carolina census tract shapefile NC_tracts_2010prj.shp

Goal: Use the Rate Stabilizer Tool (RST) to produce easily mapped age-standardized, smoothed sub-county estimates.

Objectives:

  • Gain experience installing the tool and managing the user interface;
  • Develop an understanding of the required data inputs; and
  • Interpret and map the output

Problem: You have been provided with individual-level heart disease death data for the years 2006 to 2008 that have been geocoded to the US 2010 Census tract level. Using the Rate Stabilizer Tool (RST), you will generate age-standardized, smoothed rates. With the information generated by the RST, you will be able to map tract-level heart disease death rates for the state of North Carolina.

Obtain a US Census API key

  1. Go to the Census key signup site: https://api.census.gov/data/key_signup.html
The Request a Key window where you enter your organization name and email address.

Fill in all fields; within a few minutes you should receive an email like this:

The Hello! window from the Census Bureau API Team with the API key showing.

Activate your key and you are ready to use the Census API. You will need to use this key after you have downloaded the RST toolbox.
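
Once the key is active, one way to see how it is used is to build a request URL by hand. The sketch below is illustrative only: this guide does not show the requests that the RST itself makes, so the dataset path (dec/sf1), the variable P001001 (total population), the county-level geography, and the function name build_census_url are all assumptions chosen for the example.

```python
from urllib.parse import urlencode

CENSUS_BASE = "https://api.census.gov/data"


def build_census_url(api_key, year=2010, dataset="dec/sf1",
                     variables=("NAME", "P001001"), state_fips="37"):
    """Build a county-level query URL for one state (37 = North Carolina).

    The dataset and variable names are assumptions for illustration; they are
    not taken from the RST source code.
    """
    query = urlencode(
        {
            "get": ",".join(variables),
            "for": "county:*",
            "in": f"state:{state_fips}",
            "key": api_key,
        },
        safe=",:*",  # keep the query readable instead of percent-encoding it
    )
    return f"{CENSUS_BASE}/{year}/{dataset}?{query}"


# Replace YOUR_KEY with the key from the Census email, then open the printed
# URL in a browser to check that the key has been activated.
print(build_census_url("YOUR_KEY"))
```

The function only assembles a string and does not contact the Census servers, so it can be run anywhere, including behind a firewall.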

Loading the toolbox

  1. Download the tool.
    1. Save the zipped file to your workspace folder
    2. Extract zipped file to your workspace folder
  1. This tool has been built using Python and can be added as a toolbox in ArcGIS Pro.
  2. Add your API key. In the RST folder inside your workspace folder, use Notepad to open fetch_data.py.
Explorer window open with the fetch data PY file highlighted.

In line 14, paste the API key that you received from the US Census.

Fetch data window with the api key highlighted.

Save your changes and close fetch_data.py.

  1. Create a new map project in ArcGIS Pro.
Create a new project.
  1. Open the Catalog pane: click the View tab on the top ribbon, then click Catalog Pane.
View tab with Catalog pane highlighted.
  1. Right-click the Toolboxes icon at the top of the Catalog pane and select Add Toolbox.
Toolboxes is chosen, then Add Toolbox is highlighted.
  1. Navigate to your workspace location (where you extracted the toolbox), select RateStabilizerTool_v2.20_ArcGISPro > RateStabilizerTool.tbx, and click OK.
The RateStabilizertool.tbx open to show the files it contains.
  1. The RateStabilizerTool toolbox will now appear; expand the toolbox to reveal all of the Script-based tools in this toolbox.

Adding your data

  1. Select Add Data in the Map Ribbon. The 20_ArcGISPro TestData folder contains a shapefile: NC_tracts_2010prj.shp. Add this shapefile for some geographical context.
Added Data is chosen, and the shapefile is showing.

View of the shapefile image.
  1. Next, add the record-level heart disease death data: NC_HeartDisease0608.dbf
View of the dBase table.

Right-click the table name, open the table, and review the attribute fields:

OID – an automatically generated object identifier;

OBJECTID – an automatically generated unique identifier;

GEOID10 – the geographic identifier for the US Census tract for which you will calculate a stabilized age-standardized rate; this is a required field for the tool;

DDR_ID – the unique identifier for each record;

Yr_Death – the year of the death record;

Age – the age at which the individual represented by the record died; this is also a required field for the tool;

Male – the sex of the decedent, with 1 indicating male and 0 indicating female (to be used in a future version of the tool); and

GEOIDCNTY – the geographic identifier for the US Census county, which can be used instead of GEOID10 to calculate a stabilized age-standardized rate at the county level.
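
Because GEOID10 and Age are the two required fields, it can help to screen the records before running the tool. The following is a minimal sketch, not part of the RST: the function name check_records and the sample rows are made up for illustration, and it assumes the records have already been read into a list of dictionaries. It relies on the fact that a census tract GEOID has 11 digits (2 for the state, 3 for the county, 6 for the tract).

```python
def check_records(records, geoid_field="GEOID10", age_field="Age",
                  geoid_length=11):
    """Return (row_index, problem) tuples for rows with unusable required fields."""
    problems = []
    for i, row in enumerate(records):
        # A tract GEOID is 11 digits: state (2) + county (3) + tract (6).
        geoid = str(row.get(geoid_field, "")).strip()
        if len(geoid) != geoid_length or not geoid.isdigit():
            problems.append((i, f"bad {geoid_field}: {geoid!r}"))

        # Age must be present and be a non-negative number.
        age = row.get(age_field)
        try:
            age_ok = age is not None and float(age) >= 0
        except (TypeError, ValueError):
            age_ok = False
        if not age_ok:
            problems.append((i, f"bad {age_field}: {age!r}"))
    return problems


# Hypothetical rows, not taken from NC_HeartDisease0608.dbf.
sample_records = [
    {"GEOID10": "37001020100", "Age": 71},
    {"GEOID10": "3700102", "Age": 64},        # GEOID too short
    {"GEOID10": "37001020200", "Age": None},  # missing age
]
for index, problem in check_records(sample_records):
    print(index, problem)
```

For county-level analysis with GEOIDCNTY, the same check could be called with geoid_field="GEOIDCNTY" and geoid_length=5.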

Using the tool

This tool makes use of the US Decennial Census API to obtain census population data for age-adjustment calculation.

First, run the Fetch Population by Age to File tool in an open (internet-connected) environment to download the latest US Census data from the US Decennial Census API. Next, run the Rate Stabilizing with Local Data tool in a secure environment, pointing it to the US Census data saved in a secure location (behind a firewall, for example).

  1. First run the (Step 1) Fetch Population by Age to File Script tool. Double left click on the icon to open the tool.
Step 1, Fetch Population by Age to File is highlighted.
  1. Let’s consider each of the parameters/inputs.
    1. Output folder: The output folder location for the results. Select your workspace folder as the output folder and then click “Ok” on the lower right side of the dialog box
    2. Year for Standard Population: To standardize for age, you need to select a standard population. A standardized rate weight will be generated from the standard population. Select 2010 as the desired year of the standard age structure for the entire USA.
    3. State: Select North Carolina as the state for which population data will be used to generate crude rates.
    4. Year of base population in Research Area: Select 2010, since our data were collected from 2006 to 2008. This is the year of the chosen state’s base population that will be used for calculating crude rates.
    5. Geographic Level of Study: Select US Census Tract as the desired geographic level of analysis for the population data.
Parameter fields for Step 1.

Click Run to run the tool, then expand the messages. You should see this:

The data file RawData_state37_tract.data is highlighted.

Once the tool has successfully run (the dialogue box will say “completed script” and “succeeded”), it will generate a data file that can be used to run the tool behind your firewall. In our case, the file will be named: RawData_state37_tract.data. This file can be re-used for multiple analyses.

Step 2 Optional, Build Neighborhood Dictionary. Parameter fields are showing.
  1. The RST tool can optionally perform spatial Bayesian smoothing when given a neighborhood dictionary. Run the (Step 2 Optional) Build Neighborhood Dictionary Script tool to generate a matrix of adjacency relationships between geographic units. Double left click on the icon to open the tool.
  2. Let’s consider each of the parameters/inputs for the tool.
    1. Shapefile for Spatial Bayesian: Select a shapefile containing all the geographic units at the level of study for the state of interest. The neighborhood dictionary will be built based on this input shapefile – select: NC_tract_2010prj.shp shapefile.
    2. GeoID in shapefile: Select the field containing each geographic unit’s GeoID. To generate a valid neighborhood dictionary, you need a field containing a unique identifier for each census geography that matches the census ID; select GEOID10.
    3. Output folder: The output folder location for the results. Single click your workspace folder and then click “Add” on the lower right side of the dialog box.
Screenshot of Step 2 and the script running in the background.

Click Run to run the tool. You should see something like this:

Screenshot of Step 2 and the script running in the background.

After the tool has successfully run, it will generate a new neighborhood_dict.data file. In our case, the file will be named: NC_tract_2010prj_neighborhood_dict.data.

The data file NC_tract_2010prj_neighborhood is highlighted.
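
To make the idea of a neighborhood dictionary concrete, here is a minimal sketch of one way adjacency between geographic units can be recorded. It is not the RST's implementation: the adjacency rule the tool applies and the layout of its .data file are not described in this guide. The sketch assumes each unit is given as a list of (x, y) vertices and treats two units as neighbors when they share at least one identical vertex; the function name and the toy squares are made up.

```python
from collections import defaultdict


def build_neighborhood_dict(polygons):
    """Map each GeoID to a sorted list of the GeoIDs it shares a vertex with.

    polygons: dict of GeoID -> list of (x, y) vertex tuples.
    Assumption: shared boundaries show up as exactly equal vertices.
    """
    # Index which units touch each vertex.
    units_at_vertex = defaultdict(set)
    for geoid, vertices in polygons.items():
        for vertex in vertices:
            units_at_vertex[vertex].add(geoid)

    # Any two units that meet at a vertex are neighbors of each other.
    neighbors = {geoid: set() for geoid in polygons}
    for units in units_at_vertex.values():
        for geoid in units:
            neighbors[geoid].update(units - {geoid})

    return {geoid: sorted(found) for geoid, found in neighbors.items()}


# Toy example: A and B are adjacent unit squares, C stands alone.
toy_units = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}
print(build_neighborhood_dict(toy_units))
```

A unit with an empty neighbor list, like C in the toy example, has no adjacent units to borrow information from during spatial smoothing.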
  1. Now that we have the US Census population data needed to perform age adjustment locally, and a neighborhood dictionary for optional spatial smoothing, we can run the (Step 3) Rate Stabilizing with Local Data Script tool. Double-click the icon to open the tool.
  2. Let’s consider each of these parameters/inputs for the tool.
    1. Input de-identified point level death data: Select your records table for calculation. The age and GEOID fields are required for the calculation. Please de-identify the point level data before using the tool.
      1. Browse to the NC_HeartDisease0608.dbf file.
    2. Output folder: Select the output folder location for the results.
      1. Please single click your desired folder and then click “Add” on the lower right side of the dialog box.
    3. Geographic Identification Field: Select the field containing the GeoIDs. For our example, select GEOID10.
    4. Age field: Select the field containing each decedent’s age. For our example, select Age.
    5. Downloaded Raw Data: Point the tool to the folder where you placed the downloaded Census data file (RawData_state37_tract.data).
      1. Remember in generating this data you have:
        1. Selected Year for standard US population 2010. In order to adjust for age, a standard population must be selected. The adjusted weight will be generated from standard population.
        2. Selected the State of interest: NC or 37.
        3. Selected the Year of base population 2010 for NC.
        4. Specified the geographic level of study: Tract.
    6. Age Structure: Specify your own age group structure. Enter the lower bound of each age group. One negative value can be used to set the maximum age considered.
      1. For example, for the following group structure:
        1. Younger than 1 year,
        2. 1-5 years,
        3. 6-20 years,
        4. 21-45 years,
        5. 46-60 years,
        6. 61 years and older
      2. The following numbers should be entered: 0, 1, 6, 21, 46, 61 (pressing Enter to add the next value).
      3. For an age group structure:
        1. Younger than 5 years,
        2. 5-9 years,
        3. 10-17 years
      4. The following numbers should be entered: 0, 5, 10, -18.
    7. Number of Years in data: Enter the total number of years represented by the input death data. Decimals can be used to represent partial years.
      1. For our example, enter 3.
    8. I’m not analyzing the entire state: Check this if your input data reflect only a portion of the state’s geographies rather than the entire state. Why? If we tell the tool that the base population is for the entire state, it will smooth to that population. If the data come only from select counties, we want to consider only the combined base population of those counties.
      1. For our example, leave this box unchecked, since we are analyzing data from every county in North Carolina.
    9. Neighborhood Dictionary for Spatial Bayesian (optional): If you would like to include spatial Bayesian smoothing, select the neighborhood dictionary data file built in Step 2. Otherwise, leave blank.
      1. Navigate to and select the NC_tract_2010prj_neighborhood_dict.data file and click “Open”.
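
The Age Structure parameter above can be read as a list of lower bounds plus an optional cutoff. The sketch below shows one way to interpret such a list and assign an age to a group. It is an illustration rather than the tool's own code: the function names are made up, and it assumes that a negative value such as -18 means ages 18 and older are excluded, which matches the 10-17 example.

```python
from bisect import bisect_right


def parse_age_structure(values):
    """Split the entered values into sorted lower bounds and an optional cutoff."""
    lower_bounds = sorted(v for v in values if v >= 0)
    negatives = [v for v in values if v < 0]
    if len(negatives) > 1:
        raise ValueError("only one negative value (the age cutoff) is allowed")
    cutoff = -negatives[0] if negatives else None
    return lower_bounds, cutoff


def age_group_index(age, lower_bounds, cutoff=None):
    """Return the 0-based group index for an age, or None if it is out of range."""
    if age < lower_bounds[0] or (cutoff is not None and age >= cutoff):
        return None
    # The group is the last lower bound that does not exceed the age.
    return bisect_right(lower_bounds, age) - 1


bounds, cutoff = parse_age_structure([0, 5, 10, -18])
for age in (3, 5, 17, 18):
    print(age, age_group_index(age, bounds, cutoff))
```

With the first example structure (0, 1, 6, 21, 46, 61) there is no cutoff, so every age of 61 or more falls into the last group.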

Here is what your completed test run of the tool should look like once you have entered the parameters:

The data file NC_tract_2010prj_neighborhood is highlighted.

Click Run to run the tool.

Let’s talk about what is going on while the script is running:

  • The tool is using the Census data file you created to calculate population in each age group and the standard age structure.
  • The tool is calculating crude rates for each age category you specified. From these, it calculates age-standardized rates (both smoothed and non-smoothed) using the standard age structure.
  • If any GeoIDs (census identifiers in this case) are missing or cannot be matched to the census data, you will receive a warning.
  • All output file paths can be viewed under the Messages section in the Analysis > History Pane.
Analysis tab open with the History button highlighted.

 

Window showing the history of all steps taken.
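
The non-smoothed part of the calculation described in the bullets above follows the usual direct age standardization approach: a crude rate per age group, weighted by the standard population's share in that group. The sketch below shows that arithmetic for a single geographic unit. The function name and the numbers are invented for illustration, and the RST's own code (including its Bayesian smoothing) is not reproduced here.

```python
def age_standardized_rate(deaths, population, standard_weights,
                          years=1.0, per=100_000):
    """Directly age-standardized rate for one geographic unit.

    deaths, population, standard_weights: one value per age group, in order.
    standard_weights: share of the standard population in each age group.
    years: number of years of death data pooled together.
    """
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs need one value per age group")

    rate = 0.0
    for d, p, w in zip(deaths, population, standard_weights):
        if p > 0:  # skip empty age groups to avoid dividing by zero
            crude = d / (p * years)  # annual crude rate in this age group
            rate += w * crude
    return rate * per


# Made-up tract with three age groups and three years of data.
asr = age_standardized_rate(
    deaths=[0, 3, 12],
    population=[1000, 2000, 1000],
    standard_weights=[0.3, 0.5, 0.2],
    years=3,
)
print(round(asr, 1))
```

Dividing by the number of years turns the pooled 2006 to 2008 death counts into an annual rate, which is why the tool asks for the Number of Years in data.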

Understanding the output

  1. Once the tool has completed all processes, three tables will be placed in the output folder you specified. These include:
    1. Two intermediate tables required for calculations which can be used for verification:
      1. PopAge_structure_state37.csv: This file includes the population for each geographic unit in the state of interest by age category. You defined these age categories when you set up the tool.
      2. Standard_Age_structure.csv: This file includes the proportion of the population in each age category in the standard year of choice. This age structure was calculated using the population data from the whole United States. The structure works as the weight when calculating the weighted average for age-standardized mortality rate.
    2. And the table with your results: age_adjust_NC_HeartDisease0608.csv
  1. Add the results table to ArcGIS Pro and open it to see what has been produced.
    1. Age_adjust_rate: the non-smoothed rate per 100,000;
    2. SpBay_AAR: the spatially smoothed rate per 100,000;
    3. SpBay_2p5 & SpBay_97p5: the lower and upper bounds of the 95% confidence interval for the spatially smoothed age-standardized rate;
    4. Baye_AAR: the non-spatially smoothed (empirical Bayesian) rate per 100,000; and
    5. Baye_2p5 & Baye_97p5: the lower and upper bounds of the 95% confidence interval for the non-spatially smoothed age-standardized rate.
  2. Potential Alert Messages: When the width of the confidence interval (Upper limit – Lower limit) is larger than the estimate, the estimate is unreliable. These rates should NOT be mapped.
    1. Alert:Unreliable Empirical Bayesian Estimate!!!! – The empirical Bayesian estimate is not reliable in the region.
    2. Alert:Unreliable Spatial Bayesian Estimate!!!! – The spatial Bayesian Estimate is not reliable in the region.
    3. Alert:Unreliable Estimate!!!! – Neither estimate is reliable in the region.
  3. NSpUnreli: New after version 2.15. This field indicates the reliability of the non-spatial Bayesian estimate. The non-spatial Bayesian estimate is reliable if NSpUnreli is 0.
  4. SpUnreli: New after version 2.15. This field indicates the reliability of the spatial Bayesian estimate. The spatial Bayesian estimate is reliable if SpUnreli is 0.
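
The reliability rule above (an estimate is unreliable when the width of its interval exceeds the estimate itself) is easy to re-check on the results table before mapping. The sketch below applies that rule to rows that use the output field names listed above. It is a cross-check written for this guide, not the tool's code, and the sample values are invented; values read from the CSV would need to be converted to numbers first.

```python
def is_unreliable(estimate, lower, upper):
    """True when the interval width (upper - lower) is larger than the estimate."""
    return (upper - lower) > estimate


def alert_for(row):
    """Return the alert text implied by the rule, or an empty string."""
    eb_bad = is_unreliable(row["Baye_AAR"], row["Baye_2p5"], row["Baye_97p5"])
    sp_bad = is_unreliable(row["SpBay_AAR"], row["SpBay_2p5"], row["SpBay_97p5"])
    if eb_bad and sp_bad:
        return "Alert:Unreliable Estimate!!!!"
    if eb_bad:
        return "Alert:Unreliable Empirical Bayesian Estimate!!!!"
    if sp_bad:
        return "Alert:Unreliable Spatial Bayesian Estimate!!!!"
    return ""


# Invented rows for illustration only.
rows = [
    {"Baye_AAR": 200.0, "Baye_2p5": 150.0, "Baye_97p5": 260.0,
     "SpBay_AAR": 210.0, "SpBay_2p5": 160.0, "SpBay_97p5": 250.0},
    {"Baye_AAR": 80.0, "Baye_2p5": 20.0, "Baye_97p5": 140.0,
     "SpBay_AAR": 150.0, "SpBay_2p5": 100.0, "SpBay_97p5": 190.0},
]
mappable = [row for row in rows if alert_for(row) == ""]
print(len(mappable), "of", len(rows), "rows have no alert")
```

Rows that return an alert correspond to the rates that should not be mapped.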