BACKUP: Defining Data for Visualization — Do Not Use

Best Practices

About This Document

When you create a data visualization in the WCMS, you have to answer various questions about the source dataset.  This is called data definition. By defining the data, you tell the WCMS how source data are to be used in the visualization.  (You may hear the term “data mapping” used to refer to this part of visualization design, but to avoid confusion with data map configuration, we use “data definition” throughout this document.)

This document is primarily for web developers who create data visualizations in the WCMS, but data managers and analysts may also benefit.

Here are a few notes to keep in mind:

  • The WCMS supports a wide variety of data visualizations, some of which require specific types of data in a specific format.  Except where noted, when we use the term “chart” in this document we are referring to the commonly used charts based on the cartesian coordinate system: bar, line, and combo.
  • If you are interested in the data definition for a specialized visualization type (such as the box-and-whiskers plot or the scatter plot), we suggest you visit the documentation specific to the visualization type. To find documentation by visualization type, visit the index page for data presentation.
  • Examples of source data are provided as screenshots of Microsoft Excel files, but the principles apply whether the source data are in JSON or CSV format.
  • This document provides general, high-level guidance. The accompanying guided exercises help WCMS web developers see the specifics of how source data are defined in the tool and serve as an introduction to chart and map development in the WCMS.  [link to exercise index page to come]
  • The example visualizations are static images, but “live” interactive visualizations are presented with the exercises.  Links to sample source data are included in the exercise instructions.

Key Data Definition Questions

To fully define the data for a visualization, you must take care to answer a few key questions on the “Import Data” and “Configure” tabs in the WCMS visualization tool:

  • Source dataset:  For each visualization, you can upload a source file to the WCMS or provide a URL.  With a URL source, you can indicate whether the tool should automatically pull from the source file with each refresh of the visualization.
  • Source orientation and number of data series: In the source dataset, what is the structure of the data to be visualized — vertical or horizontal?  And are multiple data series involved?  The answers to these questions on the “Import Data” tab lay the foundation for the visualization you’re building. Based on your answers, you may have to answer one or more additional questions before moving on to the “Configure” tab.  The section “Getting Started: The Import Data Tab” explains these concepts and provides links to exercises that provide detailed guidance on the conditional questions.
  • Data Series and Dates/Categories: You must select the source column(s) for the data series you’re visualizing. For charts, you must also indicate the source data for the date/category axis.  These concepts are illustrated in Exercise 1 and Exercise 2.  [links to come]
  • Filter Columns:  For filterable data, you must select the source column for each filter control.  This is a critical step that is easy to overlook since the visualization tool doesn’t generate an error message for this omission.  For more information, see “Setting Up Filter Controls” in this document.

In addition to the key specifications above, you and the data owner should also consider:

  • Tooltip Content:  When end users interact with a data point in a “live” data map or chart, they get a tooltip with the point value and other information.  Some information is included automatically, but you can add source data to tooltips to provide more context.
  • Data Table Contents:  As with tooltips, the tool automatically includes certain data in the supporting data table, but you can specify additional source data.  For more information on tooltips and the data table, see “Managing Content in Tooltips and the Data Table” in this document.

Getting Started: The Import Data Tab

This section provides guidance on two key data definition questions presented on the Import Data tab. Keep in mind that answering these questions is just the start of setting up a data visualization.  Because the configuration steps for a visualization are based on the answers you provide on the Import Data tab, we have provided exercises to walk you through the complete configuration of the example visualizations.  If you’re new to the WCMS visualization tool or just want a better grasp of how the tool works, we strongly suggest that you work the exercises in this section.

Determining Source Data Orientation

Screenshot of map source data with values structured vertically

Map data in typical vertical format

When you specify a source URL or file for a visualization, the first question is about the orientation of the source data:  vertical or horizontal.  Subsequent questions on the Import Data tab are based on the answer to this question.

With maps, the orientation is typically vertical, meaning that the data values are in a single column as illustrated here. To answer the vertical/horizontal question for bar, line, and combo charts, you have to think about which categories are to be presented on the Date/Category Axis.  Let’s look at some illustrations of the orientation concept for charts.

Source Data Example 1 for Charts

In the first chart below, Age Groups are presented on the Date/Category Axis, and since these categories are formatted horizontally in the source data on the left, the source dataset is defined as “horizontal.”  The second chart uses the exact same source data but presents the Sex categories along the Date/Category axis, so the source dataset is defined as “vertical.”

Source data in horizontal format and resulting chart

Exercise 1: Working with Vertical vs. Horizontal Data Formats.  Follow these exercise instructions to build the two charts above using the same data shown in the illustration.  This exercise provides a great introduction to the Import Data tab and the configuration of the Date/Category Axis.

Excluding categories:  Note how the “All” value is not included in the second chart above even though the Sex column in the source data has “All” values.  As Exercise 1 demonstrates, the WCMS visualization tool makes it easy to exclude data from the Date/Category Axis as well as the Legend.   [link to exercise to come]

Source Data Example 2 for Charts

Now let’s look at the same source data but in a different structure.  With this example, both the Sex categorization and Age Group categorization are structured vertically.  This means that we would select “Vertical” to build both versions of the chart.  (This structure is sometimes referred to as long format.  Source data 1 is a hybrid structure:  Sex data are in long format while Age Group data are in wide format.)

Source data defined as "vertical" and resulting bar chart
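To see why a fully long-format dataset supports both versions of the chart, consider the minimal Python sketch below. It is a conceptual illustration only, not a description of how the WCMS processes data. The column names, the values, and the helper build_series are all hypothetical. The same rows are grouped two ways: once with Age Group on the Date/Category Axis and once with Sex on that axis.

```python
from collections import defaultdict

# Hypothetical long-format rows: names and values are invented for illustration.
long_rows = [
    {"Sex": "Female", "Age Group": "18-44", "Percent": 10.0},
    {"Sex": "Female", "Age Group": "45-64", "Percent": 20.0},
    {"Sex": "Male", "Age Group": "18-44", "Percent": 12.0},
    {"Sex": "Male", "Age Group": "45-64", "Percent": 22.0},
]


def build_series(rows, axis_column, series_column, value_column):
    """Group long-format rows into {series name: {axis category: value}}."""
    series = defaultdict(dict)
    for row in rows:
        series[row[series_column]][row[axis_column]] = row[value_column]
    return dict(series)


# Version 1: Age Group on the axis, one series per Sex.
by_age_group = build_series(long_rows, "Age Group", "Sex", "Percent")

# Version 2: Sex on the axis, one series per Age Group.
by_sex = build_series(long_rows, "Sex", "Age Group", "Percent")

print(by_age_group)
print(by_sex)
```

Because every categorization lives in its own column, switching the axis is only a matter of naming a different column; the rows themselves do not change.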

Exercise 2:  With this exercise you can see how data definition works when chart data are structured like the data in source example 2.  You will find that you have more questions to answer on the Import Data tab.  [link to come]

Determining the Number of Data Series

Are there multiple series represented in your data?  That’s the second key question on the Import Data tab.

The WCMS visualization tool treats the data sources for maps as single-series datasets regardless of the dataset format, primarily because data maps are inherently single-series visualizations. The bottom line is that, for maps, you can answer No to this question.

The datasets for line, bar, and combo charts can be single- or multi-series. As the question implies, you must indicate whether the source data contain multiple series, not how many series you plan to visualize in the chart. So for these charts, the answer can be Yes or No.

The first dataset below, shown in its entirety, clearly has only one data series for age groups. The second dataset has multiple series for both age groups and sex. With the second dataset, whether you intend to visualize only one data series or multiple, the answer to the question about multiple series would be Yes.

Screenshots of single-series and multi-series datasets for comparison

Setting Up Filter Controls

With filter controls, you can provide end users with different views on data in a single visualization. This feature is available for maps and all charts except the single-data-point charts such as data bite, gauge, and waffle chart.  To take advantage of this feature, you must have a dataset formatted for filtering. This is due to the way that filter controls are typically set up in the WCMS:  Each filter control is associated with a single column of source data.

Example of Formatting That Limits Filtering

Consider the map data below.  The data for males and females are in separate columns. If a goal is to allow end users to filter the data by sex, the data must be reformatted.  (However, the same data file could be used to generate two data maps, one for males and one for females.)

Source data with values for males and females in separate columns

The Solution for Filtering

To present the data in a single map with a filter for sex, the solution is to reformat the data for males and females in long format:

Source data and map with Sex filter
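A data manager could do this kind of reformatting in a spreadsheet or with a short script. The Python sketch below is one possible approach, using only the standard library. The file contents, the column names (State, Year, Male, Female, Sex, Rate), and the helper wide_to_long are assumptions made for illustration and are not taken from the dataset pictured.

```python
import csv
import io

# Hypothetical wide-format source: column names and values are invented.
WIDE_CSV = """State,Year,Male,Female
Alabama,2020,11.2,9.8
Alaska,2020,10.1,8.7
"""


def wide_to_long(csv_text, value_columns, key_name, value_name):
    """Emit one row per value column, labeled by a new key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    long_rows = []
    for row in reader:
        # Columns shared by every output row (here: State and Year).
        shared = {k: v for k, v in row.items() if k not in value_columns}
        for column in value_columns:
            long_rows.append({**shared, key_name: column, value_name: row[column]})
    return long_rows


long_rows = wide_to_long(WIDE_CSV, ["Male", "Female"], "Sex", "Rate")
for row in long_rows:
    print(row)
```

After the reshape, Sex is a single column, so it can back a single filter control. Note that csv.DictReader returns every value as text; convert the Rate values to numbers if your workflow needs them.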

Handling Duplicate Data

It’s important to understand that, with the source data above, a filter control for the Sex column is not just desirable; it’s essentially required. In fact, filter controls are required for both Year and Sex to resolve the issue of duplicate data.  Without these filter controls, the WCMS would still generate a map, but each location on the map would match more than one row of source data, so the values displayed could be unreliable. This is just one reason that WCMS developers should become familiar with both the source data and the data format they are working with so that they can resolve potential issues as soon as possible.
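One way to spot this kind of duplication before building the map is to count how many source rows share the same location once the planned filter columns are taken into account. The Python sketch below is a hypothetical pre-check, not a WCMS feature. The rows, the column names, and the helper duplicate_keys are invented for illustration.

```python
from collections import Counter

# Hypothetical long-format rows: names and values are invented.
rows = [
    {"State": "Alabama", "Year": "2020", "Sex": "Male", "Rate": "11.2"},
    {"State": "Alabama", "Year": "2020", "Sex": "Female", "Rate": "9.8"},
    {"State": "Alabama", "Year": "2021", "Sex": "Male", "Rate": "11.5"},
    {"State": "Alabama", "Year": "2021", "Sex": "Female", "Rate": "10.0"},
]


def duplicate_keys(rows, geography_column, filter_columns):
    """Return the key combinations that match more than one source row."""
    key_columns = [geography_column, *filter_columns]
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# No filters: the single state matches all four rows.
no_filters = duplicate_keys(rows, "State", [])

# Filter on Sex only: each state/sex pair still matches two years.
sex_only = duplicate_keys(rows, "State", ["Sex"])

# Filters on both Year and Sex: every combination is unique.
both_filters = duplicate_keys(rows, "State", ["Year", "Sex"])

print(no_filters, sex_only, both_filters)
```

An empty result means each location resolves to exactly one row for any combination of filter selections, which is the condition the filter controls are meant to guarantee.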

Managing filter settings: Note that the WCMS automatically pulls the options for the filter control, so the data manager could easily add an “All” value as a selection in the future simply by updating the source file.  When a filter value is added, the WCMS developer may need to edit the filter control to ensure that the correct default selection is set. This is accomplished by editing the order so that the default is first in the list. The developer also has options for controlling the type of control — tab, pill, or drop-down selector.

Exercise #:  Setting Up Filtering Controls.  This exercise demonstrates how to set up filters using an expanded version of the dataset shown above.

Managing Content in Tooltips and the Data Table

Most visualization types in the WCMS support additional columns.  By “additional” we mean columns of data that are not visualized but are instead presented as supplemental information in pop-ups, the supporting data table, or both. With long-formatted data, it’s easy to take advantage of this feature.

In the illustration below, the source file includes two pieces of information for each state:  Rate and Funding Status.  The chart designer has decided to visualize the numeric Rate while including the Funding Status as supplemental information in the pop-ups that display when a user interacts with the map. The Funding Status can also be included in the supporting data table.

Funding status as additional column in map pop-up

Exercise #: Adding Content to Tooltips and Data Table:  In this exercise, you will enhance the map you built in exercise [#] by adding content to tooltips and the data table.

About categorical maps:  The map in the illustration above is a numeric or quantitative map in that the color-coding is based on numeric data (Rate). With the same source data, you could also create a categorical map based on the Funding Status.  Categorical maps can be sequential or qualitative.  See example categorical maps.

Exercise #: Categorical Maps.  Data definition for a categorical map is a bit different from the definition of a numeric / quantitative map. This exercise walks you through the key steps.

Working with Confidence Intervals

With the single-series bar, line, and combo charts you can include one set of confidence intervals (CIs).  (With the forecast chart, you can include multiple confidence intervals over time.)

In the source dataset, each CI group must be formatted as two columns:  one for the lower-bound values and one for the upper-bound values.  (Note that at this time, only single-series charts can display CI values.)

Screenshot of source dataset with CIs and resulting bar chart
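Before uploading a dataset with CIs, it can help to confirm that both bound columns are present and that each value falls within its interval. The Python sketch below is a hypothetical sanity check, not a WCMS requirement or feature. The column names (Percent, Lower CI, Upper CI), the values, and the helper ci_problems are assumptions for illustration, and the second row is deliberately inconsistent.

```python
# Hypothetical rows: names and values are invented for illustration.
rows = [
    {"Age Group": "18-44", "Percent": 10.0, "Lower CI": 8.5, "Upper CI": 11.5},
    # Deliberately inconsistent: the value sits below its lower bound.
    {"Age Group": "45-64", "Percent": 20.0, "Lower CI": 21.0, "Upper CI": 23.0},
]


def ci_problems(rows, value_column, lower_column, upper_column):
    """Return (row index, message) for rows with missing or inconsistent CIs."""
    required = (value_column, lower_column, upper_column)
    problems = []
    for index, row in enumerate(rows):
        missing = [column for column in required if column not in row]
        if missing:
            problems.append((index, "missing columns: " + ", ".join(missing)))
            continue
        if not row[lower_column] <= row[value_column] <= row[upper_column]:
            problems.append((index, "value outside its interval"))
    return problems


problems = ci_problems(rows, "Percent", "Lower CI", "Upper CI")
print(problems)
```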

Exercise #: Working with Confidence Intervals.

Working with Multiple Metric Series

At times multiple metrics may be available in the same source dataset, for example:

  • Total Cases and Cases per 100K by Year
  • Cases per 100K and Program Costs ($) by State

See the document Data Visualization: Presenting Multiple Metrics for a discussion of four scenarios involving multiple metric series from a single source dataset.

Q & A

Q:  Our primary target audience expects data in a specific format, which involves wide formatting of key values.  How do we meet their needs while generating our data in the most flexible format for use in the WCMS?

A: Feel free to upload a copy of the dataset in the users’ preferred format and provide a link to the file as an alternative to the WCMS-generated download file or as a replacement.  You can even opt to hide the data table that is generated by the WCMS visualization tool and provide your own supporting data table.  (Make certain you do provide a table.  A supporting data table is essential for meeting 508 accessibility standards.)

Q:  Our source datasets often have text in numeric data columns. How does the WCMS visualization tool handle such text?

A: [talk about maps vs. charts]