Free-text Coding in NSSP–ESSENCE: Part 1

Part 1 - Wildcards

This series will be posted in the NSSP Technical Resource Center.

This is the first article in the series about how to write ESSENCE free-text queries. We thank Senior Data Analyst Zachary Stein for developing this series.

Part 2. Underscores _ and Brackets [ ]
Part 3. Inclusionary Terms
Part 4. Exclusion Terms and Parentheses
Part 5. A “Starter” Fall-related Injury Query and Examples of Complex Queries
Part 6. Wrapping Things Up
Part 7. Additional Tips: ISBLANK and ISNULL
Part 8. Additional Tips: !TERM! Syntax

The search criteria for ESSENCE free-text queries are built around Boolean logical operators and regular expressions. Free-text queries are not case-sensitive and may contain “^” for wildcards; “,” for multiple entries; “ISBLANK” to look for blanks; “ISNULL” to look for nulls; [COMMA] to look for commas; the operators “and,” “or,” and “andnot”; and parentheses “()” to define order and grouping. This series will cover all these topics in depth.

Free-text queries are what make syndromic surveillance practice, particularly practice using NSSP–ESSENCE, adaptable to different data sources and types. By using free-text queries, analysts and epidemiologists can exercise a high level of customization. They can quickly code free-text queries and rapidly respond to outbreaks, disasters, and other events as they unfold. Such capabilities let users tailor queries to the level of detail in their data, which helps produce accurate results.

Free-text coding in ESSENCE, which is accessible to all users, follows distinct patterns. Learning to read these patterns allows users to take queries from many places and repurpose them to suit their unique needs. Syndromic surveillance depends heavily on sharing methods, and practitioners must understand the language.

Part 1. Wildcards

A wildcard is a character that can match any character or sequence of characters in a search. In ESSENCE, the wildcard is the “^” symbol, called a caret.

A caret in a free-text ESSENCE query can represent a search for anything, everything, and nothing—and all at the same time! If a user submitted a Chief Complaint free-text query of only a caret, the query would return all visits.

You will see carets in nearly every free-text ESSENCE query, regardless of the field being queried. Carets allow users to query portions or segments of a text field.

Let’s assume the following Chief Complaints (CC) and a desire to create a query for fall-related injuries:

  1. Fall
  2. Fell getting out of car
  3. Left arm injury; Fall
  4. Falling out with friends; Suicidal
  5. Feels crestfallen
  6. Patient brought in after falling on face
  7. Fall; Left wrist injury
  8. Feels congested; Allergies

You may reasonably assume that CCs 1, 2, 3, 6, and 7 are the intended cases and that CCs 4, 5, and 8 would be false positives if a query returned them.

Here’s a table that shows how the use of the caret can affect results:

^ Caret Query Examples

Code                Description
Fall                No carets in this query. Any visit returned must match the text exactly, including punctuation. Returns only CC 1.
Fall^               A caret only at the end. Returned visits must begin with the query text. Returns CCs 1, 4, and 7.
^Fall               The opposite of the previous query: a caret only at the beginning. Returned visits must end with the query text. Returns CCs 1 and 3.
^Fall^              The text string is bracketed in carets, so anything, everything, or nothing can be in front of and behind the term. Returns CCs 1, 3, 4, 5, 6, and 7.
^Fall ^             Bracketed like the previous query, but returned visits must have a space immediately after the query text. Returns none of the above.
^ Fall^             Like the previous two queries, but returned visits must have a space immediately before the query text. Returns CCs 3 and 6.
^Fall;^             Similar to the preceding queries that contain a space, but the term must be followed immediately by a semicolon. Returns only CC 7.
^ Fall^,or,Fall^    Some of these forms can be used in pairs. The term ^ Fall^ removes the false positive “crestfallen” because it requires a space before the text. That also excludes CCs that begin with “Fall,” so the second term, Fall^, captures those. Returns CCs 1, 3, 4, 6, and 7.
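
The results in the table can be checked offline with a small simulation. The Python sketch below is not how ESSENCE is implemented; it assumes that a caret behaves like "any run of characters, including none," that a term without carets must match the whole field, that matching is case-insensitive, and that ",or," joins alternative terms. The helper names term_matches and run_query are hypothetical.

```python
import re

# Chief Complaints from the numbered list above.
CCS = {
    1: "Fall",
    2: "Fell getting out of car",
    3: "Left arm injury; Fall",
    4: "Falling out with friends; Suicidal",
    5: "Feels crestfallen",
    6: "Patient brought in after falling on face",
    7: "Fall; Left wrist injury",
    8: "Feels congested; Allergies",
}


def term_matches(term: str, text: str) -> bool:
    """Assumed caret semantics: each ^ stands for any characters, or none."""
    # Escape the literal pieces and join them with ".*" in place of each caret.
    pattern = ".*".join(re.escape(piece) for piece in term.split("^"))
    # fullmatch means text outside the term is allowed only where a caret sits.
    return re.fullmatch(pattern, text, re.IGNORECASE | re.DOTALL) is not None


def run_query(query: str) -> list[int]:
    """Return the CC numbers matched by any term of an ',or,'-joined query."""
    terms = query.split(",or,")
    return [n for n, cc in CCS.items() if any(term_matches(t, cc) for t in terms)]


for query in ["Fall", "Fall^", "^Fall", "^Fall^", "^Fall ^",
              "^ Fall^", "^Fall;^", "^ Fall^,or,Fall^"]:
    print(f"{query!r:22} -> {run_query(query)}")
```

Under these assumptions, each query returns the same CC numbers that the table lists, and a query of only a caret returns all eight CCs.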

Keep in mind that carets can be in the middle of a term or sequence of characters.

  • The query ^F^ll^ means anything, everything, and nothing can appear between the F and ll in the CC text. This query would return CCs 1, 2, 3, 4, 5, 6, 7, and 8. CC 8, “Feels congested; Allergies,” is returned because the F in “Feels” is eventually followed by the ll in “Allergies.”
  • A caret in the middle of a query term can be used to specify order. For example, the query ^Fell^Stair^ would return a CC of “I fell down two flights of stairs” and a CC of “Fell climbing stairs”; however, the query would NOT return a CC of “Climbing stairs and fell.”
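
The two bullets above can be illustrated with a short Python sketch. It is a simulation under the assumption that each caret means "any characters, or none" and that matching is case-insensitive; caret_match is a hypothetical helper, not part of ESSENCE.

```python
import re


def caret_match(term: str, text: str) -> bool:
    """Assumed semantics: every ^ becomes '.*'; everything else is literal."""
    pattern = ".*".join(re.escape(piece) for piece in term.split("^"))
    return re.fullmatch(pattern, text, re.IGNORECASE | re.DOTALL) is not None


# A mid-term caret lets unrelated words supply the F and the ll.
print(caret_match("^F^ll^", "Feels congested; Allergies"))            # True

# A mid-term caret also fixes the order: "fell" must come before "stair".
print(caret_match("^Fell^Stair^", "I fell down two flights of stairs"))  # True
print(caret_match("^Fell^Stair^", "Fell climbing stairs"))               # True
print(caret_match("^Fell^Stair^", "Climbing stairs and fell"))           # False
```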

How to Use the Caret

The NSSP–ESSENCE “Help” button, ESSENCE CCDD categories, and NSSP Community of Practice Knowledge Repository Syndrome Definition Library are valuable resources where you can find examples of carets being used in practice.

Whenever a user writes a free-text query in NSSP–ESSENCE, there is a “Help” button above and below the box in the Query Portal. Clicking this help button will display pointers and the following example:

(,(,^cough^,or,^fever^,),andnot,^cold^,),and,^flu^

Note the common practice of enclosing a free-text term within two carets. This ensures that the terms cough, fever, cold, and flu are matched wherever they appear in the Chief Complaint, regardless of their position or order.

For short text segments like the preceding ^flu^ example, surrounding text in carets increases the power of your query by allowing a single term to return text related to FLU and inFLUenza; however, these short text segments can also appear within other words like FLUid, reFLUx, or inFLUence. The example above solves this problem by linking ^Flu^ to other terms with an “And” operator to help narrow down results.
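
To see how the operators in the Help example narrow the results of a short segment like ^flu^, the query can be translated by hand into ordinary Boolean logic. The Python sketch below assumes that “andnot” means “and not,” that parentheses group terms as they would in any Boolean expression, and that a term bracketed in carets is a case-insensitive substring test. The function names and the sample chief complaints are invented for illustration.

```python
def contains(segment: str, text: str) -> bool:
    """Stand-in for ^segment^: case-insensitive 'appears anywhere' test."""
    return segment.lower() in text.lower()


def help_example(cc: str) -> bool:
    """Hand translation of (,(,^cough^,or,^fever^,),andnot,^cold^,),and,^flu^"""
    cough_or_fever = contains("cough", cc) or contains("fever", cc)
    return (cough_or_fever and not contains("cold", cc)) and contains("flu", cc)


# Hypothetical chief complaints, not taken from real data.
samples = [
    "Fever and flu symptoms",    # fever + flu, no cold
    "Influenza with fever",      # "flu" found inside "influenza"
    "Fluid in knee",             # "flu" found inside "fluid", no cough or fever
    "Acid reflux",               # "flu" found inside "reflux", no cough or fever
    "Cough and cold, flu-like",  # excluded by the andnot on "cold"
]
for cc in samples:
    print(f"{cc!r:28} -> {help_example(cc)}")
```

In this sketch only the first two samples are returned. The stray matches in “fluid” and “reflux” drop out because neither complaint also mentions cough or fever.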

In the CDC Heroin Overdose v4 CCDD Category, there are the coded strings “…,or,^intoxic^,or,…” and “…^withdra^,or…”. These two strings are excellent examples of covering multiple forms of a word by shortening the text and bookending it with carets. ^Intoxic^ will return the words intoxicant, intoxicate, intoxicated, intoxication, and other forms. ^Withdra^ covers the terms withdrawn, withdraws, withdrawing, withdrawals, and a few others. This keeps users from having to predict every form a key term might take within a field.
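
A quick way to see which word forms a shortened segment covers is to test it against a word list. The Python sketch below treats a caret-bracketed segment as a case-insensitive substring test, which is an assumption about ESSENCE behavior rather than its implementation; the helper name and the word list are illustrative, and “withdrew” is added here only to show a form the segment would miss.

```python
def contains(segment: str, text: str) -> bool:
    """Stand-in for ^segment^: case-insensitive 'appears anywhere' test."""
    return segment.lower() in text.lower()


words = ["intoxicant", "intoxicate", "intoxicated", "intoxication",
         "withdrawn", "withdraws", "withdrawing", "withdrawals", "withdrew"]

for segment in ["intoxic", "withdra"]:
    covered = [w for w in words if contains(segment, w)]
    print(f"^{segment}^ covers: {covered}")
```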

The free-text query ^Fall^,or,^ Fall^,or,^Fall ^,or,Fall^,or,^Fall can be condensed to simply ^Fall^. This is because ^Fall^ will return the segment no matter what form it takes within the field. The code ^ Fall^,or,^Fall ^,or,Fall^,or,^Fall adds nothing to the query that wasn’t already picked up by the term ^Fall^.
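
The redundancy can be checked against the eight example Chief Complaints from earlier in this article. The Python sketch below is a simulation only, under the assumptions that a caret means "any characters, or none," that matching is case-insensitive, and that ",or," joins alternative terms; term_matches and run_query are hypothetical helpers.

```python
import re

# The eight example Chief Complaints from earlier in this article.
CCS = [
    "Fall",
    "Fell getting out of car",
    "Left arm injury; Fall",
    "Falling out with friends; Suicidal",
    "Feels crestfallen",
    "Patient brought in after falling on face",
    "Fall; Left wrist injury",
    "Feels congested; Allergies",
]


def term_matches(term: str, text: str) -> bool:
    """Assumed caret semantics: each ^ stands for any characters, or none."""
    pattern = ".*".join(re.escape(piece) for piece in term.split("^"))
    return re.fullmatch(pattern, text, re.IGNORECASE | re.DOTALL) is not None


def run_query(query: str) -> list[str]:
    """Return the CCs matched by any term of an ',or,'-joined query."""
    terms = query.split(",or,")
    return [cc for cc in CCS if any(term_matches(t, cc) for t in terms)]


long_query = "^Fall^,or,^ Fall^,or,^Fall ^,or,Fall^,or,^Fall"
extra_terms = "^ Fall^,or,^Fall ^,or,Fall^,or,^Fall"

# Both queries return the same visits in this simulation.
print(run_query(long_query) == run_query("^Fall^"))              # True

# Everything the extra terms find is already found by ^Fall^ alone.
print(set(run_query(extra_terms)) <= set(run_query("^Fall^")))   # True
```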

Carets can be used in fields other than Chief Complaints. Carets are often used in Discharge Diagnosis (DD) fields to ensure the proper visits get returned, regardless of where the DD code falls in the string of DD text.

Try it out!

Compare the query ^Fall^,or,^ Fall^,or,^Fall ^,or,Fall^,or,^Fall with the query ^Fall^ and see the results produced by each.

Run a short segment in the NSSP–ESSENCE Chief Complaint field. The system will highlight where that segment is found in the Data Details page. An example to try is the query ^tick^.

We thank Senior Data Analyst Zachary Stein for volunteering to write a series of articles about free-text coding. Stein, formerly with the Kansas Department of Health and Environment, does epidemiological work to support NSSP efforts. Stein is an active participant in the NSSP–CoP. He initially wrote about free-text coding as an entry on the NSSP–CoP Syndrome Definition Committee forum. The forum post generated considerable interest, inspiring this series. Stein acknowledges input provided by others who contributed to the forum post.