Free-text Coding in NSSP–ESSENCE: Part 5

Part 5. A “Starter” Fall-related Injury Query and Examples of Complex Queries

This is the fifth article in the series about how to write ESSENCE free-text queries. We thank Senior Data Analyst Zachary Stein for developing this series.

Part 1. Wildcards
Part 2. Underscores _ and Brackets [ ]
Part 3. Inclusionary Terms
Part 4. Exclusion Terms and Parentheses
Part 6. Wrapping Things Up

Introduction

The search criteria for ESSENCE free-text queries are built around Boolean logical operators and regular expressions. Free-text queries are not case-sensitive and may contain “^” for wildcards; “,” for multiple entries; “ISBLANK” to look for blanks; “ISNULL” to look for nulls; “[COMMA]” to look for commas; the operators “and,” “or,” and “andnot”; and parentheses “()” to define order and grouping. This series covers all these topics in depth.

Free-text queries are what makes syndromic surveillance practice, particularly practice using NSSP–ESSENCE, adaptable to different data sources and types. By using free-text queries, analysts and epidemiologists can exercise a high level of customization. They can quickly code free-text queries and rapidly respond to outbreaks, disasters, and events that unfold. Such capabilities empower users to customize queries to fit their level of data, ensuring accurate results.

Free-text coding in ESSENCE, which is accessible to all users, follows distinct patterns. Learning to read these patterns allows users to take queries from many places and repurpose them to suit their unique needs. Syndromic surveillance depends heavily on sharing methods, and practitioners must understand the language.

Part 5. A “Starter” Fall-related Injury Query and Examples of Complex Queries

Parts 1 through 4 of this series cover basic and complex query notation, including the use of carets; underscores and brackets; “AND”, “OR”, and “ANDNOT” statements; and the parentheses that can be used to group these statements together.

A “Starter” Fall-related Injury Query

So far, this series has referenced a sample set of hypothetical Chief Complaints (CCs) and a desire to create a query for fall-related injuries. Now that we have all the query language at our fingertips, here’s how I would develop a query that includes the intended visits and excludes the false positive CCs.

Here again are the sample Chief Complaints used in Parts 1–4:

  1. Fall
  2. Fell getting out of car
  3. Left arm injury; Fall
  4. Falling out with friends; Suicidal
  5. Feels crestfallen
  6. Patient brought in after falling on face
  7. Fall; Left wrist injury
  8. Feels congested; Allergies

You may reasonably assume that CCs 1, 2, 3, 6, and 7 are the intended cases and that CCs 4, 5, and 8 are false positives.

Here’s a table that describes what we’ve learned about fall-related queries:

 

 

Fall-related Query Examples

  1. First, we see that our Chief Complaints (CCs) contain two indicators of falls: the terms “Fall” and “Fell.”

Code: ^Fall^,OR,^Fell^
Description: This query picks up CCs 1, 2, 3, 4, 5, 6, and 7. It captures all intended visits but also captures the false positives “Falling out with friends” and “Feels crestfallen.”

  2. To exclude unintended visits, let’s start with the term “Crestfallen.” There are multiple ways to do this, but here are two common options:

Code: ^Fall^,ANDNOT,^Crestfallen^,OR,^Fell^
Description: We can negate the false positive term directly. This returns CCs 1, 2, 3, 4, 6, and 7.

Code: ^ Fall^,OR,Fall^,or,Fell^
Description: Or, we can be specific with our inclusion terms. This code specifies that the term begins a word or begins the CC and returns CCs 1, 2, 3, 4, 6, and 7.

  3. Now, the only false positive that remains is CC 4, “Falling out with friends.” Adding a negation for “Falling” would probably have unintended consequences in real data and would negate the true positive CC 6 in our examples. There isn’t a great solution to exclude CC 4. Here are some examples and the pros and cons of each:

Code: (,^Fall^,ANDNOT,^Crestfallen^,),AND,(,^Injur^,OR,^Face^,or,^Wrist^,or,^Ankle^,or,^Broke^,or,^Hurt^,or,^Lacer^,or,^Hit^,or,^Head^,),OR,^Fell^
Description: We can require that “Fall” coincide with an injury term. The code shown returns CCs 2, 3, 6, and 7; CC 2 is returned only through the separate ^Fell^ term, which carries no injury requirement. This excludes CC 4 but also excludes the likely true positive CC 1, and it would exclude CC 2 as well if the injury requirement were extended to “Fell.” Also, this query depends heavily on the user’s ability to cover all injury descriptions that can result from a fall, which could get prohibitively long.

Code: ^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^
Description: The query can negate the text string specific to CC 4. Very specific negations are useful for excluding especially troublesome false positives, but the more specific a negation term becomes, the less likely it is to apply across broad real-life data or to negate similar false positives. This code uses a very specific negation to exclude CC 4 and returns CCs 1, 2, 3, 6, and 7.

Code: ^Fall^,ANDNOT,(,^Crestfallen^,OR,^Suicid^,),OR,^Fell^
Description: The query can negate based on how CC 4 relates to suicidal ideation. However, that raises the question of how often fall-related injuries occur alongside mentions of suicidal ideation. This query returns CCs 1, 2, 3, 6, and 7.
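One way to double-check the returns listed in the table is to translate the queries by hand into a general-purpose language and run them against the eight sample CCs. The Python below is a rough sketch for illustration, not ESSENCE code. The helper names (term_to_regex, hit, returned) are made up for this example, and it assumes that “^” matches any run of characters, that matching is not case-sensitive, and that a term without a leading “^” must match from the start of the CC. Each function spells out the grouping by hand rather than parsing the comma-separated query text.

```python
import re

def term_to_regex(term):
    """Translate one free-text term into a case-insensitive regex (assumed mapping)."""
    parts, in_brackets = [], False
    for ch in term:
        if in_brackets:
            parts.append(ch)              # keep bracketed lists such as [0-9] as-is
            in_brackets = ch != "]"
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "^":
            parts.append(".*")            # caret: any run of characters
        elif ch == "_":
            parts.append(".")             # underscore: exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def hit(term, text):
    """True when the whole text matches the translated term."""
    return term_to_regex(term).fullmatch(text) is not None

# The eight hypothetical Chief Complaints from this series.
CCS = {
    1: "Fall",
    2: "Fell getting out of car",
    3: "Left arm injury; Fall",
    4: "Falling out with friends; Suicidal",
    5: "Feels crestfallen",
    6: "Patient brought in after falling on face",
    7: "Fall; Left wrist injury",
    8: "Feels congested; Allergies",
}

def fall_or_fell(cc):
    # ^Fall^,OR,^Fell^
    return hit("^Fall^", cc) or hit("^Fell^", cc)

def negate_crestfallen(cc):
    # ^Fall^,ANDNOT,^Crestfallen^,OR,^Fell^
    return (hit("^Fall^", cc) and not hit("^Crestfallen^", cc)) or hit("^Fell^", cc)

def specific_inclusion(cc):
    # ^ Fall^,OR,Fall^,or,Fell^
    return hit("^ Fall^", cc) or hit("Fall^", cc) or hit("Fell^", cc)

def negate_falling_out(cc):
    # ^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^
    excluded = hit("^Crestfallen^", cc) or hit("^Falling out with^", cc)
    return (hit("^Fall^", cc) and not excluded) or hit("^Fell^", cc)

def negate_suicid(cc):
    # ^Fall^,ANDNOT,(,^Crestfallen^,OR,^Suicid^,),OR,^Fell^
    excluded = hit("^Crestfallen^", cc) or hit("^Suicid^", cc)
    return (hit("^Fall^", cc) and not excluded) or hit("^Fell^", cc)

def returned(query):
    """List the CC numbers that a hand-translated query picks up."""
    return [number for number, cc in CCS.items() if query(cc)]

for query in (fall_or_fell, negate_crestfallen, specific_inclusion,
              negate_falling_out, negate_suicid):
    print(query.__name__, returned(query))
```

Under these assumptions, the two candidate final queries (negate_falling_out and negate_suicid) both return CCs 1, 2, 3, 6, and 7. A check like this only confirms the logic on the hypothetical CCs; it says nothing about how the query behaves on real data.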

 

So, that makes our final query on our hypothetical CCs either

^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^

Or . . .

^Fall^,ANDNOT,(,^Crestfallen^,OR,^Suicid^,),OR,^Fell^

Both queries are concise and accurate, but will they work with real-life data? The Chief Complaint Query Validation (CCQV) dataset in NSSP–ESSENCE is the logical (and best) place to start when creating a query that will be used nationwide. But for most situations, a user’s local or regional data will suffice. For the next steps, we will use just the query ^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^.

Fall-related injuries are common in emergency department data. The existing ESSENCE fall subsyndrome pulls these visits reliably, but we can also run our new query and see how it performs on real-life data. Running the above query in CCQV data where HasBeenE=Yes (roughly limiting it to emergency care) returns just under 80,000 visits per week. To see how our query works with a broader dataset, we will focus on just MMWR week 22 of 2019. Although this example uses a single week’s data, anyone validating a query should do so across multiple weeks to ensure that coverage includes a broad range of CC variations.

The official return of our query for MMWR week 22 is 79,604 visits. We can let ESSENCE take a random sample of these visits in the Data Details page to review. After reviewing these visits, our query seems to be performing quite well. The false positives immediately noted are Chief Complaints about assorted medical devices that “fall out” and about items that fall or fell onto the patient.

Because we are looking for false positives, we can further refine the data we have to review by removing likely true positive visits. One method is to remove those that have relevant CCs and also have Discharge Diagnoses (DDs) related to our intended records. We can do this by putting the original query in parentheses and then linking to relevant DD codes with an “ANDNOT” statement. Although the methodology for picking DD codes is beyond the scope of this article, a quick ICD10 negation of visits coded with a W00-W19 ICD10 code covering slipping, tripping, stumbling, and falls will allow us to remove many true positives.

Our code to remove DD-positive visits, which we’ll run in the CCDD field, is shown below. Note the use of brackets to cover this wide range of “W” codes with a single free-text term:

(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^,),ANDNOT,(,^[;/ ]W[01][0-9]^,)
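To see what the bracketed term ^[;/ ]W[01][0-9]^ does, it helps to try it on a few strings. The Python sketch below is only an illustration. The three records are invented, the way the chief complaint and diagnosis codes are joined in them is an assumption about the CCDD field rather than a documented layout, and the helper names are hypothetical. The term is translated on the assumption that “^” matches any run of characters and that a bracketed list matches exactly one of the listed characters.

```python
import re

def term_to_regex(term):
    """Translate one free-text term into a case-insensitive regex (assumed mapping)."""
    parts, in_brackets = [], False
    for ch in term:
        if in_brackets:
            parts.append(ch)              # keep bracketed lists such as [0-9] as-is
            in_brackets = ch != "]"
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "^":
            parts.append(".*")            # caret: any run of characters
        elif ch == "_":
            parts.append(".")             # underscore: exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def hit(term, text):
    """True when the whole text matches the translated term."""
    return term_to_regex(term).fullmatch(text) is not None

W_CODE_TERM = "^[;/ ]W[01][0-9]^"   # a delimiter, then W00 through W19

def fall_query(text):
    # ^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^
    excluded = hit("^Crestfallen^", text) or hit("^Falling out with^", text)
    return (hit("^Fall^", text) and not excluded) or hit("^Fell^", text)

def needs_review(text):
    # (fall query),ANDNOT,(,^[;/ ]W[01][0-9]^,)
    return fall_query(text) and not hit(W_CODE_TERM, text)

# Invented records: chief complaint text followed by delimited diagnosis codes.
RECORDS = [
    "FELL OFF LADDER ;W11.XXXA;S52.501A;",     # has a code in the W00-W19 range
    "FELL AT HOME ;S52.501A;",                 # no W code at all
    "BOX FELL ON FOOT ;W20.8XXA;S90.30XA;",    # W20 falls outside W00-W19
]

for record in RECORDS:
    print(needs_review(record), record)
```

In this sketch only the first record is removed. The other two stay in the review set, including the W20 record, because [01] allows only a 0 or a 1 after the “W.”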

This query returns 53,676 visits for MMWR week 22. A review of the returns shows, once again, only a few false positives.

Code can be written to further isolate potential false positives, making it easier for users to quickly review and look for trends in terms to negate in the final query. For example, if a user has some knowledge of medical devices that can potentially “fall out,” these devices can be listed in a query and thereby isolated. To do this, group the medical device terms within parentheses and then use an “AND” statement to link the parenthetical statement to the original query before the “ANDNOT” statement of diagnosis codes.

(,(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^,),AND,(,^Tube^,OR,^Cath^,OR,^Monitor^,OR,^Stitch^,),),ANDNOT,(,^[;/ ]W[01][0-9]^,)

Note: This query returns 200 visits in MMWR week 22. Of these, at least 20 are visits where the fall event dislodged the medical device and caused it to “Fall out.” “ANDNOT” statements looking for false positives can have surprising and unintended consequences.
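The caution in the note can be illustrated with a small Python sketch. The three CCs below are invented for this example, the helper names are hypothetical, and the diagnosis-code negation is left out so that only the chief complaint logic is shown. It assumes “^” matches any run of characters and that matching is not case-sensitive.

```python
import re

def term_to_regex(term):
    """Translate one free-text term into a case-insensitive regex (assumed mapping)."""
    parts, in_brackets = [], False
    for ch in term:
        if in_brackets:
            parts.append(ch)              # keep bracketed lists such as [0-9] as-is
            in_brackets = ch != "]"
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "^":
            parts.append(".*")            # caret: any run of characters
        elif ch == "_":
            parts.append(".")             # underscore: exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def hit(term, text):
    """True when the whole text matches the translated term."""
    return term_to_regex(term).fullmatch(text) is not None

DEVICE_TERMS = ["^Tube^", "^Cath^", "^Monitor^", "^Stitch^"]

def fall_query(cc):
    # ^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^
    excluded = hit("^Crestfallen^", cc) or hit("^Falling out with^", cc)
    return (hit("^Fall^", cc) and not excluded) or hit("^Fell^", cc)

def mentions_device(cc):
    # (fall query),AND,(,^Tube^,OR,^Cath^,OR,^Monitor^,OR,^Stitch^,)
    return fall_query(cc) and any(hit(term, cc) for term in DEVICE_TERMS)

# Invented chief complaints for illustration only.
SAMPLE_CCS = [
    "Feeding tube fell out",              # device fell out, no fall injury
    "Fell at home, catheter fell out",    # a real fall that dislodged a device
    "Fell off ladder",                    # a real fall, no device mentioned
]

for cc in SAMPLE_CCS:
    print(mentions_device(cc), cc)
```

The first two CCs are isolated by the device terms, but only the first is a false positive. Turning the device group into an “ANDNOT” would also drop the second CC, which is the kind of unintended consequence the note describes.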

Try it out!

Try to negate cases where an item falls/fell onto the patient. With each negation, think about its consequences. For example, dressers can fall on people, but people can also fall off dressers.

Will the ANDNOT string (,^Dresser f[ea]ll^,) work for this? See what you’re negating with the code

(,(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^,),AND,(,^Dresser f[ea]ll^,),),ANDNOT,(,^[;/
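Before running the term in ESSENCE, you can reason about what ^Dresser f[ea]ll^ would and would not match. The Python sketch below is an illustration under stated assumptions: the three CCs are invented, the helper names are hypothetical, “^” is treated as any run of characters, and [ea] is treated as exactly one of “e” or “a.”

```python
import re

def term_to_regex(term):
    """Translate one free-text term into a case-insensitive regex (assumed mapping)."""
    parts, in_brackets = [], False
    for ch in term:
        if in_brackets:
            parts.append(ch)              # keep bracketed lists such as [ea] as-is
            in_brackets = ch != "]"
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "^":
            parts.append(".*")            # caret: any run of characters
        elif ch == "_":
            parts.append(".")             # underscore: exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def hit(term, text):
    """True when the whole text matches the translated term."""
    return term_to_regex(term).fullmatch(text) is not None

DRESSER_TERM = "^Dresser f[ea]ll^"

# Invented chief complaints for illustration only.
SAMPLE_CCS = [
    "Dresser fell on foot",                          # item fell onto the patient
    "Fell off dresser",                              # patient fell; word order differs
    "Climbed on dresser, fell, dresser fell on pt",  # patient fell AND item fell
]

for cc in SAMPLE_CCS:
    print(hit(DRESSER_TERM, cc), cc)
```

In this sketch the term leaves “Fell off dresser” alone, which is the intended behavior, but it also matches the third CC, where the patient fell as well. Whether that trade-off is acceptable is the kind of question the exercise asks you to weigh against your own data.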

Complex Query Examples

In next month’s free-text coding article, we will present a rough final query for fall-related injuries and then break it apart. For now, here are some complex queries and different free-text language techniques they use well.

Notice in the excerpt of the new CDC Medication Refill v1 CCDD Category shown below how an “ANDNOT” statement, ANDNOT,(,^O2[0-9]^,), is nested within the first group of inclusion terms (^tank^ through ^medicine^), which, in turn, is linked by “AND” to another group of inclusion terms (^refil^ through ^request^). By nesting the “ANDNOT” statement within the inclusion terms, the coder can be specific with the negations. Here, only the specific false positives pulled by the ^o2^ term are negated. This way, you only apply negations to terms where needed.

(,(,^tank^,or,^oxygen^,or,^o2^,ANDNOT,(,^O2[0-9]^,),or,^medication^,or,^script^,or,^pill^,or,^prescription^,or,^medicine^,),AND,(,^refil^,or,^supplemental^,or,^empty^,or,^need^,or,^out of^,or,^request^,),),or,(,^[;/ ]z76.0^,or,^[;/ ]z760^,or,^[;/ ]V68.1^,or,^[;/ ]V681^,or,^[;/ ]182918009^,)
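The effect of the nested negation can be sketched in Python. This is an illustration only. It follows the article’s description that the negation applies to the ^o2^ term, which is an assumption about how the query is evaluated rather than a statement of how ESSENCE parses it. The records and helper names are invented, and the diagnosis-code group at the end of the query is left out.

```python
import re

def term_to_regex(term):
    """Translate one free-text term into a case-insensitive regex (assumed mapping)."""
    parts, in_brackets = [], False
    for ch in term:
        if in_brackets:
            parts.append(ch)              # keep bracketed lists such as [0-9] as-is
            in_brackets = ch != "]"
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "^":
            parts.append(".*")            # caret: any run of characters
        elif ch == "_":
            parts.append(".")             # underscore: exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def hit(term, text):
    """True when the whole text matches the translated term."""
    return term_to_regex(term).fullmatch(text) is not None

SUPPLY_TERMS = ["^tank^", "^oxygen^", "^medication^", "^script^",
                "^pill^", "^prescription^", "^medicine^"]
REQUEST_TERMS = ["^refil^", "^supplemental^", "^empty^", "^need^",
                 "^out of^", "^request^"]

def o2_without_code(text):
    # ^o2^,ANDNOT,(,^O2[0-9]^,): keep "o2" unless a digit follows it
    return hit("^o2^", text) and not hit("^O2[0-9]^", text)

def refill_text_part(text):
    supply = o2_without_code(text) or any(hit(t, text) for t in SUPPLY_TERMS)
    request = any(hit(t, text) for t in REQUEST_TERMS)
    return supply and request

# Invented records; O26.891 stands in for any code-like string starting with O2.
RECORDS = [
    "OUT OF O2",
    "NEEDS ULTRASOUND ;O26.891;",
    "NEEDS MEDICATION REFILL ;O26.891;",
]

for record in RECORDS:
    print(refill_text_part(record), record)
```

In this sketch the second record is dropped because its only “o2” is the start of a code-like string, while the third record is kept because ^medication^ matches on its own and the negation is scoped to ^o2^.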

Shown below is the “Rabies and Animal Bites – Florida Department of Health” query, which is posted in the Syndromic Definition Library (SDL) in the Knowledge Repository (https://surveillancerepository.org/rabies-and-animal-bites-florida-department-health):

(,^rabie^,or,^rabbie^,or,^bitten^,or,^bite^,or,bit ^,or,^ bit,or,^ bit ^,),andnot,(,^bug^,or,^ant^,or,^child^,or,^insect^,or,^flea^,or,^person^,or,^human^,or,^snake^,or,^caterpil^,or,^pygmy rattler^,or,^something^,or,^spider^,or,^wasp^,or,^mosquito^,or,^tick^,or,^fish^,or,^recluse^,)

A notable feature of this rabies query is that, instead of being more specific with inclusion terms, the coders chose to exclude visits where the biter was mentioned and is not a likely carrier of rabies. This works well because a query for ^Bite^ alone will pull in insects, reptiles, amphibians, and children, whereas a query for ^Bite^,AND,(,^Cat^,or,^Dog^,or,^etc…….^,) will exclude unattributed bites and fail to bring them to the attention of an epidemiologist or other decision-maker.
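The difference between the two approaches can be sketched in Python. This is an illustration under stated assumptions: the four CCs are invented, the helper names are hypothetical, and the inclusion-based alternative uses only the two animals the text names (^Cat^ and ^Dog^) without filling in the rest of the list. It assumes “^” matches any run of characters and that a term without a leading or trailing “^” must match from the start or to the end of the CC.

```python
import re

def term_to_regex(term):
    """Translate one free-text term into a case-insensitive regex (assumed mapping)."""
    parts, in_brackets = [], False
    for ch in term:
        if in_brackets:
            parts.append(ch)              # keep bracketed lists as-is
            in_brackets = ch != "]"
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "^":
            parts.append(".*")            # caret: any run of characters
        elif ch == "_":
            parts.append(".")             # underscore: exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def hit(term, text):
    """True when the whole text matches the translated term."""
    return term_to_regex(term).fullmatch(text) is not None

BITE_TERMS = ["^rabie^", "^rabbie^", "^bitten^", "^bite^",
              "bit ^", "^ bit", "^ bit ^"]
EXCLUDED_BITERS = ["^bug^", "^ant^", "^child^", "^insect^", "^flea^",
                   "^person^", "^human^", "^snake^", "^caterpil^",
                   "^pygmy rattler^", "^something^", "^spider^", "^wasp^",
                   "^mosquito^", "^tick^", "^fish^", "^recluse^"]

def exclusion_approach(cc):
    # (bite terms),andnot,(unlikely rabies carriers)
    return (any(hit(t, cc) for t in BITE_TERMS)
            and not any(hit(t, cc) for t in EXCLUDED_BITERS))

def inclusion_approach(cc):
    # ^Bite^,AND,(,^Cat^,or,^Dog^,) using only the animals named in the text
    return hit("^Bite^", cc) and (hit("^Cat^", cc) or hit("^Dog^", cc))

# Invented chief complaints for illustration only.
SAMPLE_CCS = [
    "Dog bite to left hand",
    "Animal bite, unknown source",
    "Spider bite",
    "Bit by raccoon",
]

for cc in SAMPLE_CCS:
    print(exclusion_approach(cc), inclusion_approach(cc), cc)
```

In this sketch the exclusion approach keeps the unattributed bite and the raccoon bite while dropping the spider bite, whereas the inclusion approach keeps only the dog bite.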

The CSTE Cold-Related Illness query posted to the SDL serves as another excellent example of key query methods (https://surveillancerepository.org/cold-related-illness-cste-january-2019).

Notice how the terms for the ICD9 codes starting with 991, ^[;/ ]991._[;/ ]^ and ^[;/ ]991_[;/ ]^, use the bracketed list [;/ ] at the start to ensure that the number segment starts the code, followed by an underscore to specify that anything can fall in that place but that the place must be filled, and then end with another bracketed list [;/ ] to specify the end of the code. The bracket at the end ensures that numbers longer than 4 digits won’t be pulled by this query.

This query also does a nice job negating historic frostbite and cold exposure injuries (the final “andnot” group) since frostbite patients are likely candidates for repeat ED visits for bandage changes, infection, swelling, and other issues. Notice in particular the use of …^[TX]__.__XD^,or,^[TX]____XD^… to negate any injury or external cause code that has a “D” at the end indicating subsequent encounters.

(,^cold exp^,or,^cold-exp^,or,^env exp^,or,^environmental exp^,or,^exp to env^,or,^exposure to env^,or,^exp to cold^,or,^exposure to cold^,or,^cold exposure^,or,^coldexposue^,or,^out in the cold^,or,^f[ro][ro]stbit^,or,^f[ro][ro]st bit^,or,^f[ro][ro]stnip^,or,^f[ro][ro]st nip^,or,^f[ro][ro]st ni^,or,^hypothe^,),or,(,^froze^,and,(,^hand^,or,^finger^,or,^thumb^,),),ANDNOT,(,^Meat^,or,^burger^,or,^pork^,or,^wart^,or,^growth^,or,^food^,or,^knife^,or,^shoulder^,or,^turkey^,or,^knee^,),or,(,^froze^,and,(,^Feet^,or,^foot^,or,^toe^,),ANDNOT,(,^Meat^,or,^burger^,or,^pork^,or,^wart^,or,^growth^,or,^food^,or,^knife^,or,^shoulder^,or,^turkey^,or,^knee^,),),or,(,^[;/ ]x31^,or,^[;/ ]t68^,or,^[;/ ]t69^,or,^[;/ ]t33^,or,^[;/ ]t34^,or,^[;/ ]991._[;/ ]^,or,^[;/ ]991_[;/ ]^,or,^[;/ ]E901.[089]^,or,^[;/ ]E988.3^,or,^[;/ ]E901[089]^,or,^[;/ ]E9883^,),andnot,(,^recheck^,or,^re-check^,or,^history of frostbite^,or,^historyfrostbite^,or,^dressingchang^,or,^chest^,or,^congestion^,or,^cough^,or,^infection^,or,^[;/ ]w93^,or,^[;/]r68.0^,or,^[;/]r680^,or,^[;/ ]e901.1^,or,^[;/ ]780.65^,or,^[;/ ]e9011^,or,^[;/ ]78065[;/ ]^,or,^[TX]__.__XD^,or,^[TX]____XD^,)
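The underscore and bracket techniques in the code terms of this query can be tried on a few strings. The Python sketch below is an illustration only: the code strings are invented, the helper names are hypothetical, and the translation assumes that “^” matches any run of characters, “_” matches exactly one character, and a bracketed list matches exactly one of the listed characters.

```python
import re

def term_to_regex(term):
    """Translate one free-text term into a case-insensitive regex (assumed mapping)."""
    parts, in_brackets = [], False
    for ch in term:
        if in_brackets:
            parts.append(ch)              # keep bracketed lists such as [;/ ] as-is
            in_brackets = ch != "]"
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "^":
            parts.append(".*")            # caret: any run of characters
        elif ch == "_":
            parts.append(".")             # underscore: exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def hit(term, text):
    """True when the whole text matches the translated term."""
    return term_to_regex(term).fullmatch(text) is not None

CODE_TERMS = [
    "^[;/ ]991._[;/ ]^",   # 991, a period, one required character, then a delimiter
    "^[;/ ]991_[;/ ]^",    # 991, one required character, then a delimiter
    "^[TX]__.__XD^",       # T or X code with a period, ending in XD
]

def matched_terms(text):
    """Return the code terms from the list above that match the text."""
    return [term for term in CODE_TERMS if hit(term, text)]

# Invented code strings for illustration only.
for text in [";991.6;", ";9916;", ";991.61;", ";T33.09XD;", ";T33.09XA;"]:
    print(text, matched_terms(text))
```

Under these assumptions, “;991.61;” matches neither 991 term because the closing [;/ ] requires a delimiter right after the single underscore position, and “;T33.09XA;” is not negated because the term requires the code to end in “XD.”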

 

We thank Senior Data Analyst Zachary Stein for volunteering to write a series of articles about free-text coding. Stein, formerly with the Kansas Department of Health and Environment, does epidemiologic work to support NSSP efforts. Stein is an active participant in the NSSP CoP. He initially wrote about free-text coding as an entry on the NSSP CoP Syndrome Definition Committee forum. The forum post generated considerable interest, inspiring this series. Stein acknowledges input provided by others who contributed to the forum post.

Page last reviewed: June 16, 2020