Glossary of Data Modernization Terms

CDC’s Data Modernization Initiative (DMI) is a comprehensive strategy for modernizing data, technology, and workforce capabilities to strengthen public health surveillance, research, and decision making. CDC is bringing together state, tribal, local, and territorial public health jurisdictions, and private and public sector partners to create modern, interoperable, and real-time public health data and surveillance systems that will protect the American public.

Data are the foundation for public health because public health depends on widespread and rapid access to data to drive decision making. CDC aims to

  • promote seamless reporting of clinical and laboratory data to public health,
  • ensure interoperability among core public health surveillance systems, and
  • support cross-cutting upgrades, such as migration to the cloud and access to new data sources.

Collectively, these activities will help ensure that the systems and services funded by CDC will scale nationwide and adapt to meet evolving needs.

The following glossary lists terms used in the Public Health Data Modernization Assessment. Terms are presented in alphabetical order.

A

Advanced analytics platform: An information system that employs predictive modeling, statistical methods, machine learning, and process automation—techniques beyond the capacities of traditional business intelligence tools—to analyze data and information assets.

Analysis of alternatives (AoA): An analytical comparison of the operational effectiveness, performance, suitability, risk, and lifecycle costs of alternatives that satisfy an established capability need.

Application modernization: The conversion, refactoring, or porting of legacy software applications to modern computer programming languages, software libraries, protocols, or hardware platforms.

Application programming interface (API): A set of tools, definitions, functions, and procedures that enables the integration of application software and services and enables data transmission or access to data and features of another application, service, or operating system (e.g., REST, SOAP).

Application rationalization: The process of streamlining an enterprise application portfolio to improve efficiency, reduce complexity, and lower total cost of ownership (TCO). It involves strategically cataloging and reviewing the enterprise application inventory to identify opportunities to (1) retire or consolidate redundant and minimal-value software applications; (2) reduce infrastructure costs by decommissioning applications; (3) replace non-IT applications and processes with existing IT functionality; (4) eliminate or consolidate software licenses; (5) consolidate or virtualize hardware and software infrastructure; (6) reduce costs and improve the service-level agreement (SLA)-to-cost values via managed services; (7) eliminate, consolidate, simplify, or automate inefficient or redundant business processes; (8) reduce maintenance and support costs with modern applications; and (9) increase agility with technologies that enable rapid change.

B

Big data: High-volume, high-velocity, and/or high-variety information assets, often a combination of structured, semi-structured, and unstructured data that requires nontraditional information-processing methods to enable enhanced insight, decision making, and process automation.

Business continuity planning (BCP): The process of creating systems of prevention and recovery that outline how a business will continue operating during an unplanned disruption in service. BCP generally includes the following steps: (1) facilitate a regulatory review, (2) conduct a risk assessment, (3) perform a business impact analysis, (4) draft a strategy and plan, (5) develop an incident response plan, (6) test incident response procedures, (7) facilitate training and maintenance, and (8) draft a communication plan.

C

Capability: A jurisdiction’s ability to effectively accomplish work processes and deliver products.

Capacity: The extent to which a jurisdiction can effectively accomplish work processes and deliver products, including ensuring sufficient staffing levels and the ability to meet seasonal or varying demand levels.

Cloud computing: A federated data model that allows computer systems to send and receive data on common platforms for user sharing, comparisons, analytics, and visualization. The infrastructure for cloud computing is composed of many server computers connected by the internet.

Cloud strategy: The plan an organization follows to host its IT infrastructure in a cloud environment. By outlining the cloud’s architecture, development plans, and governance model, cloud strategies help ensure effective performance of the infrastructure, workloads, and applications hosted on the cloud.

Component-based software engineering (CBSE): A software development approach that uses loosely coupled, independent, reusable components, independently developed and deployed, and connected by standard interfaces.

Continuous assessment: See Continuous monitoring

Continuous audit: See Continuous monitoring

Continuous discovery: See Continuous monitoring

Continuous monitoring: A process of systematically monitoring information security, vulnerabilities, and threats to facilitate risk-based decision making:

  • Involves ongoing assessment and analysis of the effectiveness of all security controls.
  • Provides ongoing reporting on the security posture of information systems.
  • Supports risk management decisions to help maintain organizational risk tolerance at acceptable levels.

Continuous monitoring includes:

  • Continuous discovery: Process of discovering and maintaining a near real-time inventory of all networks and information assets, including hardware and software, and identifying and tracking confidential and critical data stored on desktops, laptops, and servers.
  • Continuous assessment: Process of automatically scanning and comparing information assets against industry and data repositories to determine vulnerabilities, prioritizing findings, and providing detailed reporting by department, platform, network, asset, and vulnerability type.
  • Continuous audit: Process of continuously evaluating client, server, and network device configurations and comparing with standards and policies, thus gaining insight into problematic controls, usage patterns, and the access permissions of sensitive data.
  • Continuous patching: Process of automatically deploying and updating software to eliminate vulnerabilities and maintain compliance, and correcting configuration settings, including network access and provision software, according to the end user’s role and policies.
  • Continuous reporting: Process of aggregating disparate scanning results from different departments, scan types, and organizations into one central repository, automatically analyzing and correlating unusual activities in compliance with regulations.
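
The continuous assessment step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real scanner: the host names, software names, advisory identifiers, and the `assess` helper are all hypothetical, and the "repository" is a small in-memory list rather than an industry vulnerability feed.

```python
# Hypothetical asset inventory, as continuous discovery might produce it.
inventory = [
    {"host": "lab-db-01", "software": "exampledb", "version": (2, 1, 0)},
    {"host": "epi-web-01", "software": "examplehttpd", "version": (1, 4, 2)},
    {"host": "epi-web-02", "software": "examplehttpd", "version": (1, 6, 0)},
]

# Hypothetical advisories: any version below "fixed_in" is treated as vulnerable.
advisories = [
    {"id": "ADV-0001", "software": "examplehttpd", "fixed_in": (1, 5, 0), "severity": "high"},
    {"id": "ADV-0002", "software": "exampledb", "fixed_in": (2, 0, 0), "severity": "low"},
]

SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def assess(assets, advisory_list):
    """Compare each asset with each advisory and return prioritized findings."""
    findings = []
    for asset in assets:
        for adv in advisory_list:
            same_software = asset["software"] == adv["software"]
            # Version tuples compare element by element, e.g. (1, 4, 2) < (1, 5, 0).
            if same_software and asset["version"] < adv["fixed_in"]:
                findings.append(
                    {"host": asset["host"], "advisory": adv["id"], "severity": adv["severity"]}
                )
    # Most severe findings first.
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])


findings = assess(inventory, advisories)
```

In this sketch only `epi-web-01` is flagged, because it is the only asset running a version older than the one in which its advisory was fixed.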

Continuous patching: See Continuous monitoring

Continuous reporting: See Continuous monitoring

D

Data lake: A centralized data storage repository capable of retaining vast amounts of traditional structured (row and column), semi-structured, and unstructured (nontabular) data in its native format (e.g., videos, images, binary files) without hierarchy or organization; schema and business logic are applied only upon retrieval. While hierarchical data warehouses store data in files or folders, data lakes use a flat architecture to store data.
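
Applying a schema only upon retrieval (often called schema-on-read) can be illustrated with a minimal Python sketch. The record layouts, field names, and the `read_case_reports` helper are hypothetical and chosen only for illustration: raw items are stored unchanged in a flat list, and a schema is imposed only when a particular view of the data is requested.

```python
import json

# Hypothetical flat "lake": items are kept in their native format,
# and no schema is enforced when they are written.
raw_store = [
    '{"patient_id": "A1", "test": "influenza", "result": "positive"}',  # JSON text
    "A2,influenza,negative",                                             # CSV text
    b"\x89PNG...",                                                       # binary object
]


def read_case_reports(store):
    """Apply a schema at read time; items that do not fit it are skipped."""
    fields = ("patient_id", "test", "result")
    rows = []
    for item in store:
        if isinstance(item, bytes):
            continue  # binary objects remain in the lake but are not part of this view
        text = item.strip()
        if text.startswith("{"):
            record = json.loads(text)
            rows.append({f: record.get(f) for f in fields})
        else:
            parts = text.split(",")
            if len(parts) == len(fields):
                rows.append(dict(zip(fields, parts)))
    return rows


reports = read_case_reports(raw_store)
```

The stored items are never modified; a different reader could apply a different schema to the same `raw_store`.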

Data lifecycle management (DLM): A policy-based approach or set of governing principles designed to define and manage the flow of data throughout the lifecycle of an information system—from data capture and initial storage until final disposition—to govern the creation or receipt, management, usage (e.g., publication, data sharing), archiving (e.g., retention policies and system backups), and disposition of records at end of life. Management approach generally governs data protection policies (e.g., data security, privacy, confidentiality, availability, integrity considerations).

Data Modernization Plan: A framework that lays out a strategic vision with short-, intermediate-, and long-term DMI objectives. This framework:

  • Guides decisions for allocating resources.
  • Presents a shared vision for what a modernization strategy is designed to accomplish.
  • Provides a structure to track progress and success.

The Data Modernization Plan includes your jurisdiction’s (1) modernization plan for the IT and informatics infrastructure used to support epidemiology and laboratory work, including forward-looking use of scalable, sustainable, shared services and cloud infrastructure, and (2) workforce development plan, which describes how existing gaps will be addressed and how modernization efforts will be supported.

Data quality management (DQM): A set of processes and practices, methods, and technologies aimed at ensuring that the quality of the data meets or exceeds specific organizational requirements. Examples of measurable data quality attributes may include consistency, accuracy, completeness, auditability, orderliness, uniqueness, timeliness, and validity.
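
Some of these attributes can be measured directly. The following minimal Python sketch computes completeness, uniqueness, and validity for a small set of records. The records, field names, and helper functions are hypothetical, and the formulas shown are one simple way to score each attribute rather than a prescribed standard.

```python
from datetime import date

# Hypothetical case records; field names are illustrative only.
records = [
    {"case_id": "001", "onset_date": "2023-01-15", "county": "Adams"},
    {"case_id": "002", "onset_date": "2023-13-40", "county": "Baker"},  # impossible date
    {"case_id": "002", "onset_date": "2023-02-01", "county": None},     # duplicate ID, missing county
]


def completeness(rows, field):
    """Share of rows with a non-empty value for the field."""
    return sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)


def uniqueness(rows, field):
    """Number of distinct values divided by the number of rows."""
    return len({r.get(field) for r in rows}) / len(rows)


def is_valid_date(value):
    """True if the value parses as an ISO 8601 calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def validity(rows, field):
    """Share of rows whose field holds a valid date."""
    return sum(1 for r in rows if is_valid_date(r.get(field))) / len(rows)


scores = {
    "county completeness": completeness(records, "county"),
    "case_id uniqueness": uniqueness(records, "case_id"),
    "onset_date validity": validity(records, "onset_date"),
}
```

Each score falls between 0 and 1, so an organization could compare it against a threshold set by its own requirements.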

DevOps: A continuous-delivery development model that combines cultural philosophies, practices, and tools to increase an organization’s ability to deliver applications and services at a high velocity.

DevSecOps: A continuous-delivery development and security management model that combines cultural philosophies, practices, and tools to increase an organization’s ability to deliver applications and services at a high velocity.

E

Enterprise service bus (ESB): Middleware technology or integrated platform used to distribute work among connected components of a service-oriented architecture. ESBs are designed to provide a uniform means of moving work, offering applications the ability to connect to the bus and subscribe to messages based on simple structural and business policy rules.
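
The connect-and-subscribe pattern described above can be sketched as a toy, in-process message bus in Python. This is only an illustration of rule-based subscription, not an ESB product: the `MessageBus` class, the `lab.result` topic name, and the message fields are hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Toy bus: components subscribe to a topic, optionally with a filtering rule."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler, rule=lambda message: True):
        """Register a handler that receives messages on a topic when the rule passes."""
        self._subscribers[topic].append((rule, handler))

    def publish(self, topic, message):
        """Deliver the message to every matching subscriber; return the delivery count."""
        delivered = 0
        for rule, handler in self._subscribers[topic]:
            if rule(message):
                handler(message)
                delivered += 1
        return delivered


bus = MessageBus()
archive_inbox, surveillance_inbox = [], []

# The archive receives every lab result; surveillance receives positives only.
bus.subscribe("lab.result", archive_inbox.append)
bus.subscribe("lab.result", surveillance_inbox.append,
              rule=lambda m: m["result"] == "positive")

bus.publish("lab.result", {"test": "measles", "result": "positive"})
bus.publish("lab.result", {"test": "measles", "result": "negative"})
```

The publisher never refers to a specific receiver, which is what keeps the connected components loosely coupled.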

H

Health information system (HIS): System used to acquire, store, deliver, and analyze clinical, epidemiological, or laboratory data in order to (1) inform public health decision making; (2) enable coordinated responses to emerging public health threats; and (3) enable clinical decision support and analytics.

I

Innovation management: A structured framework that enables the systematic promotion of new ideas, products, or services within organizations. The process includes ideation, exploration, rapid prototyping, testing, piloting, and implementation.

Integrated surveillance information system: A secure, enterprise-level surveillance platform that synthesizes laboratory, epidemiological, and other health information across domains—acute, chronic, and emerging infections—in order to maximize the public health impact of available resources.

Intermediate-term objectives: Objectives that can be addressed and completed within 3 to 5 years.

Interoperability: The ability of computer applications, platforms, systems, and networks to communicate electronically with one another by using standardized nomenclature, language, and architecture.

IT asset inventory management: A set of processes and practices that govern how an organization monitors its assets—from tangible fixed assets such as property and equipment to intangible assets such as intellectual property—in order to track physical or virtual location, maintenance requirements, depreciation, performance, and disposition.

L

Lean-Agile: A process that incorporates elements of both continuous delivery and continuous improvement, optimized across the entire value stream.

Legacy system: An outdated or antiquated computer system, programming language, or application software that is no longer compatible with modern systems, is not available for purchase from vendors or distributors, or is not based on current software versions. Legacy systems may no longer be supported or maintained by their developer/vendor and may not get updated or patched automatically. A legacy system also may be associated with terminology or processes that are no longer applicable to current contexts or content.

Long-term objectives: Objectives that can be addressed and completed in 6 or more years.

M

Master data management (MDM): A technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise’s official shared master data assets.

Microservice architecture: A software development technique, a variant of the service-oriented architecture (SOA) style, that structures an application as a collection of loosely coupled services. In a microservice architecture, services are fine-grained, independently deployable, and communicate over lightweight protocols.

O

Open source: Software that is distributed with its source code, making the code freely available to users for reference, debugging, use, modification, extension, and distribution with its original rights (e.g., Linux, MySQL, R, Python, PHP).

Open standards: Standards that are freely available for adoption, implementation, and updates (e.g., XML, SQL, and HTML). They consist of technical specifications and formal descriptions of software or software interfaces that are made freely available to the general public and are developed and maintained via a collaborative and consensus-driven process. Open standards facilitate interoperability and data exchange among different products or services and are intended for widespread adoption.

P

Participatory data interpretation: A process that involves bringing a group of stakeholders together to interpret data or findings from the assessment.

Predictive analytics: A branch of advanced analytics that uses a variety of statistical techniques, such as data mining, predictive modeling, and machine learning, to analyze current and historical data in order to make predictions about future outcomes or otherwise unknown events.

R

Record linkage strategy: Also known as data matching or entity resolution, record linkage is the process of identifying and cataloging laboratory, epidemiological, and clinical records that reference the same entity across different data sources.
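
A minimal Python sketch of deterministic record linkage follows. It assumes that a normalized name plus date of birth is an adequate match key, which is a simplification: the record layouts, identifiers, and the `match_key` and `link_records` helpers are hypothetical, and production systems often use probabilistic matching to tolerate typos and missing fields.

```python
from collections import defaultdict


def match_key(record):
    """Build a deterministic key: letters of the name in lowercase, plus date of birth."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalpha())
    return (name, record["dob"])


def link_records(*sources):
    """Group record IDs from all sources by match key; keep keys seen more than once."""
    groups = defaultdict(list)
    for source_name, rows in sources:
        for row in rows:
            groups[match_key(row)].append((source_name, row["id"]))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical laboratory and clinical records that spell the same name differently.
lab = [{"id": "L-1", "name": "O'Brien, Ana", "dob": "1990-04-02"}]
clinic = [
    {"id": "C-7", "name": "obrien ana", "dob": "1990-04-02"},
    {"id": "C-8", "name": "Lee, Sam", "dob": "1985-11-30"},
]

links = link_records(("lab", lab), ("clinic", clinic))
```

Here `L-1` and `C-7` are linked because their names normalize to the same string and their dates of birth agree, while `C-8` has no counterpart and is left unlinked.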

Reskilling: Providing education and training for the current workforce that allows them to develop skills beyond their current occupation or role.

S

Scaled Agile Framework (SAFe®): A set of organization and workflow patterns intended to guide enterprises in scaling lean and agile practices.

Server hardening: A set of disciplines and techniques that improve the security and resiliency of a site’s infrastructure.

Short-term objectives: Objectives that can be addressed and completed within 1 to 2 years.

U

Upskilling: Providing the current workforce with education and training to advance skills to improve performance in their current occupation or role.

W

Workforce: A jurisdiction’s health department staff or employees, including merit employees or full-time equivalents, part-time staff, contract staff, temporary staff, fellows and interns, and other persons contributing to the jurisdiction’s capacity.