Common Data Models

Learning objectives

Understand the purpose of a CDM for research.
Describe the differences between CDMs and FHIR.
Describe how FHIR can be used alongside CDMs.
Identify some notable research CDMs used in the US.

Relevant roles:

Investigator
Research Leaders
Informaticist
Software Engineer
Clinician Scientist/Trainee

A Common Data Model (CDM) is a set of standardized data schemas that are used across organizations. CDMs can facilitate research by making it possible to write a single data query that will run across organizations using the same CDM. Without a CDM, a researcher would need to create a different query for each organization.
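As a rough illustration of the single-query idea, the sketch below uses Python's built-in sqlite3 module, with two in-memory databases standing in for two organizations. The `diagnosis` table, its columns, and the sample rows are hypothetical and are not taken from any real CDM.

```python
import sqlite3

# Hypothetical shared schema; not taken from any real CDM.
SCHEMA = "CREATE TABLE diagnosis (patient_id INTEGER, code TEXT)"

# One query, written once, intended to run unchanged at every site.
QUERY = "SELECT COUNT(DISTINCT patient_id) FROM diagnosis WHERE code = ?"


def make_site(rows):
    """Create an in-memory database standing in for one organization."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", rows)
    return conn


sites = {
    "site_a": make_site([(1, "E11.9"), (2, "I10"), (3, "E11.9")]),
    "site_b": make_site([(10, "E11.9"), (11, "J45.909")]),
}

# Because both sites share the schema, the same query works at each one.
counts = {
    name: conn.execute(QUERY, ("E11.9",)).fetchone()[0]
    for name, conn in sites.items()
}
print(counts)
```

If each site used its own table and column names, `QUERY` would have to be rewritten for every site, which is the burden a CDM is meant to remove.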

Organizations that use CDMs can form a health data network to facilitate data access by network members. For more information on how health data networks are structured, see Weeks and Pardee (n.d.).

1 CDMs and FHIR

CDMs and FHIR are both important parts of the health research data ecosystem, but differ in key ways:

Purpose and scope

CDMs: Provide a consistent structure for storing and querying health data to facilitate research.
FHIR: Enable interoperability between health IT systems (which can include research-related systems).

Structure

CDMs: Typically, CDMs consist of a standardized set of relational database tables, with columns for each data element. CDMs also typically define standardized terminology for representing concepts.
FHIR: FHIR defines resources for each type of data, which can be extended to add additional data elements as needed for a given use case. Resources have a nested structure (like a multi-level outline) rather than a tabular structure (like a spreadsheet). The base FHIR specification does not have strict terminology constraints, though these can be added for a specific use case in an Implementation Guide.
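The structural difference described above can be sketched in Python. The dictionary below is a simplified FHIR-style Condition resource, and `to_row` is a hypothetical helper that flattens it into one tabular row. The output column names are illustrative only and do not come from any particular CDM.

```python
# A simplified FHIR-style Condition resource (nested, like an outline).
condition = {
    "resourceType": "Condition",
    "id": "example-1",
    "subject": {"reference": "Patient/123"},
    "code": {
        "coding": [
            {
                "system": "http://snomed.info/sct",
                "code": "44054006",
                "display": "Diabetes mellitus type 2",
            }
        ]
    },
    "onsetDateTime": "2020-03-14",
}


def to_row(resource):
    """Flatten the nested resource into one tabular row (hypothetical columns)."""
    # Only the first coding is kept; a real mapping would need a rule
    # for resources that carry several codings.
    first_coding = resource["code"]["coding"][0]
    return {
        "patient_id": resource["subject"]["reference"].split("/")[-1],
        "code_system": first_coding["system"],
        "code": first_coding["code"],
        "onset_date": resource["onsetDateTime"],
    }


row = to_row(condition)
print(row)
```

Flattening forces choices, such as which of several codings to keep, that the nested form leaves open.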

Implementation

CDMs: Typically implemented as a database within a research data warehouse. When CDMs are used as part of a research data network, additional mechanisms may be implemented for sharing data across the network (Weeks and Pardee, n.d.).
FHIR: Typically implemented as an API within a health IT system like an EHR, though research-specific databases may also implement FHIR (e.g., Kids First).

Because of these differences, FHIR does not obviate existing CDMs and health data networks. However, FHIR can be used in conjunction with CDMs. For example:

Populating a CDM: FHIR can be used as part of the process for populating a CDM. Typically, data are extracted from EHRs and other clinical or billing systems, transformed into the CDM’s format, and then loaded into the CDM database. (Note that this type of process is generally referred to as ETL for Extract, Transform, Load.) FHIR can be used as the input into an ETL process, which may allow for ETL logic that is more portable across institutions and EHR installations.
Mapping between CDMs: The Common Data Models Harmonization IG uses FHIR to map and translate between PCORnet, i2b2, OMOP, and Sentinel (all described below).
Portable phenotypes: FHIR and Clinical Quality Language (CQL) were used by Brandt et al. (2022) to create phenotypes for cohort definitions. These were translated to run against the OMOP CDM. Theoretically, institutions that do not have OMOP research databases could use FHIR and CQL directly to identify cases and non-cases.
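The "Populating a CDM" case above can be sketched as a very small ETL step in Python. This is a minimal sketch under several assumptions: the in-memory list stands in for Patient resources returned by a FHIR API, the `person` table is a reduced subset of columns named after OMOP's person table, and the gender concept IDs should be verified against the OHDSI vocabularies before use. The `transform` function is hypothetical.

```python
import sqlite3

# Gender concept IDs as commonly used in OMOP (assumption: verify against
# the OHDSI vocabularies). 0 is used here for "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def transform(patient, person_id):
    """Transform one FHIR Patient resource into a person-like row."""
    parts = patient.get("birthDate", "").split("-")
    # FHIR allows partial dates (YYYY or YYYY-MM), so pad missing parts.
    year, month, day = ([int(p) for p in parts if p] + [None, None, None])[:3]
    gender = GENDER_CONCEPTS.get(patient.get("gender"), 0)
    return (person_id, gender, year, month, day)


# Extract: stand-in for resources fetched from a FHIR server.
extracted = [
    {"resourceType": "Patient", "id": "abc", "gender": "female",
     "birthDate": "1975-06-30"},
    {"resourceType": "Patient", "id": "def", "gender": "male",
     "birthDate": "1980"},
]

# Transform: assign integer person IDs and reshape each resource.
rows = [transform(p, i) for i, p in enumerate(extracted, start=1)]

# Load: write the rows into a simplified person table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
    "gender_concept_id INTEGER, year_of_birth INTEGER, "
    "month_of_birth INTEGER, day_of_birth INTEGER)"
)
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", rows)
loaded = conn.execute("SELECT * FROM person ORDER BY person_id").fetchall()
print(loaded)
```

Because `transform` depends only on the FHIR representation, the same logic could in principle be reused at any institution that exposes Patient resources, which is the portability benefit noted above.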

2 Notable CDMs

If you are not familiar with CDMs, below are some notable CDMs used in the US:

2.1 OMOP

OMOP was originally created in 2007 by the FDA and other partners to study the effects of medical products (Weeks and Pardee, n.d.). It is now managed by the Observational Health Data Sciences and Informatics (OHDSI) program.

OMOP is an “open community data standard, designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence.”

Diagram showing the structure of the OMOP CDM.

OMOP uses the OHDSI standardized vocabularies, which establish a single standard concept for each clinical entity. When data are converted into OMOP, each source concept is mapped to the corresponding standard concept. This allows researchers to interpret and use clinical entities from different organizations consistently (OHDSI, n.d.).
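The mapping from source concepts to standard concepts can be sketched as a simple lookup. In the Python example below, the mapping table, the `to_standard` helper, and the concept IDs (1001, 1002) are placeholders invented for illustration. They are not real OHDSI concept IDs, and the real vocabularies are distributed as database tables, not as a Python dictionary.

```python
# Hypothetical mapping from (source vocabulary, source code) to a standard
# concept ID. The IDs below are placeholders, not real OHDSI concept IDs.
SOURCE_TO_STANDARD = {
    ("ICD10CM", "E11.9"): 1001,
    ("ICD9CM", "250.00"): 1001,
    ("ICD10CM", "I10"): 1002,
}


def to_standard(vocabulary, code):
    """Return the standard concept ID, or 0 when no mapping is known."""
    return SOURCE_TO_STANDARD.get((vocabulary, code), 0)


# Two organizations record the same condition using different source codes.
org_a = [("ICD10CM", "E11.9"), ("ICD10CM", "I10")]
org_b = [("ICD9CM", "250.00")]

standard_a = {to_standard(v, c) for v, c in org_a}
standard_b = {to_standard(v, c) for v, c in org_b}

# After mapping, both organizations share the same standard concept.
shared = standard_a & standard_b
print(shared)
```

Once both organizations' codes resolve to the same standard concept, a query written against that concept applies to both datasets.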

2.2 Sentinel

Sentinel is a CDM designed to monitor whether FDA-regulated medical products cause unexpected adverse reactions.

Although FDA-regulated products undergo clinical testing before approval, the testing may miss adverse reactions. This is because the test population may not fully represent the population that uses the product. To detect and assess adverse events after approval, the FDA created Sentinel in 2007. See here for more information about the Sentinel data network.

Diagram describing the contents of the Sentinel CDM.

2.3 PCORnet

The Patient-Centered Outcomes Research Institute (PCORI) developed PCORnet to make clinical research more streamlined, representative, and efficient. PCORnet data is largely drawn from EHRs, as well as some patient-reported and payor data. PCORnet includes data from 66 million patients that researchers can use for observational studies. PCORnet launched in 2014 (Fleurence et al. 2014).

Diagram showing the structure of the PCORnet CDM v6.1.

2.4 i2b2

Developed in 2004 by Partners HealthCare and Harvard Medical School, i2b2 is an open-source research data application framework that “provides clinical and translational investigators with the tools necessary to integrate medical record and clinical research in the genomics age” (i2b2, n.d.).

i2b2 is structured differently from the other CDMs discussed here. Rather than having separate tables for each type of data (e.g., procedures and diagnoses stored in different database tables, each with its own set of columns), i2b2 uses the same set of tables for all data types, an arrangement it calls the “star schema” (see diagram below). i2b2 describes this as follows:

Instead of separate tables for diagnoses, medications, and other data types, all patient observations are stored in a single “fact” table. A separate ontology describes the different codes that are placed in this fact table. As a result, institutions can use their own local codes, without having to map to common code sets. Furthermore, institutions can easily add new types of data to i2b2 and tranSMART just by extending the ontology. No changes to the database or software are needed. This enables software developers to build query, analysis, and visualization tools that are generalizable to different types of data and future-proof since the i2b2 CDM can remain stable over time.

Diagram showing the i2b2 “star schema”.
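The single fact table idea can be sketched with Python's built-in sqlite3 module. The tables below are heavily simplified and partly hypothetical. The `observation_fact` name and its three columns follow i2b2 naming, but the real table has many more columns, and the `ontology` table here is an invented stand-in for i2b2's ontology. The codes and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified tables loosely modeled on the star schema idea.
conn.executescript("""
CREATE TABLE observation_fact (
    patient_num INTEGER, concept_cd TEXT, start_date TEXT
);
CREATE TABLE ontology (
    concept_cd TEXT PRIMARY KEY, name TEXT, category TEXT
);
""")

conn.executemany("INSERT INTO ontology VALUES (?, ?, ?)", [
    ("DX:E11.9", "Type 2 diabetes", "diagnosis"),
    ("RX:metformin", "Metformin", "medication"),
])
# Diagnoses and medications share one fact table.
conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", [
    (1, "DX:E11.9", "2021-01-05"),
    (1, "RX:metformin", "2021-01-06"),
    (2, "DX:E11.9", "2022-02-10"),
])

# Adding a new data type needs only a new ontology entry and new fact
# rows; no table is created or altered.
conn.execute(
    "INSERT INTO ontology VALUES ('LAB:hba1c', 'Hemoglobin A1c', 'lab')"
)
conn.execute(
    "INSERT INTO observation_fact VALUES (2, 'LAB:hba1c', '2022-02-11')"
)

# One generic query covers every data type, including the new one.
counts = dict(conn.execute("""
    SELECT o.category, COUNT(*)
    FROM observation_fact AS f
    JOIN ontology AS o ON o.concept_cd = f.concept_cd
    GROUP BY o.category
""").fetchall())
print(counts)
```

The query at the end never names a data type, which is why tools built on this layout can stay unchanged as new kinds of data are added.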

References

i2b2. n.d. https://community.i2b2.org/wiki/display/BUN/1.+Introduction.

Brandt, Pascal S., Jennifer A. Pacheco, Prakash Adekkanattu, Evan T. Sholle, Sajjad Abedian, Daniel J. Stone, David M. Knaack, et al. 2022. “Design and validation of a FHIR-based EHR-driven phenotyping toolbox.” Journal of the American Medical Informatics Association: JAMIA 29 (9): 1449–60. https://doi.org/10.1093/jamia/ocac063.

Fleurence, Rachael L, Lesley H Curtis, Robert M Califf, Richard Platt, Joe V Selby, and Jeffrey S Brown. 2014. “Launching PCORnet, a National Patient-Centered Clinical Research Network.” Journal of the American Medical Informatics Association: JAMIA 21 (4): 578–82. https://doi.org/10.1136/amiajnl-2014-002747.

OHDSI. n.d. GitHub. https://github.com/OHDSI/Vocabulary-v5.0/wiki/General-Structure-and-Use.

Weeks, John, and Roy Pardee. n.d. “Learning to Share Health Care Data: A Brief Timeline of Influential Common Data Models and Distributed Health Data Networks in U.S. Health Care Research.” eGEMs 7 (1): 4. https://doi.org/10.5334/egems.279.