Introduction to Analyzing FHIR Data in a Tabular Format
Researchers who want to analyze FHIR-formatted data will typically need to convert data from FHIR format into a table-based format that can be used directly in analysis environments like Python, R, SAS, or Stata. Data analysis in these environments typically requires tidy data:
Tidy datasets are easy to manipulate, model and visualise, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
FHIR’s data format does not meet this definition of “tidy.” While instances of FHIR resources do typically map onto observational units, the data contained within instances of FHIR resources are not “flat” – instead, data are stored in nested data structures. In some cases, like Observation.component, there may even be multiple “observational units” inside of a single FHIR resource instance.
Fortunately, FHIR’s computable JSON or XML data format makes it possible to use software to convert FHIR data into a tabular format. This can be done with custom code, or via existing purpose-built libraries.
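As a rough illustration of the custom-code route, the sketch below flattens one Observation with two components into one row per component, using only the Python standard library. The Observation is hand-written synthetic data, and the function name `flatten_observation` and the output column names are assumptions made for this example, not part of any FHIR library.

```python
import json

# Hand-written synthetic FHIR Observation (a blood pressure panel), for illustration only.
observation_json = """
{
  "resourceType": "Observation",
  "id": "bp-example",
  "status": "final",
  "subject": {"reference": "Patient/123"},
  "effectiveDateTime": "2024-01-15",
  "component": [
    {
      "code": {"coding": [{"system": "http://loinc.org", "code": "8480-6",
                           "display": "Systolic blood pressure"}]},
      "valueQuantity": {"value": 120, "unit": "mmHg"}
    },
    {
      "code": {"coding": [{"system": "http://loinc.org", "code": "8462-4",
                           "display": "Diastolic blood pressure"}]},
      "valueQuantity": {"value": 80, "unit": "mmHg"}
    }
  ]
}
"""


def flatten_observation(resource):
    """Return one flat dict (one table row) per Observation.component entry.

    Hypothetical helper: the column names are chosen for this example only.
    """
    rows = []
    for component in resource.get("component", []):
        # Use the first coding if present; real data may carry several codings.
        codings = component.get("code", {}).get("coding", [])
        first_coding = codings[0] if codings else {}
        quantity = component.get("valueQuantity", {})
        rows.append({
            "observation_id": resource.get("id"),
            "patient": resource.get("subject", {}).get("reference"),
            "date": resource.get("effectiveDateTime"),
            "code": first_coding.get("code"),
            "display": first_coding.get("display"),
            "value": quantity.get("value"),
            "unit": quantity.get("unit"),
        })
    return rows


rows = flatten_observation(json.loads(observation_json))
for row in rows:
    print(row)
```

Each component becomes its own row, with the parent Observation's identifier, patient reference, and date repeated on every row. This matches the "each observation is a row" part of the tidy definition. Real-world resources vary more than this example (missing fields, multiple codings, non-quantity values), so production code would need to handle those cases.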
There are purpose-built libraries for Python and R that facilitate FHIR-to-tabular conversion:
- FHIR-PYrate for Python (MIT open source license)
- fhircrackr for R (GPL-3 open source license)
A technical introduction to using these libraries with synthetic FHIR-format data is provided in Using Python and Using R. The approach described in these modules can typically be performed by an informaticist, analyst, or statistician who is familiar with one of these environments and has reviewed the content on this website.
Using other analysis environments
If you do not want to use Python or R to analyze your data, it is possible to use a different analysis tool like SAS or Stata.
Modern versions of data analysis tools like SAS and Stata have the ability to directly read generic JSON or XML data. For example, SAS can read JSON via the JSON engine and XML via this method. Stata can read JSON via a third-party library or using its integration with Python. Other approaches like using an ODBC driver may also work.
However, it may be simpler to use Python or R to convert FHIR-formatted data to a tabular format, and then import the result into your analysis tool of choice. This can be done via Pandas in Python or the foreign package in R.
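A minimal sketch of the Pandas route is shown below, assuming the FHIR data have already been flattened into rows. The example rows, column names, and file names are made up for illustration. It writes a CSV file, which most analysis tools can import, and a Stata .dta file via `DataFrame.to_stata`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical, already-flattened rows (synthetic values for illustration only).
rows = [
    {"observation_id": "bp-example", "code": "8480-6", "value": 120, "unit": "mmHg"},
    {"observation_id": "bp-example", "code": "8462-4", "value": 80, "unit": "mmHg"},
]
df = pd.DataFrame(rows)

# Write to a temporary directory so the example does not touch existing files.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "observations.csv")
dta_path = os.path.join(out_dir, "observations.dta")

# CSV is a lowest-common-denominator format that most analysis tools can read.
df.to_csv(csv_path, index=False)

# Stata's native format; write_index=False omits the DataFrame index column.
df.to_stata(dta_path, write_index=False)

# Read the Stata file back to confirm the table survived the round trip.
round_trip = pd.read_stata(dta_path)
print(round_trip)
```

The .dta file can then be opened in Stata with its `use` command, and the CSV file can be imported into SAS or other tools with their standard delimited-file import features.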
Cloud tools
If you are able to use public cloud providers like Amazon AWS, Google Cloud, or Microsoft Azure, you may be able to use tools that are part of these platforms to ingest FHIR-formatted data and analyze it. Use of these tools is beyond the scope of this module, but some additional information is linked below.