Analyzing FHIR Data in a Tabular Format With Python

Learning objectives

Understand the high-level approaches for converting FHIR-formatted data into tabular form for analysis in Python.
Learn how the FHIR-PYrate library facilitates requesting data from a FHIR server and creating tidy data tables.

Relevant roles:

Informaticist

Data analysis approaches in Python often use Pandas DataFrames to store tabular data. There are two primary approaches to loading FHIR-formatted data into Pandas DataFrames:

1. Writing Python code to manually convert FHIR instances in JSON format into DataFrames.

   This does not require any special skills beyond data manipulation in Python, but in practice it can be laborious (especially with a large number of data elements) and prone to bugs.

2. Using a purpose-built library like FHIR-PYrate to automatically convert FHIR instances into DataFrames.

   It is recommended to try this approach first, and only fall back to (1) if needed.
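To give a sense of what the manual approach involves, here is a minimal sketch using only the standard library. The Patient instance is a trimmed, hand-written example, and `patient_to_row` and its output column names are hypothetical helpers chosen for illustration.

```python
import json

# A trimmed, hand-written Patient instance for illustration (not fetched from a server)
patient_json = """
{
  "resourceType": "Patient",
  "id": "example-1",
  "gender": "female",
  "birthDate": "1971-01-13",
  "name": [{"use": "official", "family": "Keebler", "given": ["Kina"]}]
}
"""


def patient_to_row(patient: dict) -> dict:
    """Pick a few elements out of a Patient instance, tolerating missing ones."""
    names = patient.get("name") or [{}]
    first_name = names[0]
    return {
        "id": patient.get("id"),
        "gender": patient.get("gender"),
        "birth_date": patient.get("birthDate"),
        "family_name": first_name.get("family"),
        # Join multiple given names; fall back to None when there are none
        "given_name": " ".join(first_name.get("given", [])) or None,
    }


rows = [patient_to_row(json.loads(patient_json))]
print(rows)
```

A list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` to get a DataFrame. Every additional element needs its own line of handling for missing or repeated values, which is why this approach becomes laborious as the number of elements grows.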

To use FHIR-PYrate, you will need a Python 3 runtime with FHIR-PYrate and Pandas installed.
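One way to confirm that both packages are available before running the examples is sketched below. It uses the import names that appear later in this module (`fhir_pyrate` and `pandas`); `check_modules` is a hypothetical helper, not part of either library.

```python
from importlib.util import find_spec


def check_modules(module_names):
    """Return a mapping of module name -> whether it can be found for import."""
    return {name: find_spec(name) is not None for name in module_names}


status = check_modules(["fhir_pyrate", "pandas"])
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```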

1 FHIR testing server

The examples in this module use a FHIR testing server populated with Synthea data in FHIR R4 format: the public HAPI Test Server operated by HAPI FHIR.

The endpoint for this testing server is:

https://hapi.fhir.org/baseR4

However, any FHIR server loaded with testing data can be used. See Standing up a FHIR Testing Server for instructions to set up your own test server.

Each code block in the following sections is immediately followed by its sample output, similar to code cells and their results in a Jupyter notebook.

2 Retrieving FHIR data

Once your environment is set up, you can run the following Python code to retrieve instances of the Patient resource from a test server:

# Load dependencies
from fhir_pyrate import Pirate
import pandas as pd

# Instantiate a Pirate object using the FHIR-PYrate library to query a test FHIR server
search = Pirate(
    auth=None,
    base_url="https://hapi.fhir.org/baseR4",
    print_request_url=True,
)

# Use the whimsically named `steal_bundles()` method to instantiate a search interaction
#
# For more information, see https://github.com/UMEssen/FHIR-PYrate/#pirate
bundles = search.steal_bundles(
    resource_type="Patient",
    request_params={
        "_count": 10,  # Get 10 instances per page
        "identifier": "https://github.com/synthetichealth/synthea|",
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
)

# Execute the search and convert to a Pandas DataFrame
df = search.bundles_to_dataframe(bundles)

df.head(5)

https://hapi.fhir.org/baseR4/Patient?_count=10&identifier=https://github.com/synthetichealth/synthea|

Query (Patient): 100%|██████████| 1/1 [00:00<00:00, 1841.22it/s]

	resourceType	id	meta_versionId	meta_lastUpdated	meta_source	text_status	text_div	extension_0_url	extension_0_extension_0_url	extension_0_extension_0_valueCoding_system	...	maritalStatus_text	multipleBirthBoolean	photo_0_contentType	photo_0_data	communication_0_language_coding_0_system	communication_0_language_coding_0_code	communication_0_language_coding_0_display	communication_0_language_text	address_0_postalCode	communication_0_preferred
0	Patient	258974	5	2025-07-21T11:48:24.613+00:00	#n0HMq4FLyY9oOAio	generated	<div xmlns="http://www.w3.org/1999/xhtml">Gene...	http://hl7.org/fhir/us/core/StructureDefinitio...	ombCategory	urn:oid:2.16.840.1.113883.6.238	...	S	False	image/jpeg	/9j/4AAQSkZJRgABAQEASABIAAD/4QBWRXhpZgAATU0AKg...	urn:ietf:bcp:47	en-US	English	English	NaN	NaN
1	Patient	298666	5	2023-09-28T22:01:11.961+00:00	#CLEMnh2cjZt823TI	generated	<div xmlns="http://www.w3.org/1999/xhtml">Gene...	http://hl7.org/fhir/us/core/StructureDefinitio...	ombCategory	urn:oid:2.16.840.1.113883.6.238	...	S	False	NaN	NaN	urn:ietf:bcp:47	en-US	English	English	78945	True

2 rows × 92 columns

It is easier to see the contents of this DataFrame by printing out its first row vertically:

# Print the first row of the DataFrame vertically for easier reading.
pd.set_option("display.max_rows", 100)  # Show up to 100 rows, enough for all 92 columns once transposed
df.head(1).T

	0
resourceType	Patient
id	258974
meta_versionId	5
meta_lastUpdated	2025-07-21T11:48:24.613+00:00
meta_source	#n0HMq4FLyY9oOAio
text_status	generated
text_div	<div xmlns="http://www.w3.org/1999/xhtml">Gene...
extension_0_url	http://hl7.org/fhir/us/core/StructureDefinitio...
extension_0_extension_0_url	ombCategory
extension_0_extension_0_valueCoding_system	urn:oid:2.16.840.1.113883.6.238
extension_0_extension_0_valueCoding_code	2106-3
extension_0_extension_0_valueCoding_display	White
extension_0_extension_1_url	text
extension_0_extension_1_valueString	White
extension_1_url	http://hl7.org/fhir/us/core/StructureDefinitio...
extension_1_extension_0_url	ombCategory
extension_1_extension_0_valueCoding_system	urn:oid:2.16.840.1.113883.6.238
extension_1_extension_0_valueCoding_code	2186-5
extension_1_extension_0_valueCoding_display	Not Hispanic or Latino
extension_1_extension_1_url	text
extension_1_extension_1_valueString	Not Hispanic or Latino
extension_2_url	http://hl7.org/fhir/StructureDefinition/patien...
extension_2_valueString	Ying817 Eichmann909
extension_3_url	http://hl7.org/fhir/us/core/StructureDefinitio...
extension_3_valueCode	F
extension_4_url	http://hl7.org/fhir/StructureDefinition/patien...
extension_4_valueAddress_city	Worcester
extension_4_valueAddress_state	Massachusetts
extension_4_valueAddress_country	US
extension_5_url	http://synthetichealth.github.io/synthea/disab...
extension_5_valueDecimal	7.222524
extension_6_url	http://synthetichealth.github.io/synthea/quali...
extension_6_valueDecimal	39.777476
identifier_0_system	https://github.com/synthetichealth/synthea
identifier_0_value	bf23e283-4791-46e1-9d79-9e0ad9edd436
identifier_1_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_1_type_coding_0_code	MR
identifier_1_type_coding_0_display	Medical Record Number
identifier_1_type_text	Medical Record Number
identifier_1_system	http://hospital.smarthealthit.org
identifier_1_value	bf23e283-4791-46e1-9d79-9e0ad9edd436
identifier_2_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_2_type_coding_0_code	SS
identifier_2_type_coding_0_display	Social Security Number
identifier_2_type_text	Social Security Number
identifier_2_system	http://hl7.org/fhir/sid/us-ssn
identifier_2_value	999-21-6325
identifier_3_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_3_type_coding_0_code	DL
identifier_3_type_coding_0_display	Driver's License
identifier_3_type_text	Driver's License
identifier_3_system	urn:oid:2.16.840.1.113883.4.3.25
identifier_3_value	S99948444
identifier_4_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_4_type_coding_0_code	PPN
identifier_4_type_coding_0_display	Passport Number
identifier_4_type_text	Passport Number
identifier_4_system	http://standardhealthrecord.org/fhir/Structure...
identifier_4_value	X30821805X
active	True
name_0_use	official
name_0_family	Keebler
name_0_given_0	Kina
name_0_prefix_0	Ms.
telecom_0_system	phone
telecom_0_value	555-939-7778
telecom_0_use	home
gender	female
birthDate	1971-01-13
deceasedBoolean	False
address_0_extension_0_url	http://hl7.org/fhir/StructureDefinition/geoloc...
address_0_extension_0_extension_0_url	latitude
address_0_extension_0_extension_0_valueDecimal	42.5917
address_0_extension_0_extension_1_url	longitude
address_0_extension_0_extension_1_valueDecimal	-70.641346
address_0_line_0	1038 Harvey Green
address_0_city	Gloucester
address_0_state	Massachusetts
address_0_country	US
maritalStatus_coding_0_system	http://terminology.hl7.org/CodeSystem/v3-Marit...
maritalStatus_coding_0_code	S
maritalStatus_coding_0_display	Single
maritalStatus_text	S
multipleBirthBoolean	False
photo_0_contentType	image/jpeg
photo_0_data	/9j/4AAQSkZJRgABAQEASABIAAD/4QBWRXhpZgAATU0AKg...
communication_0_language_coding_0_system	urn:ietf:bcp:47
communication_0_language_coding_0_code	en-US
communication_0_language_coding_0_display	English
communication_0_language_text	English
address_0_postalCode	NaN
communication_0_preferred	NaN

The output above shows that FHIR-PYrate collapsed the hierarchical FHIR data structure into DataFrame columns. It does this by taking an element from the FHIR-formatted data like Patient.identifier[0].value and converting it to an underscore-delimited column name like identifier_0_value. (Note that Patient.identifier has multiple values in the FHIR data, so there are multiple identifier_N_... columns in the DataFrame.)
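The naming scheme can be illustrated with a short recursive function. This is only a sketch of the general idea, not FHIR-PYrate's actual implementation; `flatten` is a hypothetical helper, and the sample Patient is a trimmed version of the instance shown above.

```python
def flatten(value, prefix=""):
    """Recursively flatten nested dicts and lists into {column_name: scalar}."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}_{key}" if prefix else key))
    elif isinstance(value, list):
        # List positions become numeric parts of the column name
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}_{index}" if prefix else str(index)))
    else:
        flat[prefix] = value
    return flat


patient = {
    "resourceType": "Patient",
    "identifier": [
        {
            "system": "https://github.com/synthetichealth/synthea",
            "value": "bf23e283-4791-46e1-9d79-9e0ad9edd436",
        },
        {"system": "http://hl7.org/fhir/sid/us-ssn", "value": "999-21-6325"},
    ],
    "name": [{"family": "Keebler", "given": ["Kina"]}],
}

for column, cell in flatten(patient).items():
    print(column, "=", cell)
```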

3 Selecting specific columns

Usually not every single value from a FHIR instance is needed for analysis. There are two ways to get a more concise DataFrame:

1. Use the approach above to load all elements into a DataFrame, remove the unneeded columns, and rename the remaining columns as needed. The process_function capability in FHIR-PYrate allows you to integrate this approach into the bundles_to_dataframe() method call.
2. Use FHIRPath to select specific elements and map them onto column names.
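The first approach (load everything, then trim and rename) could look roughly like the following plain Pandas sketch. The small `df_all` DataFrame stands in for the wide DataFrame produced earlier, with values copied from the sample output; `column_map` and `df_small` are names chosen for illustration.

```python
import pandas as pd

# Stand-in for the wide DataFrame produced earlier (values copied from the sample output)
df_all = pd.DataFrame(
    [
        {
            "resourceType": "Patient",
            "identifier_0_value": "bf23e283-4791-46e1-9d79-9e0ad9edd436",
            "gender": "female",
            "birthDate": "1971-01-13",
            "maritalStatus_coding_0_code": "S",
            "telecom_0_value": "555-939-7778",
        }
    ]
)

# Map the flattened column names onto the shorter names wanted for analysis
column_map = {
    "identifier_0_value": "id",
    "gender": "gender",
    "birthDate": "date_of_birth",
    "maritalStatus_coding_0_code": "marital_status",
}

# Keep only the mapped columns, then rename them
df_small = df_all[list(column_map)].rename(columns=column_map)
print(df_small)
```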

The second approach is typically more concise. For example, to generate a DataFrame like this…

id	gender	date_of_birth	marital_status
…	…	…	…

…you could use the following code:

# Instantiate and perform the FHIR search interaction in a single function call
df = search.steal_bundles_to_dataframe(
    resource_type="Patient",
    request_params={
        "_count": 10,  # Get 10 instances per page
        "identifier": "https://github.com/synthetichealth/synthea|",
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
    fhir_paths=[
        ("id", "identifier[0].value"),
        ("gender", "gender"),
        ("date_of_birth", "birthDate"),
        ("marital_status", "maritalStatus.coding[0].code"),
    ],
)
df

https://hapi.fhir.org/baseR4/Patient?_count=10&identifier=https://github.com/synthetichealth/synthea|

Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 574.80it/s]

	id	gender	date_of_birth	marital_status
0	bf23e283-4791-46e1-9d79-9e0ad9edd436	female	1971-01-13	S
1	bf23e283-4791-46e1-9d79-9e0ad9edd436	female	1971-01-14	S

While FHIRPath can be quite complex, its use in FHIR-PYrate is often straightforward. Nested elements are separated with ., and elements with multiple sub-values are identified by [N], where N is an integer starting at 0. The element paths can typically be constructed by loading all elements into a DataFrame and then manually deriving the FHIRPaths from the column names, or by looking at the element hierarchy on the resource pages in the FHIR specification (see Key FHIR Resources for more information on reading the FHIR specification).
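Deriving a path from a column name can be sketched as a small helper. This is a heuristic written for illustration (`column_to_fhirpath` is not part of FHIR-PYrate), and it assumes that element names themselves contain no underscores.

```python
def column_to_fhirpath(column_name: str) -> str:
    """Turn e.g. 'identifier_0_value' into 'identifier[0].value'."""
    path = ""
    for part in column_name.split("_"):
        if part.isdigit():
            path += f"[{part}]"  # numeric parts become list indices
        elif path:
            path += f".{part}"  # later names are nested elements
        else:
            path = part  # the first name starts the path
    return path


for column in ["identifier_0_value", "maritalStatus_coding_0_code", "birthDate"]:
    print(column, "->", column_to_fhirpath(column))
```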

4 Elements with multiple sub-values

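There are multiple identifier[N].value values for each instance of Patient in this dataset. A FHIRPath without an index, such as identifier.value, returns all of them as a list in a single column: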

# Instantiate and perform the FHIR search interaction in a single function call
df = search.steal_bundles_to_dataframe(
    resource_type="Patient",
    request_params={
        "_count": 10,  # Get 10 instances per page
        "identifier": "https://github.com/synthetichealth/synthea|",
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
    fhir_paths=[("id", "identifier[0].value"), ("identifiers", "identifier.value")],
)
df

https://hapi.fhir.org/baseR4/Patient?_count=10&identifier=https://github.com/synthetichealth/synthea|

Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 1061.85it/s]

	id	identifiers
0	bf23e283-4791-46e1-9d79-9e0ad9edd436	[bf23e283-4791-46e1-9d79-9e0ad9edd436, bf23e28...
1	bf23e283-4791-46e1-9d79-9e0ad9edd436	[bf23e283-4791-46e1-9d79-9e0ad9edd436, bf23e28...

To convert to separate columns, you can do the following:

df.join(pd.DataFrame(df.pop("identifiers").values.tolist()).add_prefix("identifier_"))

	id	identifier_0	identifier_1	identifier_2	identifier_3	identifier_4
0	bf23e283-4791-46e1-9d79-9e0ad9edd436	bf23e283-4791-46e1-9d79-9e0ad9edd436	bf23e283-4791-46e1-9d79-9e0ad9edd436	999-21-6325	S99948444	X30821805X
1	bf23e283-4791-46e1-9d79-9e0ad9edd436	bf23e283-4791-46e1-9d79-9e0ad9edd436	bf23e283-4791-46e1-9d79-9e0ad9edd436	999-21-6325	S99948444	X30821805X

This will give you separate identifier_0, identifier_1, … columns for each Patient.identifier[N] value.
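An alternative to separate columns is a long layout with one row per identifier, which is often easier to filter and join. A minimal sketch using Pandas `DataFrame.explode`, with made-up ids and identifier values in place of the server data:

```python
import pandas as pd

# Made-up data shaped like the `id` / `identifiers` DataFrame above
df_ids = pd.DataFrame(
    {
        "id": ["patient-a", "patient-b"],
        "identifiers": [["a-1", "a-2", "a-3"], ["b-1", "b-2"]],
    }
)

# One row per list element; the `id` value is repeated for each identifier
df_long = (
    df_ids.explode("identifiers")
    .rename(columns={"identifiers": "identifier"})
    .reset_index(drop=True)
)
print(df_long)
```

Unlike the wide layout, this does not introduce empty cells when patients have different numbers of identifiers.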

5 Retrieving related data

To retrieve instances of related resources, additional request_params can be added. See Using the FHIR API to Access Data for more information on constructing the parameters for FHIR search interactions.
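It can help to see how a request_params dictionary corresponds to the request URL that FHIR-PYrate prints. The sketch below rebuilds the URL from the earlier Patient examples with the standard library; it illustrates the mapping only and is not how FHIR-PYrate constructs requests internally.

```python
from urllib.parse import urlencode

base_url = "https://hapi.fhir.org/baseR4"
resource_type = "Patient"
request_params = {
    "_count": 10,
    "identifier": "https://github.com/synthetichealth/synthea|",
}

# Leave ':', '/' and '|' unescaped so the URL stays readable
query = urlencode(request_params, safe=":/|")
url = f"{base_url}/{resource_type}?{query}"
print(url)
```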

In the example below, instances of Patient and instances of related Observation resources are requested:

# Instantiate and perform the FHIR search interaction in a single function call
dfs = search.steal_bundles_to_dataframe(
    resource_type="Patient",
    request_params={
        # Also return Observation instances whose `patient` search parameter
        # (which targets `Observation.subject`) refers to a fetched Patient instance
        "_revinclude": "Observation:patient",
        "identifier": "https://github.com/synthetichealth/synthea|",
        "_count": 10,  # Get 10 instances per page
    },
    num_pages=1,  # Get 1 page (10 Patient instances, plus any included Observations)
)

# `dfs` is a dictionary where the key is the FHIR resource type, and the value is the DataFrame
#
# Split these into separate variables for easy access:
df_patients = dfs["Patient"]
df_observations = dfs["Observation"]

# Look at the first row of the Observations DataFrame
df_observations.head(1).T

https://hapi.fhir.org/baseR4/Patient?_count=10&_revinclude=Observation:patient&identifier=https://github.com/synthetichealth/synthea|

Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 835.69it/s]

	0
resourceType	Observation
id	10322494
meta_versionId	1
meta_lastUpdated	2023-04-19T10:08:24.062+00:00
meta_source	#fhjOEuPKx4iHuB2B
code_coding_0_system	http://foo
code_coding_0_code	12345
subject_reference	Patient/601248
effectiveDateTime	2023-04-19T09:34:50+01:00
valueQuantity_value	123
valueQuantity_unit	kg
valueQuantity_system	http://bar
valueQuantity_code	kg

As of April 2023, FHIR-PYrate does not have a good approach to fhir_paths for searches that return instances of multiple FHIR resource types.

To work around this, you can also iterate over all the rows in a DataFrame and request related resources using trade_rows_for_dataframe():

df_observations2 = search.trade_rows_for_dataframe(
    df_patients,
    resource_type="Observation",
    request_params={
        "_count": "10",  # Get 10 instances per page
    },
    num_pages=1,
    # Load Observations where `Observation.subject` references the instance of Patient
    # identified by `id` in the `df_patients` DataFrame
    df_constraints={"subject": "id"},
    fhir_paths=[
        ("observation_id", "id"),
        ("patient", "subject.reference"),
        ("status", "status"),
        ("code", "code.coding[0].code"),
        ("code_display", "code.coding[0].display"),
        ("value", "valueQuantity.value"),
        ("value_units", "valueQuantity.unit"),
        ("datetime", "effectiveDateTime"),
    ],
)

# The call above can return a dictionary keyed by resource type instead of a single
# DataFrame, in which case calling `.head()` directly raises
# `AttributeError: 'dict' object has no attribute 'head'`. Handle both cases:
if isinstance(df_observations2, dict):
    df_observations2 = df_observations2.get("Observation", pd.DataFrame())

# Look at the first 15 rows of the Observations DataFrame
df_observations2.head(15)

https://hapi.fhir.org/baseR4/Observation?_count=10&subject=258974

https://hapi.fhir.org/baseR4/Observation?_count=10&subject=298666

Query & Build DF (Observation): 100%|██████████| 2/2 [00:01<00:00,  1.43it/s]

Note that this will only display a value for instances of Observation that record their result in Observation.valueQuantity. Typically, you would filter by Observation.code and then choose the appropriate data type for Observation.value[x] to import. For example, http://loinc.org|72166-2 is the LOINC code for smoking status. To get smoking status records directly from the server (without limiting the search to the patients in df_patients):

# Directly search for smoking status observations

df_observations2 = search.steal_bundles_to_dataframe(
    resource_type="Observation",
    request_params={
        "code": "http://loinc.org|72166-2",  # LOINC code for smoking status
        "_count": 20,  # Get more observations since we're not limiting by patient
    },
    num_pages=1,
    fhir_paths=[
        ("observation_id", "id"),
        ("patient", "subject.reference"),
        ("status", "status"),
        ("code", "code.coding[0].code"),
        ("code_display", "code.coding[0].display"),
        ("value", "valueCodeableConcept.coding[0].code"),
        ("value_display", "valueCodeableConcept.coding[0].display"),
        ("datetime", "effectiveDateTime"),
    ],
)

# The result can be a dictionary keyed by resource type rather than a single
# DataFrame, in which case calling `.head()` directly raises
# `AttributeError: 'dict' object has no attribute 'head'`. Handle both cases:
if isinstance(df_observations2, dict):
    df_observations2 = df_observations2.get("Observation", pd.DataFrame())

# Look at the first 15 rows of the Observations DataFrame
df_observations2.head(15)

https://hapi.fhir.org/baseR4/Observation?_count=20&code=http://loinc.org|72166-2

Query & Build DF (Observation): 100%|██████████| 1/1 [00:00<00:00, 16131.94it/s]

More information about the search interaction used above to filter Observations is here.
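Because Observation.value[x] can take several data types, it can be useful to check which one an instance actually carries before choosing FHIRPaths. The sketch below works on Observation instances as plain dictionaries. `extract_value` is a hypothetical helper, it covers only some of the possible value[x] types, and the CodeableConcept code is a placeholder rather than a real terminology code.

```python
def extract_value(observation: dict):
    """Return (element_name, simplified_value) for the first value[x] found."""
    if "valueQuantity" in observation:
        quantity = observation["valueQuantity"]
        return "valueQuantity", (quantity.get("value"), quantity.get("unit"))
    if "valueCodeableConcept" in observation:
        codings = observation["valueCodeableConcept"].get("coding") or [{}]
        return "valueCodeableConcept", codings[0].get("code")
    # Simple primitive types can be returned as-is
    for name in ("valueString", "valueBoolean", "valueInteger", "valueDateTime"):
        if name in observation:
            return name, observation[name]
    return None, None


weight = {"resourceType": "Observation", "valueQuantity": {"value": 123, "unit": "kg"}}
coded = {
    "resourceType": "Observation",
    "valueCodeableConcept": {
        "coding": [{"system": "http://example.org", "code": "example-code"}]
    },
}

print(extract_value(weight))
print(extract_value(coded))
```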