FHIR for Research Workshop - Bulk Data

Introduction

Learning Objectives and Key Concepts

The goal of this workshop is to connect to the SMART Bulk Data Server and fetch a set of sample patient data.

In this exercise, you will:

  • Connect to an authorization server using a provided key, and retrieve an access token
  • Make a Bulk Data Export Request with that access token
  • Download the exported Bulk Data
  • Convert the downloaded data into DataFrames

While libraries like FHIR-PYrate allow you to fetch data from a server and parse it directly into a DataFrame, these libraries generally do not support FHIR Bulk Data. This workshop will step through the process of building up a tool to fetch Bulk Data and convert it into DataFrames.

This notebook is best experienced interactively. If the notebook already contains output, you can clear it before starting via the menu: Cell -> All Output -> Clear.

Setup

If you are not using a JupyterHub instance with dependencies already installed, you will need to:

  1. Clone this repository
  2. Install dependencies with pip install -r requirements.txt
  3. Run jupyter notebook workshops/fhir-bulk-data

This should open the Jupyter environment in your browser window. You should see notebook.ipynb listed in the interface. Open this notebook in Jupyter, and you should be able to run the code.

Background

The Bulk Data Access standard enables researchers to retrieve large volumes of data from a patient population in an EHR. The Bulk Data Access standard is part of the SMART ecosystem, and SMART on FHIR can be used to authenticate and authorize applications that retrieve bulk data automatically.

Clients of FHIR Bulk Data servers use SMART Backend Authorization to connect to the server. With SMART Backend Authorization, registered clients make a signed request to a token endpoint to receive a Bearer token, which they use for subsequent calls to the FHIR server.

Client registration often happens manually as a separate one-time event. The SMART Backend Authorization specification does not define an API for registration.

For this workshop, we connect to the SMART Bulk Data Server (https://bulk-data.smarthealthit.org). This is a developer tool provided by SMART Health IT to facilitate development with Bulk Data Access. This test server allows clients to “register” on the launch page by providing either a URL for a JSON Web Key Set (JWKS) or a raw JWKS. In this case, the “registration” is not stored on the server. Instead, the “registration” information is encoded as state in the FHIR Server URL and the client ID. Production servers will usually have a more standard registration process rather than taking this approach.

For convenience, the SMART Bulk Data Server launch page allows users to generate a one-off JWKS to use for testing. For production usage, clients must generate their own certificates and JWKS and keep the private key private. In this workshop, we will use a JWKS generated by the launch page.

IMPORTANT: this workshop is not meant to be formal documentation of the specification. It largely skips error handling and stays on the “happy path” for brevity and readability. We strongly recommend reviewing the specifications and adding error handling before using any of this code in a production environment.

# The default style for rendering JSON parsed as Python dicts isn't the best.
# Use this import and call `print(json)` when we want a cleaner view.

from rich import print

# Status bars for long-running cells
from tqdm.notebook import trange, tqdm

Getting our Access Token

The first step in obtaining data from a FHIR server that supports Bulk Data Access is to obtain an access token. That access token identifies and authorizes the client on requests made to the FHIR resource server.

Obtaining an access token is itself a two-step process:

  1. Make a discovery request to the FHIR resource server to get the address of the authorization server.
  2. Post a token request, signed by the client’s private key, to the authorization server.

To keep the focus of this workshop on the Bulk Data process rather than the details of generating keys, we will use a JWKS pre-generated by the SMART Bulk Data server launch page.

For reference, the steps followed to generate the keys used here were:

  • Visit the SMART Bulk Data Server launch page
  • In the upper left, click the JWKS button for Authentication
  • Click the Generate button and choose Generate RS384
  • Choose R4 for the FHIR Version
  • The associated text box now contains a JWKS with both a public and private key, and the Launch Configuration contains a FHIR Server URL and Client ID
  • Convert the private key from the JWKS to “PEM” format so it can be used by Python (this is not easy to do natively in Python, so we have done it with JavaScript out of band)

Let’s start by defining our credentials. In practice, real credentials must always be stored and loaded securely, but for simplicity in this workshop we will define them as local variables.

client_id = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6InJlZ2lzdHJhdGlvbi10b2tlbiJ9.eyJqd2tzIjp7ImtleXMiOlt7Imt0eSI6IlJTQSIsImFsZyI6IlJTMzg0IiwibiI6IngzMDc2RTJNaUpMR3JPbXJXRjZXSWZ1RjFSZDBlTjBSdEhUSVRuMlNGVWhMYTFQWE5Ia0xBR2xSSmtJWk1QMUk5SEhxdTRERy02d2JraFMweU9GbEZhZE1iaGgzcHkySHoybDctRmg1M3Y3bmpwb3dxUGV2eEpqMlpEQU5BanFWeHRLOGdvMm1BZmZFSnJ2ZkVHbm5oUGkzdGE1U2U5UTBkS29la2hJRVRCaVJTa0ozN0pobEZGSDh3S2hFLXVwaXBQU3VycTBrQ0JkNlNaS3NOVHpHNzJmLVJoNENiREZWTVdfRm5zcTh5LWRJMTdMSDJZcHBBLWc0eGlUZnMwMGZOUG9FUEdoWFU2bHFKMHMwclp4Um9zYnVuV0NTYi1UaEtWV0RyeUFudE83S3dWN1BxVG1NMmVrVS1yenZFaWprVjZfUUlnVTJxRTd6X1k1N1l4aW8zUSIsImUiOiJBUUFCIiwia2V5X29wcyI6WyJ2ZXJpZnkiXSwiZXh0Ijp0cnVlLCJraWQiOiI0ZDc3OTJjZTQyMDU0ZDVkZjhkZDg1ZjhiNTI3ZGQ4OCJ9LHsia3R5IjoiUlNBIiwiYWxnIjoiUlMzODQiLCJuIjoieDMwNzZFMk1pSkxHck9tcldGNldJZnVGMVJkMGVOMFJ0SFRJVG4yU0ZVaExhMVBYTkhrTEFHbFJKa0laTVAxSTlISHF1NERHLTZ3YmtoUzB5T0ZsRmFkTWJoaDNweTJIejJsNy1GaDUzdjduanBvd3FQZXZ4SmoyWkRBTkFqcVZ4dEs4Z28ybUFmZkVKcnZmRUdubmhQaTN0YTVTZTlRMGRLb2VraElFVEJpUlNrSjM3SmhsRkZIOHdLaEUtdXBpcFBTdXJxMGtDQmQ2U1pLc05Uekc3MmYtUmg0Q2JERlZNV19GbnNxOHktZEkxN0xIMllwcEEtZzR4aVRmczAwZk5Qb0VQR2hYVTZscUowczByWnhSb3NidW5XQ1NiLVRoS1ZXRHJ5QW50TzdLd1Y3UHFUbU0yZWtVLXJ6dkVpamtWNl9RSWdVMnFFN3pfWTU3WXhpbzNRIiwiZSI6IkFRQUIiLCJkIjoiUnptQWRTMlMtb1FsS1VGNHF1R0Npdm1KekE1R3lJeHRzTmR0V1JEZVluamdiSjZQbksyRzd3dXJMSlMyOTlYSEFYZld6a0ZwU2h3bDc5OHl1UEk0ckNXQ1ZXQ29fLWh5ci14Q2xlWEpCWVJQV292VXljODlVMTBsdzVtZ1cyWmRhWkotT2NLblBkYWZreERLME1wdkhmdkxZN09zd1lkX2Z4UHFQRTd3ZDlaQU5XLUIyWmNURUVmd2taNWdlcmtDdnFHQ1lEUTdVcVJqR3k1dWRjTkRiQ01ITFdGaEZZMTVqMDVMMFpJV0RwUDY2cmN6UWZEdnduR0pIbWxJbnJMbTl5WkowUTNkVlpHSmo2Y2dMeWI4WHhkNHpWRjZGSy1NX2VKbnFzZFRveHRPMDNUOVotSWlrN1BfbFBheWRvMWRycXRZdUxmZXpvU1lnUGp0V0NnV0JRIiwicCI6IjZwNlV5aGZiQ0JjQlEzcGttMHZEb1lqSDZsc1FCeS1PTzlEYlpfZnFfSHpzZl96UWhENDdua0dZZngxbGVTUFlQU0ZSeDlRTUR3cTlvYWxjYmEwNmE3QTVmMUxQNVpaRnNvSDVCTElHTUcxNmhDbW1mTEdRMURkZ3pMb2s3Q3RldDRnNGhUTlpseFZOYV9uYVNmZGJSdmQycF8zNTM1RGpaOXoyMEpSNllDYyIsInEiOiIyYXNhQ0RCTmY3NTQ1ajdOcXI2TTZiUW8wVGZEWGNlb2FxcGVtNG
hpNE1pYUtBOEcydVFvdXNTOGcyUTlZOFZiZmxjX3I2WmxPVjIxSmJhYW5WN253MDRxbVpqMG5Xdkk0a19yX2lKWTVuSDNUMHk0Y0lGV21tLUhPY1dzazJXWl9QQ1NSc1piOU1qOUs4UXh6b1h5WEo0ck9aLUw4OTNZbDZ5bVdKa2xqVnMiLCJkcCI6Ik9LeWI5b0Z5dUc2T01KV2xMZHBNWkgzZEJPQ0FhNnZ5S01MWDdUSjNBZ3pQT0UtQ3N4OHhXWll3MXl2cnNpcVZkcGJRNFh0NGVqMjI5eEVwTVpreHpvZWdMQUItRmRDSl80Zmo5bDFtbjFZaXpVQWVabXFpT0pFMEFlQkpRUDlzX3RxYUJKc1YzaWdZTHFnSk1lcmRrclAtWnJBMEp1d2g4cG51eVEzRXplcyIsImRxIjoib2I2R0FvMjZHUEcxcnduLUZDR3lYanMwbFhzRlhwdHRaNDJmN1owa05IcDhLc1kzeHRJQl9mOFJRZVZyeE1hem5TZENPTWpCc1NZVDVLbFRMUnVIeHRZX3k1RWdQQllLMlRpZ1dXQzJoTTh0QWEwMTVNd0hTWTBVZ19hQ3JhaXpDNFRNZlhFS2hkUVFaTVJPYW5PWVRBQndpRW9wV2hhQXl2eE5ROHJSWDc4IiwicWkiOiJLSjhJU0RKaHVyUmEyTVRHdG4zWjR3NU9ob3o2N29OcE10MG1TakxGUEt0QjFWbjRaZ3VkTUxfWTZ4V2lWTnBOR1hQa3hoMEJjRmNKakNKcC0yeUZLV0d4Si14M2JMWVllbkVUaGRFSGRRR0xuUUszMHlEdHFTY2NDUVY5U2xGc281NUdnUmxhODNaY2NBZTdBMXBWN2sxRGE4dFVFNkE4TXNlQ1ZXamRLbFUiLCJrZXlfb3BzIjpbInNpZ24iXSwiZXh0Ijp0cnVlLCJraWQiOiI0ZDc3OTJjZTQyMDU0ZDVkZjhkZDg1ZjhiNTI3ZGQ4OCJ9XX0sImFjY2Vzc1Rva2Vuc0V4cGlyZUluIjoxNSwiaWF0IjoxNjg2NjUyNzM4fQ.j1urst068-21CxiH0Nqml7XoE9v6hWJ_vfqAK4W22vg'


# Don't worry! This is not anybody's real private key. It was generated specifically and only for this exercise.
private_key = """-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEAx3076E2MiJLGrOmrWF6WIfuF1Rd0eN0RtHTITn2SFUhLa1PX
NHkLAGlRJkIZMP1I9HHqu4DG+6wbkhS0yOFlFadMbhh3py2Hz2l7+Fh53v7njpow
qPevxJj2ZDANAjqVxtK8go2mAffEJrvfEGnnhPi3ta5Se9Q0dKoekhIETBiRSkJ3
7JhlFFH8wKhE+upipPSurq0kCBd6SZKsNTzG72f+Rh4CbDFVMW/Fnsq8y+dI17LH
2YppA+g4xiTfs00fNPoEPGhXU6lqJ0s0rZxRosbunWCSb+ThKVWDryAntO7KwV7P
qTmM2ekU+rzvEijkV6/QIgU2qE7z/Y57Yxio3QIDAQABAoIBAEc5gHUtkvqEJSlB
eKrhgor5icwORsiMbbDXbVkQ3mJ44Gyej5ythu8LqyyUtvfVxwF31s5BaUocJe/f
MrjyOKwlglVgqP/ocq/sQpXlyQWET1qL1MnPPVNdJcOZoFtmXWmSfjnCpz3Wn5MQ
ytDKbx37y2OzrMGHf38T6jxO8HfWQDVvgdmXExBH8JGeYHq5Ar6hgmA0O1KkYxsu
bnXDQ2wjBy1hYRWNeY9OS9GSFg6T+uq3M0Hw78JxiR5pSJ6y5vcmSdEN3VWRiY+n
IC8m/F8XeM1RehSvjP3iZ6rHU6MbTtN0/WfiIpOz/5T2snaNXa6rWLi33s6EmID4
7VgoFgUCgYEA6p6UyhfbCBcBQ3pkm0vDoYjH6lsQBy+OO9DbZ/fq/Hzsf/zQhD47
nkGYfx1leSPYPSFRx9QMDwq9oalcba06a7A5f1LP5ZZFsoH5BLIGMG16hCmmfLGQ
1DdgzLok7Ctet4g4hTNZlxVNa/naSfdbRvd2p/3535DjZ9z20JR6YCcCgYEA2asa
CDBNf7545j7Nqr6M6bQo0TfDXceoaqpem4hi4MiaKA8G2uQousS8g2Q9Y8Vbflc/
r6ZlOV21JbaanV7nw04qmZj0nWvI4k/r/iJY5nH3T0y4cIFWmm+HOcWsk2WZ/PCS
RsZb9Mj9K8QxzoXyXJ4rOZ+L893Yl6ymWJkljVsCgYA4rJv2gXK4bo4wlaUt2kxk
fd0E4IBrq/IowtftMncCDM84T4KzHzFZljDXK+uyKpV2ltDhe3h6Pbb3ESkxmTHO
h6AsAH4V0In/h+P2XWafViLNQB5maqI4kTQB4ElA/2z+2poEmxXeKBguqAkx6t2S
s/5msDQm7CHyme7JDcTN6wKBgQChvoYCjboY8bWvCf4UIbJeOzSVewVem21njZ/t
nSQ0enwqxjfG0gH9/xFB5WvExrOdJ0I4yMGxJhPkqVMtG4fG1j/LkSA8FgrZOKBZ
YLaEzy0BrTXkzAdJjRSD9oKtqLMLhMx9cQqF1BBkxE5qc5hMAHCISilaFoDK/E1D
ytFfvwKBgCifCEgyYbq0WtjExrZ92eMOToaM+u6DaTLdJkoyxTyrQdVZ+GYLnTC/
2OsVolTaTRlz5MYdAXBXCYwiaftshSlhsSfsd2y2GHpxE4XRB3UBi50Ct9Mg7akn
HAkFfUpRbKOeRoEZWvN2XHAHuwNaVe5NQ2vLVBOgPDLHglVo3SpV
-----END RSA PRIVATE KEY-----"""

# Note: the key ID is the "kid" field from the JWKS. It is the same for both entries in `keys`.
key_id = "4d7792ce42054d5df8dd85f8b527dd88"

server_url = 'https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir'
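As noted in the Background section, this test server keeps its configuration as state in the URL rather than on the server. The sketch below shows one way to peek at that state, assuming the first path segment of the FHIR Server URL is base64url-encoded JSON (which holds for the URL used in this workshop). The helper name decode_url_state is ours, and the meaning of the individual keys is not covered here, so treat them as opaque.

```python
import base64
import json
from urllib.parse import urlparse


def decode_url_state(url):
    """Decode the base64url-encoded JSON found in the first path segment of a URL."""
    segment = urlparse(url).path.strip('/').split('/')[0]
    # base64url values are usually stored without '=' padding, so restore it
    padded = segment + '=' * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Same value as `server_url` above, repeated so this cell runs on its own
example_url = 'https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir'

state = decode_url_state(example_url)
print(state)
```

The same padding-and-decode approach applies to the middle (payload) section of a JWT such as the client ID, should you want to inspect it.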

We will use the Requests library for making all HTTP requests, and use a Session, in case we need to persist common settings such as proxy or SSL configuration.

import requests

session = requests.Session()

# Optional: Turn off SSL verification. Useful when dealing with a corporate proxy with self-signed certificates.
from urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)
session.verify = False

Let’s start by confirming we can hit the server via the /metadata endpoint. When connecting to a server for the first time it is generally a good idea to review the metadata to see what the server supports, and that it matches your expectations. In this case, expect to see the name “SMART Sample Bulk Data Server”, and references to “export” operations.

r = session.get(f'{server_url}/metadata')
metadata = r.json()

print(metadata)
{
    'resourceType': 'CapabilityStatement',
    'status': 'active',
    'date': '2023-06-13T01:26:34+00:00',
    'publisher': "Boston Children's Hospital",
    'kind': 'instance',
    'instantiates': ['http://hl7.org/fhir/uv/bulkdata/CapabilityStatement/bulk-data'],
    'software': {'name': 'SMART Sample Bulk Data Server', 'version': '2.1.1'},
    'implementation': {'description': 'SMART Sample Bulk Data Server'},
    'fhirVersion': '4.0.1',
    'acceptUnknown': 'extensions',
    'format': ['json'],
    'rest': [
        {
            'mode': 'server',
            'security': {
                'extension': [
                    {
                        'url': 'http://fhir-registry.smarthealthit.org/StructureDefinition/oauth-uris',
                        'extension': [
                            {'url': 'token', 'valueUri': 'https://bulk-data.smarthealthit.org/auth/token'},
                            {'url': 'register', 'valueUri': 'https://bulk-data.smarthealthit.org/auth/register'}
                        ]
                    }
                ],
                'service': [
                    {
                        'coding': [
                            {
                                'system': 'http://hl7.org/fhir/restful-security-service',
                                'code': 'SMART-on-FHIR',
                                'display': 'SMART-on-FHIR'
                            }
                        ],
                        'text': 'OAuth2 using SMART-on-FHIR profile (see http://docs.smarthealthit.org)'
                    }
                ]
            },
            'resource': [
                {
                    'type': 'Patient',
                    'operation': [
                        {
                            'extension': [
                                {
                                    'url': 
'http://hl7.org/fhir/StructureDefinition/capabilitystatement-expectation',
                                    'valueCode': 'SHOULD'
                                }
                            ],
                            'name': 'patient-export',
                            'definition': 'http://hl7.org/fhir/uv/bulkdata/OperationDefinition/patient-export'
                        }
                    ]
                },
                {
                    'type': 'Group',
                    'operation': [
                        {
                            'extension': [
                                {
                                    'url': 
'http://hl7.org/fhir/StructureDefinition/capabilitystatement-expectation',
                                    'valueCode': 'SHOULD'
                                }
                            ],
                            'name': 'group-export',
                            'definition': 'http://hl7.org/fhir/uv/bulkdata/OperationDefinition/group-export'
                        }
                    ]
                },
                {
                    'type': 'OperationDefinition',
                    'profile': {'reference': 'http://hl7.org/fhir/Profile/OperationDefinition'},
                    'interaction': [{'code': 'read'}],
                    'searchParam': []
                }
            ],
            'operation': [
                {'name': 'get-resource-counts', 'definition': 'OperationDefinition/-s-get-resource-counts'},
                {
                    'extension': [
                        {
                            'url': 'http://hl7.org/fhir/StructureDefinition/capabilitystatement-expectation',
                            'valueCode': 'SHOULD'
                        }
                    ],
                    'name': 'export',
                    'definition': 'http://hl7.org/fhir/uv/bulkdata/OperationDefinition/export'
                }
            ]
        }
    ]
}

The SMART Backend Authorization specification defines that the token endpoint will be published as part of the FHIR resource server’s SMART metadata, at .well-known/smart-configuration. Let’s fetch that endpoint and review the contents.

r = session.get(f'{server_url}/.well-known/smart-configuration')
smart_config = r.json()

print(smart_config)
{
    'token_endpoint': 'https://bulk-data.smarthealthit.org/auth/token',
    'registration_endpoint': 'https://bulk-data.smarthealthit.org/auth/register',
    'token_endpoint_auth_methods_supported': ['private_key_jwt'],
    'token_endpoint_auth_signing_alg_values_supported': [
        'HS256',
        'HS384',
        'HS512',
        'RS256',
        'RS384',
        'RS512',
        'ES256',
        'ES384',
        'ES512',
        'PS256',
        'PS384',
        'PS512'
    ],
    'scopes_supported': [
        'system/*.rs',
        'system/Patient.rs',
        'system/Encounter.rs',
        'system/Condition.rs',
        'system/Claim.rs',
        'system/ExplanationOfBenefit.rs',
        'system/Observation.rs',
        'system/Immunization.rs',
        'system/DiagnosticReport.rs',
        'system/Procedure.rs',
        'system/CareTeam.rs',
        'system/CarePlan.rs',
        'system/MedicationRequest.rs',
        'system/AllergyIntolerance.rs',
        'system/Device.rs',
        'system/ImagingStudy.rs',
        'system/Organization.rs',
        'system/Practitioner.rs',
        'system/DocumentReference.rs',
        'system/Group.rs',
        'system/*.read',
        'system/Patient.read',
        'system/Encounter.read',
        'system/Condition.read',
        'system/Claim.read',
        'system/ExplanationOfBenefit.read',
        'system/Observation.read',
        'system/Immunization.read',
        'system/DiagnosticReport.read',
        'system/Procedure.read',
        'system/CareTeam.read',
        'system/CarePlan.read',
        'system/MedicationRequest.read',
        'system/AllergyIntolerance.read',
        'system/Device.read',
        'system/ImagingStudy.read',
        'system/Organization.read',
        'system/Practitioner.read',
        'system/DocumentReference.read',
        'system/Group.read'
    ],
    'capabilities': ['permission-v2', 'permission-v1', 'client-confidential-asymmetric']
}

We care most about the token_endpoint field, which is the URL we will post our token request to. For more information about the other fields, see the SMART App Launch specification.

token_endpoint = smart_config['token_endpoint']

Now we have our token endpoint, so we can make a request to it to get a token. The request follows the OAuth 2.0 “Client Credentials” flow, using a JSON Web Token (JWT) assertion containing our client ID and signed with our private key.

📘 Read more about the access token request specification
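To make the contents of the assertion more concrete, here is a minimal sketch that assembles the claims using only the standard library. The helper name build_assertion_claims and its parameters are illustrative. The SMART Backend Services specification also lists a jti claim (a unique identifier for each assertion) as required and limits exp to at most five minutes in the future. The workshop code omits jti, so treat this as a variation to consider rather than a replacement.

```python
import time
import uuid


def build_assertion_claims(client_id, token_endpoint, lifetime_seconds=300, now=None):
    """Return the claims for a client assertion as a plain dict (not yet signed)."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        'iss': client_id,                      # issuer: the client creating the JWT
        'sub': client_id,                      # subject: the client that will use the token
        'aud': token_endpoint,                 # audience: the token endpoint URL
        'exp': issued_at + lifetime_seconds,   # expiry as a Unix timestamp
        'jti': uuid.uuid4().hex,               # unique ID so an assertion cannot be replayed
    }


# Placeholder values for illustration only
claims = build_assertion_claims('my-client-id', 'https://example.org/auth/token', now=1_700_000_000)
print(claims)
```

The resulting dict can be passed as the first argument to jwt.encode() in place of the inline dict.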

# Create a JWT client assertion as follows:
import jwt
import datetime

assertion = jwt.encode({
        'iss': client_id,   # "iss" == "issuer", the client that created this JWT
        'sub': client_id,   # "sub" == "subject", the client that will use the access token
        'aud': token_endpoint,  # "aud" == "audience", the receiver of this request
        'exp': int((datetime.datetime.now() + datetime.timedelta(minutes=5)).timestamp())
    },
    private_key,  # signed with the private key
    algorithm='RS384', # algorithm for the key
    headers={"kid": key_id}) # kid is required for smart bulk data server


# And then POST it to the token endpoint
r = session.post(token_endpoint, data={
    'scope': 'system/*.read',
    'grant_type': 'client_credentials',
    'client_assertion_type': 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
    'client_assertion': assertion
})

token_response = r.json()

# And inspect the response:
token_response
{'token_type': 'bearer',
 'scope': 'system/*.read',
 'expires_in': 300,
 'access_token': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbl90eXBlIjoiYmVhcmVyIiwic2NvcGUiOiJzeXN0ZW0vKi5yZWFkIiwiZXhwaXJlc19pbiI6MzAwLCJpYXQiOjE2ODY2NTQzNTIsImV4cCI6MTY4NjY1NDY1Mn0.6VcZfI7YBkrGV7IoIBKmQo2usjrpCkIgmJHx8jFir3g'}

Two important fields we need to keep track of are the token itself and its expiry time. Tokens are only valid for a certain amount of time, and once they expire we will need to fetch a new one via the same process as above. expires_in is given in seconds from the current time, so we’ll add it to the current time to get a timestamp we can compare against.

Note that for this example we requested and received 'scope': 'system/*.read' which allows access to all resource types. In practice, requesting access to all resource types is generally not recommended, and servers do not always support asking for * scopes. Generally it is recommended to request only the minimal level of access necessary.

token = token_response['access_token']
expire_time = datetime.datetime.now() + datetime.timedelta(seconds=token_response['expires_in'])

To make this easier for ourselves, let’s package this up into a get_token() function that we can call anytime we need to use a token. If the current token is still valid, use that, or if it has expired, fetch a new one. The logic is exactly the same as the previous steps we just ran:

def get_token():
    global token, expire_time
    if datetime.datetime.now() < expire_time:
        # the existing token is still valid so return it
        return token

    assertion = jwt.encode({
            'iss': client_id,
            'sub': client_id,
            'aud': token_endpoint,
            'exp': int((datetime.datetime.now() + datetime.timedelta(minutes=5)).timestamp())
    }, private_key, algorithm='RS384',
    headers={"kid": key_id})

    r = session.post(token_endpoint, data={
        'scope': 'system/*.read',
        'grant_type': 'client_credentials',
        'client_assertion_type': 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
        'client_assertion': assertion
    })

    token_response = r.json()
    token = token_response['access_token']
    expire_time = datetime.datetime.now() + datetime.timedelta(seconds=token_response['expires_in'])

    return token
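The get_token() function relies on the global token and expire_time variables set in the earlier cell, and it only refreshes once the token has already expired. The sketch below is one possible variation that keeps this state in an object and refreshes a little early. TokenCache, margin_seconds, and the fetch callable are hypothetical names introduced for illustration. In the workshop, fetch would wrap the jwt.encode() and session.post() calls shown above.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch, margin_seconds=30, clock=time.monotonic):
        self._fetch = fetch            # callable returning a token-response dict
        self._margin = margin_seconds  # refresh this many seconds before expiry
        self._clock = clock            # injectable clock, which simplifies testing
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            response = self._fetch()
            self._token = response['access_token']
            self._expires_at = self._clock() + response['expires_in']
        return self._token


# Stand-in for a real token request, so this cell runs without a server
def fake_fetch():
    return {'access_token': 'example-token', 'expires_in': 300}


cache = TokenCache(fake_fetch)
print(cache.get())
```

Refreshing early avoids sending a token that expires while the request is in flight. The 30-second margin is an arbitrary choice.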

Starting, Checking, and Downloading the Export

Now that we have an access token, the next step in using Bulk Data is to request the export of data, via a “kick-off request”. This is an asynchronous request – once the request is accepted, instead of returning the results directly, the server response will point to a URL where the client can check the status.

There are three levels of export:

  • Patient, to obtain resources related to all Patients
  • Group, to obtain resources associated with a particular Group
  • System, to obtain all resources, whether or not they are associated with a patient

For this exercise we will initially only request Patient-level data, but the general process for Groups and System-level data is exactly the same - there is just a different endpoint to hit, and a different set of data will be returned.

There are also a number of parameters that may be set, but to keep things simple we will only use the _type parameter, to request only Patient and Condition resource types.
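Since the three export levels differ only in the endpoint, the kick-off URL can be built by a small helper. This is a sketch: export_url and its parameters are our own names, and the endpoint paths follow the Bulk Data specification ([base]/$export, [base]/Patient/$export, and [base]/Group/[id]/$export).

```python
from urllib.parse import urlencode


def export_url(base_url, level='Patient', group_id=None, resource_types=None):
    """Build a Bulk Data kick-off URL for a System, Patient, or Group export."""
    base = base_url.rstrip('/')
    if level == 'System':
        path = f'{base}/$export'
    elif level == 'Patient':
        path = f'{base}/Patient/$export'
    elif level == 'Group':
        if not group_id:
            raise ValueError('group_id is required for a Group-level export')
        path = f'{base}/Group/{group_id}/$export'
    else:
        raise ValueError(f'unknown export level: {level}')

    if resource_types:
        # keep the commas literal so the URL reads _type=Patient,Condition
        query = urlencode({'_type': ','.join(resource_types)}, safe=',')
        return f'{path}?{query}'
    return path


# example.org is a placeholder base URL
print(export_url('https://example.org/fhir', 'Patient', resource_types=['Patient', 'Condition']))
print(export_url('https://example.org/fhir', 'Group', group_id='group-1'))
print(export_url('https://example.org/fhir', 'System'))
```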

📘 Read more about the Bulk Data Kick-off Request

Let’s make the export request and inspect the response headers. For “Patient” level data, the URL we want to hit is {server}/Patient/$export. Our token is used in the “Authorization” header in the format "Bearer {token}".

r = session.get(f'{server_url}/Patient/$export?_type=Patient,Condition',
                headers={'Authorization': f'Bearer {get_token()}',
                         'Accept': 'application/fhir+json',
                         'Prefer': 'respond-async'})

print(r.headers)
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'Content-Location': 
'https://bulk-data.smarthealthit.org/fhir/bulkstatus/55e2770a4b9c3f861f002634bd44ac62', 'Content-Type': 
'application/json; charset=utf-8', 'Content-Length': '644', 'Etag': 'W/"284-G8JHR+JPFTg+y5JhfrRbOzc4ZMI"', 'Date': 
'Tue, 13 Jun 2023 11:05:52 GMT', 'Via': '1.1 vegur'}

We see the status URL in the Content-Location header, so let’s save that into a variable.

check_url = r.headers['Content-Location']

We can now check the status by getting that URL, and the HTTP status code of the response will indicate the export status:

  • Code 200 means the export is complete, and the response body will list the locations of the exported files
  • Code 202 means the export is still in progress
  • Codes in the range 4xx-5xx indicate an error has occurred. 4xx codes generally indicate an error in the request, and 5xx codes generally indicate a server error.

Note that in production environments it is recommended to check the status as infrequently as possible, to minimize the load on the server. In this case we expect the export to complete in just a few seconds so the impact of checking every two seconds is minimal. The server will also include a “Retry-After” header which will give us a hint on how long to wait before trying again. We’ll check that status in a loop, and break out of the loop when we get a complete or error response. We’ll print status each time through the loop, and the response body when the export is complete.
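In general HTTP usage, a Retry-After header can carry either a number of seconds or an HTTP date, and a server may omit it altogether. The helper below is a minimal sketch that handles all three cases and caps the wait. retry_delay, default, and maximum are illustrative names and values, not part of the Bulk Data specification. The workshop loop keeps the simpler int() conversion, which is sufficient for this test server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(header_value, default=2.0, maximum=60.0, now=None):
    """Convert a Retry-After header value into a bounded number of seconds."""
    if header_value is None:
        return default

    value = header_value.strip()
    if value.isdigit():
        delay = float(value)  # delta-seconds form, e.g. "2"
    else:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default  # unparseable value: fall back to the default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        current = now or datetime.now(timezone.utc)
        delay = (when - current).total_seconds()

    # never return a negative wait, and never wait longer than `maximum`
    return min(max(delay, 0.0), maximum)


print(retry_delay('2'))
print(retry_delay(None))
```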

# Now we check the status in a loop

from time import sleep

while True:
    r = session.get(check_url, headers={'Authorization': f'Bearer {get_token()}', 'Accept': 'application/fhir+json'})

    if r.status_code == 200:
        # complete
        response = r.json()
        print(response)
        break

    elif r.status_code == 202:
        # in progress
        print(r.headers)

        # Retry-After is optional, so fall back to 2 seconds if it is missing
        delay = r.headers.get('Retry-After', '2')

        print(f"Sleeping {delay} seconds before retrying")
        sleep(int(delay))

    else:
        # error
        print(r.text)

        break
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '1% complete, currenly 
processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:52 GMT', 'Content-Length': '0', 
'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '21% complete, currenly 
processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:54 GMT', 'Content-Length': '0', 
'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '42% complete, currenly 
processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:56 GMT', 'Content-Length': '0', 
'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '63% complete, currenly 
processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:58 GMT', 'Content-Length': '0', 
'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '83% complete, currenly 
processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:06:00 GMT', 'Content-Length': '0', 
'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{
    'transactionTime': '1686654352606',
    'request': 
'https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZW
wiOjB9/fhir/Patient/$export?_type=Patient,Condition',
    'requiresAccessToken': True,
    'output': [
        {
            'type': 'Condition',
            'count': 639,
            'url': 
'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdC
I6NjM5LCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Condition.ndjson'
        },
        {
            'type': 'Patient',
            'count': 100,
            'url': 
'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdC
I6MTAwLCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Patient.ndjson'
        }
    ],
    'deleted': [],
    'error': []
}

We can see that the response points us to one or more NDJSON (Newline Delimited JSON) files per resource type, in the output field of the response.

Note that in this case the volume of data is relatively small, and there is only one entry in the list per resource type, but for large datasets it is possible that there could be multiple files (and therefore multiple entries in this list) per resource type.

Let’s save that list to a variable.

output_files = response['output']
output_files
[{'type': 'Condition',
  'count': 639,
  'url': 'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdCI6NjM5LCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Condition.ndjson'},
 {'type': 'Patient',
  'count': 100,
  'url': 'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdCI6MTAwLCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Patient.ndjson'}]

Now we can loop through the list and download each one. Each file is an NDJSON, so that means we’ll see one resource per line.

To make each step clear and distinct, we’ll keep a dict of { resourceType: [resources,...]} which we can process later.

Note: for this exercise we are only reading the NDJSON files into a dict in memory, but in practice you may want to save the file locally first in case there are errors in processing, especially if the files are large.
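As a sketch of the save-locally approach, the helpers below write downloaded chunks to a file and then parse that file one line at a time, so a parsing error does not force a new download. The names save_chunks and iter_ndjson are ours. With Requests, the chunks could come from a response opened with stream=True and read with iter_content(). Here, in-memory sample bytes stand in for the download so the example runs on its own.

```python
import json
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`."""
    with open(path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)


def iter_ndjson(path):
    """Yield one parsed resource per non-empty line of an NDJSON file."""
    with open(path, encoding='utf-8') as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f'{path}, line {line_number}: {err}') from err


# Made-up sample data standing in for a downloaded file
sample_chunks = [
    b'{"resourceType": "Patient", "id": "a"}\n',
    b'\n{"resourceType": "Patient", "id": "b"}\n',
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, '1.Patient.ndjson')
    save_chunks(sample_chunks, path)
    ids = [resource['id'] for resource in iter_ndjson(path)]

print(ids)
```

Reading line by line also avoids holding both the raw text and the parsed resources in memory at once, which matters more as files grow.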

import json

resources_by_type = {}

for output_file in tqdm(output_files):
    download_url = output_file['url']
    resource_type = output_file['type']

    r = session.get(download_url, headers={'Authorization': f'Bearer {get_token()}',
                                           'Accept': 'application/fhir+json'})

    ndjson = r.text.strip()  # remove any whitespace, in particular trailing newlines

    if resource_type not in resources_by_type:
        resources_by_type[resource_type] = []

    # NDJSON can't be parsed as a whole; we have to process it line by line
    for line in ndjson.split('\n'):
        resource = json.loads(line)
        resources_by_type[resource_type].append(resource)


# This is a large amount of JSON data; only uncomment this line if you want to review it
# print(resources_by_type)

Converting to DataFrames

Finally, let’s convert these into DataFrames.

The quick-and-dirty option is to use the Pandas json_normalize() function to parse a list of dicts into a DataFrame.

📘 Read more about pandas.json_normalize
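json_normalize() expands nested dicts into dotted column names, but list-valued elements such as name stay as Python lists inside a single column. Where flat columns are needed, one option is to pick out the fields of interest before building the DataFrame. The sketch below does this for a few Patient fields. patient_row is a hypothetical helper, the choice of fields is arbitrary, and the sample resource is made up.

```python
def patient_row(patient):
    """Flatten selected fields of a FHIR Patient resource into a flat dict."""
    names = patient.get('name') or [{}]
    # prefer the name marked "official", otherwise fall back to the first one
    official = next((n for n in names if n.get('use') == 'official'), names[0])
    return {
        'id': patient.get('id'),
        'family': official.get('family'),
        'given': ' '.join(official.get('given', [])),
        'gender': patient.get('gender'),
        'birthDate': patient.get('birthDate'),
    }


# Made-up resource for illustration
sample_patient = {
    'resourceType': 'Patient',
    'id': 'example-1',
    'name': [{'use': 'official', 'family': 'Example', 'given': ['Pat', 'Q']}],
    'gender': 'female',
    'birthDate': '1970-01-01',
}

row = patient_row(sample_patient)
print(row)
```

A list of such rows, one per resource, can be passed directly to pd.DataFrame().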

import pandas as pd

resource_dfs = {}

for resource_type, resources in resources_by_type.items():
    resource_dfs[resource_type] = pd.json_normalize(resources)

# Now we can work with them by type:

resource_dfs['Patient']
resourceType id extension identifier name telecom gender birthDate address multipleBirthBoolean communication text.status text.div maritalStatus.coding maritalStatus.text multipleBirthInteger
0 Patient 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Lemke', 'given... [{'system': 'phone', 'value': '555-532-1156', ... male 1965-01-13 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... M NaN
1 Patient 58c297c4-d684-4677-8024-01131d93835e [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Wintheiser', '... [{'system': 'phone', 'value': '555-712-4709', ... female 1971-04-05 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... M NaN
2 Patient 538a9a4e-8437-47d3-8c01-1a17dca8f0be [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Alaniz', 'give... [{'system': 'phone', 'value': '555-446-6900', ... male 1923-03-24 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... M NaN
3 Patient c6c60742-8694-46e4-bb42-b00bf6d8b536 [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Walsh', 'given... [{'system': 'phone', 'value': '555-436-4287', ... female 1965-10-27 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... M NaN
4 Patient fbfec681-d357-4b28-b1d2-5db6434c7846 [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Bednar', 'give... [{'system': 'phone', 'value': '555-405-4909', ... female 1942-07-04 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... M NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 Patient 5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7 [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Schmeler', 'gi... [{'system': 'phone', 'value': '555-971-6300', ... male 1995-10-19 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... Never Married NaN
96 Patient c1981741-f90e-4077-9156-429a3c4c5ded [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Lubowitz', 'gi... [{'system': 'phone', 'value': '555-328-5229', ... male 1956-05-06 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... M NaN
97 Patient f98b23bf-4443-46d0-9eaf-563e767cf948 [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Funk', 'given'... [{'system': 'phone', 'value': '555-497-7639', ... male 1966-02-07 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... M NaN
98 Patient c536dee9-9ef6-4807-ae20-9f1045c9c7d6 [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Bergstrom', 'g... [{'system': 'phone', 'value': '555-845-1730', ... male 1990-11-18 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... S NaN
99 Patient a845ead4-d9de-42eb-b4b5-eb21a8963578 [{'url': 'http://hl7.org/fhir/StructureDefinit... [{'system': 'https://github.com/synthetichealt... [{'use': 'official', 'family': 'Pagac', 'given... [{'system': 'phone', 'value': '555-504-1379', ... female 1968-04-20 [{'extension': [{'url': 'http://hl7.org/fhir/S... False [{'language': {'coding': [{'system': 'urn:ietf... generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... [{'system': 'http://terminology.hl7.org/CodeSy... S NaN

100 rows × 16 columns

This works, but nested fields are handled poorly: list-valued elements such as name are left as raw lists of dictionaries inside a single column. One way to do a little better is the flatten_json library (https://github.com/amirziai/flatten), which expands nested dictionaries and lists into one column per leaf value.

from flatten_json import flatten

for resource_type, resources in resources_by_type.items():
    resource_dfs[resource_type] = pd.json_normalize(list(map(lambda r: flatten(r), resources)))

# Now let's take another look
resource_dfs['Patient']
resourceType id text_status text_div extension_0_url extension_0_valueString extension_1_url extension_1_valueAddress_city extension_1_valueAddress_state extension_1_valueAddress_country ... multipleBirthBoolean communication_0_language_coding_0_system communication_0_language_coding_0_code communication_0_language_coding_0_display communication_0_language_text name_1_use name_1_family name_1_given_0 name_1_prefix_0 multipleBirthInteger
0 Patient 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Lettie Boyle http://hl7.org/fhir/StructureDefinition/patien... Boston Massachusetts US ... False urn:ietf:bcp:47 en-US English English NaN NaN NaN NaN NaN
1 Patient 58c297c4-d684-4677-8024-01131d93835e generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Marquetta Schamberger http://hl7.org/fhir/StructureDefinition/patien... Macau Macao Special Administrative Region of the Peo... CN ... False urn:ietf:bcp:47 zh Chinese Chinese maiden Heathcote Aleta Mrs. NaN
2 Patient 538a9a4e-8437-47d3-8c01-1a17dca8f0be generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Pilar Orta http://hl7.org/fhir/StructureDefinition/patien... San Jose San Jose CR ... False urn:ietf:bcp:47 es Spanish Spanish NaN NaN NaN NaN NaN
3 Patient c6c60742-8694-46e4-bb42-b00bf6d8b536 generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Arvilla Haag http://hl7.org/fhir/StructureDefinition/patien... Norton Massachusetts US ... False urn:ietf:bcp:47 en-US English English maiden Kuphal Alyce Mrs. NaN
4 Patient fbfec681-d357-4b28-b1d2-5db6434c7846 generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Marcelina Harber http://hl7.org/fhir/StructureDefinition/patien... Brockton Massachusetts US ... False urn:ietf:bcp:47 en-US English English maiden Runolfsson Arnette Mrs. NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 Patient 5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7 generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Allison Daugherty http://hl7.org/fhir/StructureDefinition/patien... Boston Massachusetts US ... False urn:ietf:bcp:47 en-US English English NaN NaN NaN NaN NaN
96 Patient c1981741-f90e-4077-9156-429a3c4c5ded generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Antoinette Parker http://hl7.org/fhir/StructureDefinition/patien... Mansfield Massachusetts US ... False urn:ietf:bcp:47 en-US English English NaN NaN NaN NaN NaN
97 Patient f98b23bf-4443-46d0-9eaf-563e767cf948 generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Barbar Windler http://hl7.org/fhir/StructureDefinition/patien... Randolph Massachusetts US ... False urn:ietf:bcp:47 en-US English English NaN NaN NaN NaN NaN
98 Patient c536dee9-9ef6-4807-ae20-9f1045c9c7d6 generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Juli Johns http://hl7.org/fhir/StructureDefinition/patien... Holyoke Massachusetts US ... False urn:ietf:bcp:47 en-US English English NaN NaN NaN NaN NaN
99 Patient a845ead4-d9de-42eb-b4b5-eb21a8963578 generated <div xmlns="http://www.w3.org/1999/xhtml">Gene... http://hl7.org/fhir/StructureDefinition/patien... Lanie Hyatt http://hl7.org/fhir/StructureDefinition/patien... Millis Massachusetts US ... False urn:ietf:bcp:47 en-US English English NaN NaN NaN NaN NaN

100 rows × 73 columns

Let’s look at just one row so it’s easier to see all the columns and an example value:

with pd.option_context('display.max_rows', 1000, 'display.max_columns', 10):
    print(resource_dfs['Patient'].loc[0].T)
resourceType                                                                                Patient
id                                                             6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2
text_status                                                                               generated
text_div                                          <div xmlns="http://www.w3.org/1999/xhtml">Gene...
extension_0_url                                   http://hl7.org/fhir/StructureDefinition/patien...
extension_0_valueString                                                                Lettie Boyle
extension_1_url                                   http://hl7.org/fhir/StructureDefinition/patien...
extension_1_valueAddress_city                                                                Boston
extension_1_valueAddress_state                                                        Massachusetts
extension_1_valueAddress_country                                                                 US
extension_2_url                                   http://synthetichealth.github.io/synthea/disab...
extension_2_valueDecimal                                                                   0.305628
extension_3_url                                   http://synthetichealth.github.io/synthea/quali...
extension_3_valueDecimal                                                                  53.694372
identifier_0_system                                      https://github.com/synthetichealth/synthea
identifier_0_value                                             6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2
identifier_1_type_coding_0_system                     http://terminology.hl7.org/CodeSystem/v2-0203
identifier_1_type_coding_0_code                                                                  MR
identifier_1_type_coding_0_display                                            Medical Record Number
identifier_1_type_text                                                        Medical Record Number
identifier_1_system                                               http://hospital.smarthealthit.org
identifier_1_value                                             6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2
identifier_2_type_coding_0_system                     http://terminology.hl7.org/CodeSystem/v2-0203
identifier_2_type_coding_0_code                                                                  SS
identifier_2_type_coding_0_display                                           Social Security Number
identifier_2_type_text                                                       Social Security Number
identifier_2_system                                                  http://hl7.org/fhir/sid/us-ssn
identifier_2_value                                                                      999-18-8203
identifier_3_type_coding_0_system                     http://terminology.hl7.org/CodeSystem/v2-0203
identifier_3_type_coding_0_code                                                                  DL
identifier_3_type_coding_0_display                                                 Driver's License
identifier_3_type_text                                                             Driver's License
identifier_3_system                                                urn:oid:2.16.840.1.113883.4.3.25
identifier_3_value                                                                        S99914534
identifier_4_type_coding_0_system                     http://terminology.hl7.org/CodeSystem/v2-0203
identifier_4_type_coding_0_code                                                                 PPN
identifier_4_type_coding_0_display                                                  Passport Number
identifier_4_type_text                                                              Passport Number
identifier_4_system                               http://standardhealthrecord.org/fhir/Structure...
identifier_4_value                                                                       X41457228X
name_0_use                                                                                 official
name_0_family                                                                                 Lemke
name_0_given_0                                                                                Abram
name_0_prefix_0                                                                                 Mr.
telecom_0_system                                                                              phone
telecom_0_value                                                                        555-532-1156
telecom_0_use                                                                                  home
gender                                                                                         male
birthDate                                                                                1965-01-13
address_0_extension_0_url                         http://hl7.org/fhir/StructureDefinition/geoloc...
address_0_extension_0_extension_0_url                                                      latitude
address_0_extension_0_extension_0_valueDecimal                                            42.264144
address_0_extension_0_extension_1_url                                                     longitude
address_0_extension_0_extension_1_valueDecimal                                           -72.642902
address_0_line_0                                                                  167 Nikolaus Gate
address_0_city                                                                          Easthampton
address_0_state                                                                       Massachusetts
address_0_postalCode                                                                          01027
address_0_country                                                                                US
maritalStatus_coding_0_system                     http://terminology.hl7.org/CodeSystem/v3-Marit...
maritalStatus_coding_0_code                                                                       M
maritalStatus_coding_0_display                                                                    M
maritalStatus_text                                                                                M
multipleBirthBoolean                                                                          False
communication_0_language_coding_0_system                                            urn:ietf:bcp:47
communication_0_language_coding_0_code                                                        en-US
communication_0_language_coding_0_display                                                   English
communication_0_language_text                                                               English
name_1_use                                                                                      NaN
name_1_family                                                                                   NaN
name_1_given_0                                                                                  NaN
name_1_prefix_0                                                                                 NaN
multipleBirthInteger                                                                            NaN
Name: 0, dtype: object
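
The column names above (for example name_0_given_0) come from joining each dictionary key and list index along the path to a leaf value with underscores. The following is a minimal, standard-library-only sketch of that idea. It is not the flatten_json implementation: flatten_sketch is a hypothetical helper written for illustration, and the library may treat edge cases such as empty lists differently.

```python
def flatten_sketch(value, prefix="", sep="_"):
    """Flatten nested dicts/lists into one dict with separator-joined keys.

    Hypothetical illustration only; not the flatten_json implementation.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # List positions become part of the key, e.g. name_0, name_1
        items = enumerate(value)
    else:
        # Leaf value: store it under the key accumulated so far
        return {prefix: value}

    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_sketch(child, child_prefix, sep))
    return flat


# A trimmed-down Patient using values from the first row shown above
patient = {
    "resourceType": "Patient",
    "name": [{"use": "official", "family": "Lemke", "given": ["Abram"], "prefix": ["Mr."]}],
    "maritalStatus": {"coding": [{"code": "M"}], "text": "M"},
}

flat_patient = flatten_sketch(patient)
for column, value in flat_patient.items():
    print(column, "=", value)
```

Because list positions are baked into the column names, a patient with a second name entry produces extra name_1_* columns, which is why those columns are NaN for most rows in the table above.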

Next, what if we know in advance we will only want certain fields?

Let’s follow the same pattern the FHIR-PYrate library uses, and use FHIRPath to define the fields we want to extract, along with a friendly name. For this we’ll use the fhirpathpy library.

FHIRPath is:

a path based navigation and extraction language, somewhat like XPath. Operations are expressed in terms of the logical content of hierarchical data models, and support traversal, selection and filtering of data.

If you are not familiar with FHIRPath, Section 3 of the FHIRPath spec describes some of the basics.

import fhirpathpy

fhir_paths = [
    ["id", "identifier[0].value"],
    ["gender", "gender"],
    ["date_of_birth", "birthDate"],
    ["marital_status", "maritalStatus.coding.first().code"],
]

# Compile each FHIRPath expression once so it can be reused for every resource;
# this gives better performance on large datasets
for f in fhir_paths:
    f[1] = fhirpathpy.compile(f[1])

for resource_type, resources in resources_by_type.items():
    filtered_resources = []

    for resource in resources:
        filtered_resource = {}
        for f in fhir_paths:
            fieldname = f[0]
            func = f[1]
            filtered_resource[fieldname] = func(resource)

            # fhirpathpy always returns a list, which can make the DataFrame messy
            # if it's a list with only one item, extract the item from the list
            if isinstance(filtered_resource[fieldname], list) and len(filtered_resource[fieldname]) == 1:
                filtered_resource[fieldname] = filtered_resource[fieldname][0]

        filtered_resources.append(filtered_resource)

    resource_dfs[resource_type] = pd.json_normalize(list(map(lambda r: flatten(r), filtered_resources)))


resource_dfs['Patient']
     id                                    gender  date_of_birth  marital_status
0    6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2  male    1965-01-13     M
1    58c297c4-d684-4677-8024-01131d93835e  female  1971-04-05     M
2    538a9a4e-8437-47d3-8c01-1a17dca8f0be  male    1923-03-24     M
3    c6c60742-8694-46e4-bb42-b00bf6d8b536  female  1965-10-27     M
4    fbfec681-d357-4b28-b1d2-5db6434c7846  female  1942-07-04     M
...  ...                                   ...     ...            ...
95   5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7  male    1995-10-19     S
96   c1981741-f90e-4077-9156-429a3c4c5ded  male    1956-05-06     M
97   f98b23bf-4443-46d0-9eaf-563e767cf948  male    1966-02-07     M
98   c536dee9-9ef6-4807-ae20-9f1045c9c7d6  male    1990-11-18     S
99   a845ead4-d9de-42eb-b4b5-eb21a8963578  female  1968-04-20     S

100 rows × 4 columns
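
The loop above unwraps a result only when the list has exactly one item. FHIRPath expressions evaluate to collections, so a path that matches nothing would presumably come back as an empty list, and a repeating element comes back as a longer list. One possible refinement, sketched below with a hypothetical unwrap helper (not part of fhirpathpy), maps an empty result to None so that missing values show up as nulls in the DataFrame rather than as empty lists. The sample values, including the given_names field, are made up for illustration.

```python
def unwrap(values):
    """Collapse a FHIRPath-style result list into a DataFrame-friendly value.

    Hypothetical helper: [] -> None, [x] -> x, longer lists are kept as-is.
    """
    if not isinstance(values, list):
        return values
    if len(values) == 0:
        # No match for this path: represent it as a missing value
        return None
    if len(values) == 1:
        return values[0]
    return values


# Simulated evaluation results for one resource (illustrative values only)
raw = {
    "gender": ["male"],
    "marital_status": [],
    "given_names": ["Aleta", "Marie"],
}

row = {field: unwrap(values) for field, values in raw.items()}
print(row)
```

Whether to keep multi-item lists, join them into a string, or take only the first item depends on the analysis; the FHIRPath expression itself can also make the choice explicit, as maritalStatus.coding.first().code does above.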

Bringing it all together

Now we have everything we need to connect to a FHIR server that supports Bulk Data, request and download exported data, and convert it into a DataFrame. Let’s bring everything together from the previous steps into one class with a clear entrypoint.

import requests
import jwt
import datetime
import json
import fhirpathpy
import pandas as pd
from flatten_json import flatten
from time import sleep
from typing import Optional
from collections import defaultdict

class BulkDataFetcher:
    def __init__(
        self,
        base_url: str,
        client_id: str,
        private_key: str,
        key_id: str,
        endpoint: Optional[str] = None,
        session: Optional[requests.Session] = None
    ):
        self.base_url = base_url
        self.client_id = client_id
        self.private_key = private_key
        self.key_id = key_id

        self.token = None
        self.expire_time = None

        if endpoint is None:
            self.endpoint = "Patient"
        else:
            self.endpoint = endpoint


        if session is None:
            self.session = requests.Session()
        else:
            self.session = session

        r = self.session.get(f'{base_url}/.well-known/smart-configuration')
        smart_config = r.json()
        self.token_endpoint = smart_config['token_endpoint']

        self.resource_types = []
        self.fhir_paths = {}

        # Store raw FHIR resource instances; populated as part of get_dataframes()
        self.resources_by_type = {}


    def get_token(self):
        if self.token and datetime.datetime.now() < self.expire_time:
            # the existing token is still valid so use it
            return self.token

        assertion = jwt.encode({
                'iss': self.client_id,
                'sub': self.client_id,
                'aud': self.token_endpoint,
                'exp': int((datetime.datetime.now() + datetime.timedelta(minutes=5)).timestamp())
        }, self.private_key, algorithm='RS384',
        headers={"kid": self.key_id})

        r = self.session.post(self.token_endpoint, data={
            'scope': 'system/*.read',
            'grant_type': 'client_credentials',
            'client_assertion_type': 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
            'client_assertion': assertion
        })

        token_response = r.json()
        self.token = token_response['access_token']
        self.expire_time = datetime.datetime.now() + datetime.timedelta(seconds=token_response['expires_in'])

        return self.token

    def add_resource_type(self, resource_type: str, fhir_paths = None):
        self.resource_types.append(resource_type)
        if fhir_paths:
            # fhir_paths=[
            #    ("id", "identifier[0].value"),
            #    ("marital_status", "maritalStatus.coding[0].code")
            # ]
            compiled_fhir_paths = [(f[0], fhirpathpy.compile(f[1])) for f in fhir_paths]
            self.fhir_paths[resource_type] = compiled_fhir_paths

    def _invoke_request(self):
        types = ','.join(self.resource_types)
        url = f'{self.base_url}/{self.endpoint}/$export?_type={types}'
        print(f'Fetching from {url}')
        r = self.session.get(url, headers={'Authorization': f'Bearer {self.get_token()}', 'Accept': 'application/fhir+json', 'Prefer': 'respond-async'})

        self.check_url = r.headers['Content-Location']
        return self.check_url

    def _wait_until_ready(self):
        while True:
            r = self.session.get(self.check_url, headers={'Authorization': f'Bearer {self.get_token()}', 'Accept': 'application/fhir+json'})

            # There are three possible options here: http://hl7.org/fhir/uv/bulkdata/export.html#bulk-data-status-request
            # Error = 4xx or 5xx status code
            # In-Progress = 202
            # Complete = 200

            if r.status_code == 200:
                # complete
                response = r.json()
                self.output_files = response['output']
                return self.output_files

            elif r.status_code == 202:
                # in progress
                delay = r.headers['Retry-After']

                sleep(int(delay))

            else:
                raise RuntimeError(r.text)

    def get_dataframes(self):
        self._invoke_request()
        self._wait_until_ready()

        resources_by_type = {}
        self.resources_by_type = {} # Reset store of raw FHIR resources each time this is run

        for output_file in self.output_files:
            download_url = output_file['url']
            resource_type = output_file['type']

            r = self.session.get(download_url, headers={'Authorization': f'Bearer {self.get_token()}', 'Accept': 'application/fhir+json'})

            ndjson = r.text.strip()

            if resource_type not in resources_by_type:
                resources_by_type[resource_type] = []
                self.resources_by_type[resource_type] = []

            for line in ndjson.split('\n'):
                resource = json.loads(line)

                # Make raw resource instances available for future use
                self.resources_by_type[resource_type].append(resource)

                if resource_type in self.fhir_paths:
                    fhir_paths = self.fhir_paths[resource_type]
                    filtered_resource = {}
                    for f in fhir_paths:
                        fieldname = f[0]
                        func = f[1]
                        filtered_resource[fieldname] = func(resource)

                        if isinstance(filtered_resource[fieldname], list) and len(filtered_resource[fieldname]) == 1:
                            filtered_resource[fieldname] = filtered_resource[fieldname][0]
                    resource = filtered_resource

                resources_by_type[resource_type].append(resource)

        dfs = {}

        for resource_type, resources in resources_by_type.items():
            dfs[resource_type] = pd.json_normalize(list(map(lambda r: flatten(r), resources)))

        return dfs

    def get_example_resource(self, resource_type: str, resource_id: Optional[str] = None):
        if not self.resources_by_type:
            print("You need to run get_dataframes() first")
            return None

        if resource_type not in self.resources_by_type:
            print(f"{resource_type} not available. Try one of these: {', '.join(self.resources_by_type.keys())}")
            return None

        if resource_id is None:
            return self.resources_by_type[resource_type][0]

        resource = [r for r in self.resources_by_type[resource_type] if r['id'] == resource_id]

        if len(resource) > 0:
            return resource[0]

        print(f"No {resource_type} with id={resource_id} was found.")
        return None

    def reprocess_dataframes(self, fhir_paths):
        return BulkDataFetcher._reprocess_dataframes(self.resources_by_type, fhir_paths)

    @classmethod
    def _reprocess_dataframes(cls, obj_resources_by_type, user_fhir_paths):
        parsed_resources_by_type = defaultdict(list)

        for this_resource_type in obj_resources_by_type.keys():
            # Compile into a local list so the caller's dict is not modified;
            # this keeps repeated calls with the same fhir_paths dict safe
            compiled_paths = None
            if this_resource_type in user_fhir_paths:
                compiled_paths = [(f[0], fhirpathpy.compile(f[1])) for f in user_fhir_paths[this_resource_type]]
            for resource in obj_resources_by_type[this_resource_type]:
                if compiled_paths is not None:
                    filtered_resource = {}
                    for f in compiled_paths:
                        fieldname = f[0]
                        func = f[1]
                        filtered_resource[fieldname] = func(resource)

                        if isinstance(filtered_resource[fieldname], list) and len(filtered_resource[fieldname]) == 1:
                            filtered_resource[fieldname] = filtered_resource[fieldname][0]
                    parsed_resources_by_type[this_resource_type].append(filtered_resource)
                else:
                    parsed_resources_by_type[this_resource_type].append(resource)

        dfs = {}

        for t, res in parsed_resources_by_type.items():
            dfs[t] = pd.json_normalize(list(map(lambda r: flatten(r), res)))

        return dfs
# And then to invoke it:

# create a BulkDataFetcher with our credentials
fetcher = BulkDataFetcher(
    base_url=server_url, client_id=client_id, private_key=private_key, key_id=key_id, session=session
)

# add a resource type of interest, with some FHIRPath field mappings
fetcher.add_resource_type('Patient', [
        ("id", "identifier[0].value"),
        ("gender", "gender"),
        ("date_of_birth", "birthDate"),
        ("marital_status", "maritalStatus.coding.first().code")
])

# add another resource type, with no FHIRPath mappings (load the entire resource)
fetcher.add_resource_type('Condition')

dfs = fetcher.get_dataframes()

dfs['Patient']
Fetching from https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir/Patient/$export?_type=Patient,Condition
     id                                    gender  date_of_birth  marital_status
0    6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2  male    1965-01-13     M
1    58c297c4-d684-4677-8024-01131d93835e  female  1971-04-05     M
2    538a9a4e-8437-47d3-8c01-1a17dca8f0be  male    1923-03-24     M
3    c6c60742-8694-46e4-bb42-b00bf6d8b536  female  1965-10-27     M
4    fbfec681-d357-4b28-b1d2-5db6434c7846  female  1942-07-04     M
...  ...                                   ...     ...            ...
95   5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7  male    1995-10-19     S
96   c1981741-f90e-4077-9156-429a3c4c5ded  male    1956-05-06     M
97   f98b23bf-4443-46d0-9eaf-563e767cf948  male    1966-02-07     M
98   c536dee9-9ef6-4807-ae20-9f1045c9c7d6  male    1990-11-18     S
99   a845ead4-d9de-42eb-b4b5-eb21a8963578  female  1968-04-20     S

100 rows × 4 columns

dfs['Condition']
resourceType id clinicalStatus_coding_0_system clinicalStatus_coding_0_code verificationStatus_coding_0_system verificationStatus_coding_0_code code_coding_0_system code_coding_0_code code_coding_0_display code_text subject_reference encounter_reference onsetDateTime recordedDate abatementDateTime
0 Condition a5a38601-b6fe-46b4-a67e-cde9d5957dde http://terminology.hl7.org/CodeSystem/conditio... active http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 40055000 Chronic sinusitis (disorder) Chronic sinusitis (disorder) Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 Encounter/17b801ac-58e3-4f6b-8b48-8e33f3a36086 1985-06-18T17:30:49-04:00 1985-06-18T17:30:49-04:00 NaN
1 Condition 8f818ad4-c292-47e8-8d99-c4c54174b671 http://terminology.hl7.org/CodeSystem/conditio... active http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 162864005 Body mass index 30+ - obesity (finding) Body mass index 30+ - obesity (finding) Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 Encounter/0953dd44-90bb-4805-badd-169a761a6ab3 2005-01-19T16:30:49-05:00 2005-01-19T16:30:49-05:00 NaN
2 Condition 65d9d5f2-a772-4586-932f-df1f2ce1a863 http://terminology.hl7.org/CodeSystem/conditio... active http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 15777000 Prediabetes Prediabetes Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 Encounter/d4e1370a-a679-4570-a3dc-e4f7ac847512 2013-02-06T16:30:49-05:00 2013-02-06T16:30:49-05:00 NaN
3 Condition 77ac8342-6950-4302-a303-efba12e06785 http://terminology.hl7.org/CodeSystem/conditio... resolved http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 68496003 Polyp of colon Polyp of colon Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 Encounter/58ad433b-3707-4d40-9b63-2a803b4913bd 2015-01-14T16:30:49-05:00 2015-01-14T16:30:49-05:00 2017-05-03T17:30:49-04:00
4 Condition 6514ab0c-bc64-4e1b-aa61-b97d27d72bc7 http://terminology.hl7.org/CodeSystem/conditio... active http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 271737000 Anemia (disorder) Anemia (disorder) Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 Encounter/58ad433b-3707-4d40-9b63-2a803b4913bd 2015-01-14T16:30:49-05:00 2015-01-14T16:30:49-05:00 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
634 Condition ab051f6c-4298-407b-9315-2322ce913539 http://terminology.hl7.org/CodeSystem/conditio... active http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 162864005 Body mass index 30+ - obesity (finding) Body mass index 30+ - obesity (finding) Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 Encounter/c5ed8aed-2b7e-4630-bd1d-ac5090967edc 2014-11-22T15:43:42-05:00 2014-11-22T15:43:42-05:00 NaN
635 Condition 76c1f07a-f8f2-4705-aa80-5f7a25d7c651 http://terminology.hl7.org/CodeSystem/conditio... resolved http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 39848009 Whiplash injury to neck Whiplash injury to neck Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 Encounter/9c8b41dd-d6fd-4691-ae46-01b47992dd8d 2015-07-13T16:43:42-04:00 2015-07-13T16:43:42-04:00 2015-08-10T16:43:42-04:00
636 Condition b9a078eb-bb83-49ed-b4ed-633d1445356d http://terminology.hl7.org/CodeSystem/conditio... resolved http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 70704007 Sprain of wrist Sprain of wrist Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 Encounter/f044f05a-8433-4952-926d-dd8e2b4ee44e 2018-07-25T16:43:42-04:00 2018-07-25T16:43:42-04:00 2018-08-15T16:43:42-04:00
637 Condition 0fe427ce-7ea1-4409-8de1-3879f9dc56bb http://terminology.hl7.org/CodeSystem/conditio... resolved http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 444814009 Viral sinusitis (disorder) Viral sinusitis (disorder) Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 Encounter/9100e9aa-1206-403b-b2bf-b75ac23991bd 2018-09-26T16:43:42-04:00 2018-09-26T16:43:42-04:00 2018-10-17T16:43:42-04:00
638 Condition 88f3f41a-68ec-46dd-8d44-9178d3872220 http://terminology.hl7.org/CodeSystem/conditio... resolved http://terminology.hl7.org/CodeSystem/conditio... confirmed http://snomed.info/sct 72892002 Normal pregnancy Normal pregnancy Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 Encounter/2d975caf-e6bf-43c2-8778-ea293df1f255 2018-12-22T15:43:42-05:00 2018-12-22T15:43:42-05:00 2019-07-27T16:43:42-04:00

639 rows × 15 columns
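
Each Condition row points back to its patient through subject_reference, which holds a relative reference of the form Patient/<id>. As a rough sketch of how the two tables relate, the snippet below strips the resource type from the reference and counts conditions per patient using only the standard library. The helper name patient_id_from_reference is hypothetical, and the three rows are copied from the output above rather than fetched from the server.

```python
from collections import Counter


def patient_id_from_reference(reference):
    """Return the logical id from a reference such as 'Patient/123'."""
    # rpartition splits on the last '/', so absolute URLs also work
    _, _, logical_id = reference.rpartition("/")
    return logical_id


# A few rows taken from the Condition output above
condition_rows = [
    {"code_text": "Chronic sinusitis (disorder)",
     "subject_reference": "Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2"},
    {"code_text": "Prediabetes",
     "subject_reference": "Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2"},
    {"code_text": "Sprain of wrist",
     "subject_reference": "Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578"},
]

conditions_per_patient = Counter(
    patient_id_from_reference(row["subject_reference"]) for row in condition_rows
)

for patient_id, count in conditions_per_patient.items():
    print(patient_id, count)
```

With the full DataFrames, the same extracted id could serve as the key for joining dfs['Condition'] to dfs['Patient'], assuming the Patient id column holds the same identifier, as it does in this sample data.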

Group export

The ONC Health IT Certification criterion §170.315(g)(10), "Standardized API for patient and population services", has required certified EHRs to support group export since December 2022.

This is therefore the FHIR Bulk Data endpoint you are likely to find in EHRs.

To use this endpoint, you will need the ID of the group of patients you want to export. In a production setting, this would typically be provided by the administrators of the EHR.
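
For orientation, a group-level export is kicked off at [base]/Group/[id]/$export rather than at [base]/Patient/$export. The sketch below only builds that URL and makes no network calls. The function name group_export_url, the base URL, and the group ID are placeholders chosen for illustration.

```python
from urllib.parse import urlencode


def group_export_url(base_url, group_id, resource_types=None):
    """Build a group-level Bulk Data kick-off URL (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/Group/{group_id}/$export"
    if resource_types:
        # Keep commas unescaped so the _type list stays readable
        query = urlencode({"_type": ",".join(resource_types)}, safe=",")
        url = f"{url}?{query}"
    return url


# Placeholder values; a real group ID would come from the EHR administrators
example_url = group_export_url(
    "https://example.org/fhir",
    "example-group-id",
    ["Patient", "Condition"],
)
print(example_url)
```

Since BulkDataFetcher builds its request as {base_url}/{endpoint}/$export, passing endpoint='Group/<group id>' to its constructor should produce the same kind of URL without any other changes to the class.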

For the bulk-data.smarthealthit.org testing server, we can ask it for a list of groups via the FHIR API:

r = session.get(f'{server_url}/Group', headers={'Authorization': f'Bearer {get_token()}', 'Accept': 'application/fhir+json'})
r.json()
{'resourceType': 'Bundle',
 'id': 'e21c6a557591d81d35d3f3bae22e6b490c71ad36b1b75b163393ea102e47eae8',
 'meta': {'lastUpdated': '2023-06-13 01:26:34'},
 'type': 'searchset',
 'total': 8,
 'link': [{'relation': 'self',
   'url': 'https://bulk-data.smarthealthit.org/fhir/Group'}],
 'entry': [{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/1f76e2b7-a222-4765-9097-a71b86e90d07',
   'resource': {'resourceType': 'Group',
    'id': '1f76e2b7-a222-4765-9097-a71b86e90d07',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': '1f76e2b7-a222-4765-9097-a71b86e90d07'}],
    'quantity': 25,
    'name': 'Health New England',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Health New England</div>'},
    'type': 'person',
    'actual': True}},
  {'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b',
   'resource': {'resourceType': 'Group',
    'id': '84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': '84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b'}],
    'quantity': 3,
    'name': 'Minuteman Health',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Minuteman Health</div>'},
    'type': 'person',
    'actual': True}},
  {'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/a1f090cb-ffd1-436d-a815-fb047d9a1903',
   'resource': {'resourceType': 'Group',
    'id': 'a1f090cb-ffd1-436d-a815-fb047d9a1903',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': 'a1f090cb-ffd1-436d-a815-fb047d9a1903'}],
    'quantity': 10,
    'name': 'BMC HealthNet',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">BMC HealthNet</div>'},
    'type': 'person',
    'actual': True}},
  {'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/a95907b4-0c41-462a-bfcf-cb822075eb39',
   'resource': {'resourceType': 'Group',
    'id': 'a95907b4-0c41-462a-bfcf-cb822075eb39',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': 'a95907b4-0c41-462a-bfcf-cb822075eb39'}],
    'quantity': 3,
    'name': 'Harvard Pilgrim Health Care',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Harvard Pilgrim Health Care</div>'},
    'type': 'person',
    'actual': True}},
  {'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7',
   'resource': {'resourceType': 'Group',
    'id': 'ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': 'ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7'}],
    'quantity': 22,
    'name': 'Tufts Health Plan',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Tufts Health Plan</div>'},
    'type': 'person',
    'actual': True}},
  {'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/b058c5e7-209c-4162-9289-0ff703347c0f',
   'resource': {'resourceType': 'Group',
    'id': 'b058c5e7-209c-4162-9289-0ff703347c0f',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': 'b058c5e7-209c-4162-9289-0ff703347c0f'}],
    'quantity': 3,
    'name': 'Fallon Health',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Fallon Health</div>'},
    'type': 'person',
    'actual': True}},
  {'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/cf04e363-eef4-4653-9650-846bca43f357',
   'resource': {'resourceType': 'Group',
    'id': 'cf04e363-eef4-4653-9650-846bca43f357',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': 'cf04e363-eef4-4653-9650-846bca43f357'}],
    'quantity': 7,
    'name': 'Neighborhood Health Plan',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Neighborhood Health Plan</div>'},
    'type': 'person',
    'actual': True}},
  {'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/ff7dc35f-79e9-47a0-af22-475cf301a085',
   'resource': {'resourceType': 'Group',
    'id': 'ff7dc35f-79e9-47a0-af22-475cf301a085',
    'identifier': [{'system': 'https://bulk-data/db-id',
      'value': 'ff7dc35f-79e9-47a0-af22-475cf301a085'}],
    'quantity': 27,
    'name': 'Blue Cross Blue Shield',
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Blue Cross Blue Shield</div>'},
    'type': 'person',
    'actual': True}}]}
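
The Bundle above carries only a `self` link, so all 8 groups arrived in a single page. FHIR servers may split search results across several pages and point to the following page with a link whose `relation` is `next`. The sketch below shows one way to gather entries across pages. The names `collect_entries` and `fetch_page` are hypothetical; in this notebook `fetch_page` could be something like `lambda url: session.get(url, headers=...).json()`, but here it is replaced by a dictionary lookup so the example runs offline.

```python
def collect_entries(first_bundle, fetch_page):
    """Gather Bundle.entry items across pages by following 'next' links.

    fetch_page is a caller-supplied function (hypothetical) that takes a URL
    and returns the parsed JSON Bundle found at that URL.
    """
    entries = []
    bundle = first_bundle
    while bundle is not None:
        entries.extend(bundle.get('entry', []))
        # Look for a link with relation 'next'; None means this was the last page
        next_url = next(
            (link['url'] for link in bundle.get('link', [])
             if link.get('relation') == 'next'),
            None,
        )
        bundle = fetch_page(next_url) if next_url else None
    return entries


# Offline demonstration: two fake pages stand in for server responses
page_2 = {'resourceType': 'Bundle', 'entry': [{'resource': {'id': 'b'}}]}
page_1 = {
    'resourceType': 'Bundle',
    'link': [{'relation': 'next', 'url': 'page-2'}],
    'entry': [{'resource': {'id': 'a'}}],
}
all_entries = collect_entries(page_1, {'page-2': page_2}.get)
print([entry['resource']['id'] for entry in all_entries])
```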

Let’s pull the entries of the Group search response (`r.json()`) into a Pandas DataFrame to make them easier to read:

groups = pd.json_normalize(r.json()['entry'])[['resource.id', 'resource.name', 'resource.quantity']]
groups
   resource.id                           resource.name                resource.quantity
0  1f76e2b7-a222-4765-9097-a71b86e90d07  Health New England           25
1  84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b  Minuteman Health             3
2  a1f090cb-ffd1-436d-a815-fb047d9a1903  BMC HealthNet                10
3  a95907b4-0c41-462a-bfcf-cb822075eb39  Harvard Pilgrim Health Care  3
4  ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7  Tufts Health Plan            22
5  b058c5e7-209c-4162-9289-0ff703347c0f  Fallon Health                3
6  cf04e363-eef4-4653-9650-846bca43f357  Neighborhood Health Plan     7
7  ff7dc35f-79e9-47a0-af22-475cf301a085  Blue Cross Blue Shield       27
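
Selecting a group by its row position works for a demo, but the row order is not guaranteed to stay the same. As a small sketch, a group can instead be looked up by name. The helper `group_id_by_name` and the two-row `groups_demo` DataFrame are hypothetical stand-ins that reuse the column names and values shown above.

```python
import pandas as pd

# A two-row stand-in for the `groups` DataFrame shown above
groups_demo = pd.DataFrame({
    'resource.id': [
        '1f76e2b7-a222-4765-9097-a71b86e90d07',
        'ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7',
    ],
    'resource.name': ['Health New England', 'Tufts Health Plan'],
    'resource.quantity': [25, 22],
})


def group_id_by_name(groups_df, name):
    """Return the id of the single group whose name matches exactly."""
    matches = groups_df.loc[groups_df['resource.name'] == name, 'resource.id']
    if len(matches) != 1:
        raise ValueError(
            f'Expected exactly one group named {name!r}, found {len(matches)}'
        )
    return matches.iloc[0]


print(group_id_by_name(groups_demo, 'Tufts Health Plan'))
```

Raising an error when zero or several groups match avoids silently exporting the wrong population.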

Now we can request the patients and associated data for a specific group:

group_id = groups.loc[0, 'resource.id']

fetcher = BulkDataFetcher(
    base_url=server_url, client_id=client_id, private_key=private_key, key_id=key_id, session=session,

    # Tell the BulkDataFetcher to request data from the specified group rather than all patients
    endpoint=f'Group/{group_id}'
)

# add a resource type of interest, with some FHIRPath field mappings
fetcher.add_resource_type('Patient', [
    ("id", "identifier[0].value"),
    ("gender", "gender"),
    ("date_of_birth", "birthDate"),
    ("marital_status", "maritalStatus.coding.first().code")
])

# add another resource type, with no FHIRPath mappings (load the entire resource)
fetcher.add_resource_type('Condition')

dfs = fetcher.get_dataframes()

dfs['Patient']
Fetching from https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir/Group/1f76e2b7-a222-4765-9097-a71b86e90d07/$export?_type=Patient,Condition
id gender date_of_birth marital_status
0 fbfec681-d357-4b28-b1d2-5db6434c7846 female 1942-07-04 M
1 0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 female 1967-10-24 S
2 62e03ae7-079c-4eda-9b5a-29440d3a015a male 2013-05-25 S
3 7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 female 1959-07-23 M
4 84d4bafe-1891-4e6c-b8aa-38d2eafd8193 female 2000-09-12 S
5 4bc3ef6a-65c5-470d-8911-f26194b2a0e3 male 2004-08-20 S
6 3ca9f003-e6dd-4110-b4c2-12c056b880f4 female 1991-12-27 M
7 daf4e787-0ea5-45ff-a9a1-c68308e9f6a3 male 1995-06-21 S
8 1ad52ff0-428a-4048-aff8-f7196a2da649 female 1996-09-11 S
9 644d85af-aaf9-4068-ad23-1e55aedd5205 male 2003-09-12 S
10 687eb477-32ae-44ac-a0ef-2912623a14ff female 1960-10-18 M
11 70ac5078-22ef-471d-bed7-cb694775b4ba female 2001-08-25 S
12 7646ecba-4812-452e-88e7-6235f77dabb2 female 1997-11-20 S
13 69071541-e760-4d0c-bf8a-961a061cb0d5 male 1952-02-23 M
14 221fe1ec-a258-4fc4-8cc8-b7c960a8a0a9 female 2008-01-04 S
15 32e46528-35d1-4ed7-9aaa-09ae00f9681c male 2007-02-12 S
16 b20c7c80-49ac-4926-8b03-e9c69b40e1f5 male 1951-04-26 S
17 733abdda-2bfa-485f-9c83-ed9b206889b2 male 1972-02-27 M
18 55940999-fd98-4922-b9bc-a6bf0c1855ed male 1931-04-18 M
19 7ba8d35f-3f70-48b9-b711-104374136ac7 male 2002-05-10 S
20 ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 female 1993-02-09 S
21 8d3e1155-278a-4824-a7e0-fddb24c7c179 male 1991-10-27 S
22 8c9fea57-6ded-47b0-88c9-75518430b572 female 1962-08-01 M
23 d2524ab6-4db9-440d-b588-6dcfcab89270 male 1979-11-11 S
24 f98b23bf-4443-46d0-9eaf-563e767cf948 male 1966-02-07 M
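
The "Fetching from" line in the output above shows the export kick-off URL: the server base URL, then `Group/<id>`, then `$export`, with the requested resource types joined by commas in the `_type` parameter. The sketch below assembles a URL of the same shape. `build_export_url` is a hypothetical helper written for illustration, not the code inside `BulkDataFetcher`, and `https://example.org/fhir` is a placeholder base URL.

```python
def build_export_url(base_url, endpoint=None, resource_types=()):
    """Assemble a Bulk Data $export kick-off URL (illustrative only).

    endpoint=None      -> system-level export:  [base]/$export
    endpoint='Patient' -> all-patients export:  [base]/Patient/$export
    endpoint='Group/X' -> group export:         [base]/Group/X/$export
    """
    parts = [base_url.rstrip('/')]
    if endpoint:
        parts.append(endpoint.strip('/'))
    parts.append('$export')
    url = '/'.join(parts)
    if resource_types:
        # _type takes a comma-separated list of resource types
        url += '?_type=' + ','.join(resource_types)
    return url


print(build_export_url(
    'https://example.org/fhir',
    endpoint='Group/1f76e2b7-a222-4765-9097-a71b86e90d07',
    resource_types=['Patient', 'Condition'],
))
```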

A number of different FHIR resource types are available from the test server.

Try modifying the request above to pull in resource types other than Patient and Condition. The FHIR documentation for each resource type can help with constructing FHIRPaths.

# Try adding an additional resource type
fetcher.add_resource_type('Observation')

dfs = fetcher.get_dataframes()

dfs['Observation']
Fetching from https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir/Group/1f76e2b7-a222-4765-9097-a71b86e90d07/$export?_type=Patient,Condition,Observation
resourceType id status category_0_coding_0_system category_0_coding_0_code category_0_coding_0_display code_coding_0_system code_coding_0_code code_coding_0_display code_text ... component_1_valueQuantity_system component_1_valueQuantity_code valueCodeableConcept_coding_0_system valueCodeableConcept_coding_0_code valueCodeableConcept_coding_0_display valueCodeableConcept_text code_coding_1_system code_coding_1_code code_coding_1_display valueString
0 Observation 7a10d1ef-97f2-468b-b3ab-78dc99c65cf6 final http://terminology.hl7.org/CodeSystem/observat... vital-signs vital-signs http://loinc.org 8302-2 Body Height Body Height ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Observation 168c33d9-b4c8-4783-b97a-82fee9625f35 final http://terminology.hl7.org/CodeSystem/observat... vital-signs vital-signs http://loinc.org 72514-3 Pain severity - 0-10 verbal numeric rating [Sc... Pain severity - 0-10 verbal numeric rating [Sc... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Observation 7a6ee07e-72d0-4a74-896f-13c1b2e9b254 final http://terminology.hl7.org/CodeSystem/observat... vital-signs vital-signs http://loinc.org 29463-7 Body Weight Body Weight ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Observation fe4863c9-ac4e-439d-9d8f-834f1aebd3d8 final http://terminology.hl7.org/CodeSystem/observat... vital-signs vital-signs http://loinc.org 39156-5 Body Mass Index Body Mass Index ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Observation a7e89580-6be8-4b0a-a528-dfb91f2508bf final http://terminology.hl7.org/CodeSystem/observat... vital-signs vital-signs http://loinc.org 85354-9 Blood Pressure Blood Pressure ... http://unitsofmeasure.org mm[Hg] NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4024 Observation 6f4249dd-abeb-43dd-b388-407fff8ea4f8 final http://terminology.hl7.org/CodeSystem/observat... laboratory laboratory http://loinc.org 1920-8 Aspartate aminotransferase [Enzymatic activity... Aspartate aminotransferase [Enzymatic activity... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4025 Observation a04bc225-ad1c-4235-b792-1abde0cb5055 final http://terminology.hl7.org/CodeSystem/observat... laboratory laboratory http://loinc.org 2093-3 Total Cholesterol Total Cholesterol ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4026 Observation 1b26f91e-68d2-4630-a1e0-d2aba21ed4f6 final http://terminology.hl7.org/CodeSystem/observat... laboratory laboratory http://loinc.org 2571-8 Triglycerides Triglycerides ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4027 Observation e43ed8c1-0535-407a-abd0-41dd09be6d4a final http://terminology.hl7.org/CodeSystem/observat... laboratory laboratory http://loinc.org 18262-6 Low Density Lipoprotein Cholesterol Low Density Lipoprotein Cholesterol ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4028 Observation b8c6b918-95b6-43f9-9524-78371a931384 final http://terminology.hl7.org/CodeSystem/observat... laboratory laboratory http://loinc.org 2085-9 High Density Lipoprotein Cholesterol High Density Lipoprotein Cholesterol ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

4029 rows × 42 columns
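
When no FHIRPath mappings are given, the column names above (for example `category_0_coding_0_system`) look like the nested JSON keys and list positions joined with underscores. The sketch below reproduces that naming with a small recursive function. It is an illustration of the idea under that assumption, not necessarily how `BulkDataFetcher` flattens resources, and `flatten` is a hypothetical name.

```python
def flatten(value, prefix=''):
    """Flatten nested dicts/lists into one dict with underscore-joined keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f'{prefix}_{key}' if prefix else key))
    elif isinstance(value, list):
        # List positions become part of the column name: category_0, category_1, ...
        for position, child in enumerate(value):
            flat.update(flatten(child, f'{prefix}_{position}' if prefix else str(position)))
    else:
        flat[prefix] = value
    return flat


observation_fragment = {
    'resourceType': 'Observation',
    'category': [{'coding': [{'system': 'http://terminology.hl7.org/CodeSystem/observation-category',
                              'code': 'vital-signs'}]}],
    'code': {'text': 'Body Height'},
}
flat_row = flatten(observation_fragment)
for column, cell in flat_row.items():
    print(column, '=', cell)
```

Because each resource only produces columns for the elements it actually contains, stacking many flattened resources into one DataFrame leaves `NaN` wherever a resource lacks a given element, which is why the wide table above is mostly `NaN`.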

# Try filtering to just Observations of smoking status

# `fetcher.reprocess_dataframes()` builds DataFrames the same way `get_dataframes()` does,
# but applies the FHIRPaths passed here to the already-downloaded data instead of re-downloading everything
dfs = fetcher.reprocess_dataframes({
    'Patient': [
        ("id", "identifier[0].value"),
        ("gender", "gender"),
        ("date_of_birth", "birthDate"),
        ("marital_status", "maritalStatus.coding.first().code")
    ],
    'Observation': [
        ("id", "id"),
        ("patient", "subject.reference"),
        ("type", "code.coding.first().code"),
        ("type_display", "code.coding.first().display"),
        ("code", "valueCodeableConcept.coding.first().code"),
        ("code_display", "valueCodeableConcept.coding.first().display"),
    ]
})

with pd.option_context('display.max_rows', 100, 'display.min_rows', 100):
    display(dfs['Observation'][dfs['Observation']['type'] == '72166-2'])
id patient type type_display code code_display
15 51202a40-c62e-4727-bfef-b643869d0951 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
32 04cc91b0-4fd0-4244-9738-32c563116308 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
53 87cc17d0-3914-4e95-9344-b76e394dfae6 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
81 b10d3da1-249e-406d-90cb-9ac19314f823 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
98 a0e5b7d8-30a2-4952-ae7a-c0a8d49bc9f3 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
119 f843b2e7-6caf-4284-aa90-51c2d55c3c51 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
136 d070c5c5-2a0a-4fe9-ba57-0b7fde240f3b Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
153 64beef69-dd3c-46a1-b970-8a5dc95be927 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
185 b120c744-e1ec-49b9-b8b0-2a54455f410c Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
202 984b970a-b027-4778-a820-7f6c56761125 Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
213 17378ae3-f3b7-418b-b8e1-7af2856a5eac Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
221 ec100afb-6dad-47e9-8a58-2ea28c1a5fb6 Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
233 bdf0d1b0-d405-4dbe-acc3-855d6c38d0d2 Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
252 8b4c3ebb-0f3b-4a9f-aca6-894df17a652d Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
260 db972319-bfa5-4d88-a562-3e71e94bafc1 Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
273 88515b19-a75f-486e-af88-96222bd4ca18 Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
281 2da712dc-da92-4f89-a36e-3aa8fc714d73 Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
289 219df89c-b801-4d3d-bfc0-7b270ad22542 Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
312 b99b0288-8c94-4126-a7e0-d1c8717e51a3 Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
332 18e82af6-9bb3-4657-8ee6-b02bdb288476 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
341 45e97461-0051-4bf5-82df-a8c4a429f099 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
350 e84acd03-8ae3-4b82-95cf-25799c7c94b8 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
359 f3d3151e-9a5b-4227-9b28-89f9cb61ca90 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
368 c3364946-7957-4a88-8912-7dc8a9a1cfcd Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
377 cb400c7f-485e-4674-a3d8-4700543235ae Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
398 450e1f51-8765-431c-a795-40206ac5c9fa Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
425 4ab3823f-27c4-466c-ae57-16e914454522 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
434 23c9d6e1-71bf-4766-96df-65a3b0d25d05 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
445 187e50b4-ac94-48a1-b052-02b5549be600 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
456 a0e8bef3-8317-4930-858d-8dff3830ff13 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
468 30c32d11-4e9a-4feb-b352-2196792bb2cf Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
479 0145b24c-12c5-4ef2-a437-3efa9006f498 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
488 2b444657-c019-4434-a9d4-3e0f2f7cdd2f Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
509 73b8dc8c-91f8-41dd-a6ba-981c6ed12316 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
518 c83226df-6f61-4849-bcce-1eec0f56be74 Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
534 683a4889-8ee5-4f5b-befb-35ec22b13107 Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
562 2735a667-d2cd-4491-9a32-0758ce0f48e9 Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
583 f86585e1-ec12-4125-a254-2533ea3b9bba Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
600 495b7f39-53b4-4d98-884d-fe14e464a13a Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
637 12ea697f-c2b3-4539-80c7-2ed09c84da2b Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
658 3a4f53d6-3c82-4efb-b891-d6a866949e58 Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
675 fb5ed249-16e5-4379-85e7-6b8300ee9548 Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
723 ee6a13b1-b6d1-49ab-8eff-128dc7f40daf Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
760 ec72955c-b5ba-40b8-a67e-13b137d95cb7 Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
822 71d62e29-789d-4417-95fa-b71292d2e685 Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
859 6d191cde-5731-41e5-9197-5827fad6608c Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
871 566c6a71-d7ee-49fa-88a4-2c5e6e1d1476 Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
880 55ffbd45-6042-4240-8f32-4cd10c1c6d48 Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
900 abd678ea-30d0-48b1-9f8e-4b96c952e55e Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
910 c00b5394-c828-46d3-b81f-498b768c5c81 Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
... ... ... ... ... ... ...
3094 1448ef79-0efb-49fd-a32e-8e4501d63654 Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3103 70b64209-a5f0-4d9b-9b92-ca8d4e37ad20 Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3128 518b9a78-3ed3-401e-8792-b1f5653da244 Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3137 4a491920-5523-4e12-a178-ad7692aae555 Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3146 eec65c1b-e59a-4e65-a311-2c0cd097b078 Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3166 5559cf2a-1386-4a48-b889-29798870a3f4 Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3190 62abe2b1-4b46-41e4-9784-05509fe2439b Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3199 9cc9a329-78d7-4b4c-9734-624576028f5f Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3208 aab937d5-dc7c-44f2-9328-c58039baeedb Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3228 c8321dcc-cba2-454e-a862-b692bf75455e Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3236 df310c44-a40d-4596-a16d-22ae9454e2d7 Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3245 ebb48186-7b0c-4ea2-a4ee-1bc77bd40926 Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3253 837fd4cb-10c3-4d89-b2c1-23be8a8572b6 Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3262 e9eabd4c-3d45-45e8-b21e-0a479cb0530f Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3282 6a594967-b815-4650-82f6-eb404bfc7aaf Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3299 fe884b8b-04d9-4c3b-a710-f77d89d87bef Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3317 22d56cc5-7a0e-49cf-8483-97de08415e1c Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3334 6bcef563-1082-4398-85d1-86a8f6603141 Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3351 6d9a0171-6e80-464f-8022-5b740af8a932 Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3368 e5447fff-dee3-4e8a-9756-e11304fa7f82 Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3396 df8ba6a0-a4e8-4983-b1fb-85a6e68d5d39 Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3424 1c7ca98b-18a6-401a-b5f9-218d21455658 Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3441 71029c58-cca3-45d6-8f2e-cfa1b225883c Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3458 756e618d-9983-4da5-9023-6103c1d4f3c0 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3479 f70de159-6f5c-40fb-a3fd-90c722a51aa6 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3496 4ea513ae-7aaf-4e61-bfa1-4e82333e6788 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3524 21baef36-0caf-452e-a049-d624974a0739 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3545 74b3fc01-da0c-4043-b630-38bb679fbd42 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3562 2f7f598b-fe86-44c4-88c1-f4951b5f9560 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3580 938dcefa-4ffb-4757-b13a-7cd715bcd599 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3598 11f8c8fc-ecee-4ec3-9201-8c192f3f8449 Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3619 5d0dabae-f225-4ac7-a9e5-ae1103720cde Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3647 437d29cc-02b8-48a3-bc00-4e895bcdddae Patient/8c9fea57-6ded-47b0-88c9-75518430b572 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3667 0d4d8892-536d-4e3d-adea-690b9f6582ed Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3675 a563d054-0081-4432-8977-a5cde30e13de Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3687 0f7039d9-1ef2-4aac-8617-96e33f827a10 Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3695 6e0600dc-f43a-48f8-87f6-45a0983c6d97 Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3703 be1d1942-ae59-42db-914a-04cbc61cdcf3 Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3726 82de3db7-2921-453d-bbc5-8c56d70d630f Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3734 07f06643-6409-404c-89f6-fe1eb0e8c8cc Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3742 5ec5a8fd-43e8-4835-bef4-c0a429745a59 Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3754 2151a281-9eac-4f98-9a1d-3b30b169e2f2 Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3762 889b8074-f24d-4b7d-aba3-474fed898bda Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 72166-2 Tobacco smoking status NHIS 266919005 Never smoker
3801 f596e621-6fe7-4acf-a0a7-2a68a2a77b35 Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3833 3e0ed4d8-7e00-42e0-8a7c-d9d6dd57e05d Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3881 4a32ee10-0c5c-4f1c-820a-9155ee04d060 Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3909 8e5f48c0-651c-4f02-9a75-8268fa1c0b1b Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3952 3f033821-1193-4ecf-95eb-9cfc2fa21cc0 Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
3980 5368a223-6972-410f-a253-6be399479303 Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 72166-2 Tobacco smoking status NHIS 8517006 Former smoker
4008 6d9ff554-d1cf-48bb-9487-7d132c3575af Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 72166-2 Tobacco smoking status NHIS 8517006 Former smoker

258 rows × 6 columns
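
The filtered table repeats the same status many times per patient. A possible next step is to reduce it to the distinct statuses per patient and join them onto the Patient DataFrame. The sketch below uses tiny made-up DataFrames with the same column names as above. It assumes, as appears to be the case in the test server output, that the Patient `id` column matches the id inside `subject.reference`; that may not hold on other servers.

```python
import pandas as pd

# Made-up stand-ins for dfs['Patient'] and dfs['Observation']
patients = pd.DataFrame({'id': ['p1', 'p2'], 'gender': ['female', 'male']})
observations = pd.DataFrame({
    'patient': ['Patient/p1', 'Patient/p1', 'Patient/p2'],
    'type': ['72166-2', '72166-2', '72166-2'],
    'code_display': ['Former smoker', 'Former smoker', 'Never smoker'],
})

# Keep only smoking-status observations (LOINC 72166-2)
smoking = observations[observations['type'] == '72166-2'].copy()

# 'Patient/p1' -> 'p1' so the values line up with the Patient id column
smoking['patient_id'] = smoking['patient'].str.split('/').str[-1]

# One row per distinct (patient, status) pair
smoking = smoking.drop_duplicates(subset=['patient_id', 'code_display'])

merged = patients.merge(
    smoking[['patient_id', 'code_display']],
    left_on='id', right_on='patient_id', how='left',
).drop(columns='patient_id')
print(merged)
```

A left join keeps patients who have no smoking-status observation at all; their `code_display` is left empty.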

Creating FHIRPaths

An online tool like https://hl7.github.io/fhirpath.js/ can help when writing the FHIRPaths that select which elements of each FHIR resource end up in a DataFrame. (Note that you should not use tools like this with identifiable patient data.)

We have a convenience method to get an example resource in JSON format from the fetcher object:

print(json.dumps(fetcher.get_example_resource('Observation'), indent=4))
{
    "resourceType": "Observation",
    "id": "7a10d1ef-97f2-468b-b3ab-78dc99c65cf6",
    "status": "final",
    "category": [
        {
            "coding": [
                {
                    "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                    "code": "vital-signs",
                    "display": "vital-signs"
                }
            ]
        }
    ],
    "code": {
        "coding": [
            {
                "system": "http://loinc.org",
                "code": "8302-2",
                "display": "Body Height"
            }
        ],
        "text": "Body Height"
    },
    "subject": {
        "reference": "Patient/fbfec681-d357-4b28-b1d2-5db6434c7846"
    },
    "encounter": {
        "reference": "Encounter/a65fee02-b183-4ae5-a9e2-5edf89a6f327"
    },
    "effectiveDateTime": "2010-11-20T02:52:44-05:00",
    "issued": "2010-11-20T02:52:44.074-05:00",
    "valueQuantity": {
        "value": 162.4,
        "unit": "cm",
        "system": "http://unitsofmeasure.org",
        "code": "cm"
    }
}

This can be copied and pasted into https://hl7.github.io/fhirpath.js/ to experiment with FHIRPaths. Note that the JavaScript library used on this testing website is not the same as the Python library used in this notebook, so there may be some implementation differences.
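
To build intuition for what the simple FHIRPaths in this notebook do (for example `code.coding.first().code` or `identifier[0].value`), the sketch below walks a resource using only dotted names, `[n]` indexing, and `.first()`. It is a teaching aid and not a FHIRPath implementation: `eval_simple_path` is a hypothetical helper that ignores functions such as `where()` and everything else in the specification. Like FHIRPath, it always returns a collection (a list), which is empty when nothing matches.

```python
import re


def eval_simple_path(resource, path):
    """Evaluate a tiny FHIRPath-like subset: dotted names, [n], and first()."""
    collection = [resource]
    for step in path.split('.'):
        if step == 'first()':
            collection = collection[:1]
            continue
        match = re.fullmatch(r'(\w+)(?:\[(\d+)\])?', step)
        if match is None:
            raise ValueError(f'Unsupported step: {step!r}')
        name, index = match.group(1), match.group(2)
        next_collection = []
        for item in collection:
            if not isinstance(item, dict) or name not in item:
                continue
            value = item[name]
            # Repeating elements are flattened into a single collection
            if isinstance(value, list):
                next_collection.extend(value)
            else:
                next_collection.append(value)
        if index is not None:
            position = int(index)
            next_collection = next_collection[position:position + 1]
        collection = next_collection
    return collection


observation = {
    'resourceType': 'Observation',
    'code': {'coding': [{'system': 'http://loinc.org', 'code': '8302-2'}]},
    'subject': {'reference': 'Patient/fbfec681-d357-4b28-b1d2-5db6434c7846'},
}
print(eval_simple_path(observation, 'code.coding.first().code'))
print(eval_simple_path(observation, 'valueCodeableConcept.coding.first().code'))
```

The second call returns an empty list because this Observation has no `valueCodeableConcept`, which is why such columns are empty for most rows in a DataFrame.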

Testing with Synthea data

Having test data is very helpful when developing code that uses FHIR Bulk Data. The test data from https://bulk-data.smarthealthit.org may not have all the data elements you need for a specific research use case. Synthea can be used for generating customized synthetic data in FHIR format. Below we’ll look at how to load .ndjson from Synthea into this notebook and use reprocess_dataframes() with FHIRPaths to convert into Pandas DataFrames.

First, we’ll create a short class that mimics the functionality of BulkDataFetcher but loads the .ndjson directly from disk rather than from a bulk data export.

class SyntheaDataFetcher:
    def __init__(self, ndjson_file_path):
        self.resources_by_type = {}

        # Count lines first so tqdm can show progress; `with` makes sure the file is closed
        with open(ndjson_file_path, 'r') as file:
            num_lines = sum(1 for _ in file)
        with open(ndjson_file_path, 'r') as file:
            for line in tqdm(file, total=num_lines):
                if not line.strip():
                    continue  # skip blank lines, which are not valid JSON
                json_obj = json.loads(line)
                this_resource_type = json_obj['resourceType']
                self.resources_by_type.setdefault(this_resource_type, []).append(json_obj)

        print("Resources available: ")
        print('\n'.join(['- '+ x for x in self.resources_by_type.keys()]))

    def get_example_resource(self, resource_type: str, resource_id: Optional[str] = None):
        if not self.resources_by_type:
            print("No resources were loaded from the .ndjson file")
            return None

        if resource_type not in self.resources_by_type:
            print(f"{resource_type} not available. Try one of these: {', '.join(self.resources_by_type.keys())}")
            return None

        if resource_id is None:
            return self.resources_by_type[resource_type][0]

        resource = [r for r in self.resources_by_type[resource_type] if r['id'] == resource_id]

        if len(resource) > 0:
            return resource[0]

        print(f"No {resource_type} with id={resource_id} was found.")
        return None

    def reprocess_dataframes(self, user_fhir_paths):
        return BulkDataFetcher._reprocess_dataframes(self.resources_by_type, user_fhir_paths)


# Load in 100 patients of Synthea data.
# The original data come from <https://synthea.mitre.org/downloads> > 1K Sample Synthetic Patient Records, FHIR R4
synthea_fetcher = SyntheaDataFetcher('synthea_100.ndjson')
Resources available: 
- Patient
- Organization
- Practitioner
- Encounter
- Condition
- Device
- Claim
- ExplanationOfBenefit
- CareTeam
- Goal
- CarePlan
- Observation
- Immunization
- DiagnosticReport
- Procedure
- MedicationRequest
- ImagingStudy
- AllergyIntolerance
- MedicationAdministration
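
NDJSON is simply one complete JSON document per line, so a small test file can be written by hand when Synthea output is not available. The sketch below writes three made-up resources to a temporary file and reads them back grouped by `resourceType`, mirroring what the loader above does. All resource contents here are invented for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict

# Made-up minimal resources; real Synthea output has many more fields
resources = [
    {'resourceType': 'Patient', 'id': 'p1'},
    {'resourceType': 'Condition', 'id': 'c1', 'subject': {'reference': 'Patient/p1'}},
    {'resourceType': 'Condition', 'id': 'c2', 'subject': {'reference': 'Patient/p1'}},
]

# Write: one JSON document per line
with tempfile.NamedTemporaryFile('w', suffix='.ndjson', delete=False) as tmp:
    for resource in resources:
        tmp.write(json.dumps(resource) + '\n')
    tmp_path = tmp.name

# Read back, grouping resources by type
by_type = defaultdict(list)
with open(tmp_path) as ndjson_file:
    for line in ndjson_file:
        if line.strip():
            parsed = json.loads(line)
            by_type[parsed['resourceType']].append(parsed)
os.remove(tmp_path)

counts = {resource_type: len(items) for resource_type, items in by_type.items()}
print(counts)
```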

Here is how to apply FHIRPaths to filter the Synthea data:

dfs = synthea_fetcher.reprocess_dataframes({'Patient': [('id', 'id')]})

dfs['Patient']
id
0 5cbc121b-cd71-4428-b8b7-31e53eba8184
1 adccf2c3-9dc4-4067-ba23-98982c4875da
2 31191928-6acb-4d73-931c-e601cc3a13fa
3 67816396-e325-496d-a6ec-c047756b7ce4
4 b426b062-8273-4b93-a907-de3176c0567d
... ...
95 ae4c5b55-c704-4406-b353-285f9166a489
96 edb1ebc5-d629-4c43-acf5-b8d1c38d9bd2
97 2d75e3a4-f0f6-45dd-8b57-75fb2f303c9e
98 ea95f498-7929-4d50-be55-9bf7baee3a8d
99 57ca2c16-7008-41e5-b338-4758b2fc46f0

100 rows × 1 columns

You can also get a sample resource to look at the raw JSON:

print(synthea_fetcher.get_example_resource('Patient'))
{
    'resourceType': 'Patient',
    'id': '5cbc121b-cd71-4428-b8b7-31e53eba8184',
    'text': {
        'status': 'generated',
        'div': '<div xmlns="http://www.w3.org/1999/xhtml">Generated by <a href="https://github.com/synthetichealth/synthea">Synthea</a>.Version identifier: v2.4.0-404-ge7ce2295\n .   Person seed: 6457100290386878904  Population seed: 0</div>'
    },
    'extension': [
        {
            'url': 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-race',
            'extension': [
                {
                    'url': 'ombCategory',
                    'valueCoding': {
                        'system': 'urn:oid:2.16.840.1.113883.6.238',
                        'code': '2106-3',
                        'display': 'White'
                    }
                },
                {'url': 'text', 'valueString': 'White'}
            ]
        },
        {
            'url': 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity',
            'extension': [
                {
                    'url': 'ombCategory',
                    'valueCoding': {
                        'system': 'urn:oid:2.16.840.1.113883.6.238',
                        'code': '2186-5',
                        'display': 'Not Hispanic or Latino'
                    }
                },
                {'url': 'text', 'valueString': 'Not Hispanic or Latino'}
            ]
        },
        {
            'url': 'http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName',
            'valueString': 'Deadra347 Borer986'
        },
        {'url': 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex', 'valueCode': 'M'},
        {
            'url': 'http://hl7.org/fhir/StructureDefinition/patient-birthPlace',
            'valueAddress': {'city': 'Billerica', 'state': 'Massachusetts', 'country': 'US'}
        },
        {
            'url': 'http://synthetichealth.github.io/synthea/disability-adjusted-life-years',
            'valueDecimal': 14.062655945052095
        },
        {
            'url': 'http://synthetichealth.github.io/synthea/quality-adjusted-life-years',
            'valueDecimal': 58.93734405494791
        }
    ],
    'identifier': [
        {'system': 'https://github.com/synthetichealth/synthea', 'value': '2fa15bc7-8866-461a-9000-f739e425860a'},
        {
            'type': {
                'coding': [
                    {
                        'system': 'http://terminology.hl7.org/CodeSystem/v2-0203',
                        'code': 'MR',
                        'display': 'Medical Record Number'
                    }
                ],
                'text': 'Medical Record Number'
            },
            'system': 'http://hospital.smarthealthit.org',
            'value': '2fa15bc7-8866-461a-9000-f739e425860a'
        },
        {
            'type': {
                'coding': [
                    {
                        'system': 'http://terminology.hl7.org/CodeSystem/v2-0203',
                        'code': 'SS',
                        'display': 'Social Security Number'
                    }
                ],
                'text': 'Social Security Number'
            },
            'system': 'http://hl7.org/fhir/sid/us-ssn',
            'value': '999-93-7537'
        },
        {
            'type': {
                'coding': [
                    {
                        'system': 'http://terminology.hl7.org/CodeSystem/v2-0203',
                        'code': 'DL',
                        'display': "Driver's License"
                    }
                ],
                'text': "Driver's License"
            },
            'system': 'urn:oid:2.16.840.1.113883.4.3.25',
            'value': 'S99948707'
        },
        {
            'type': {
                'coding': [
                    {
                        'system': 'http://terminology.hl7.org/CodeSystem/v2-0203',
                        'code': 'PPN',
                        'display': 'Passport Number'
                    }
                ],
                'text': 'Passport Number'
            },
            'system': 'http://standardhealthrecord.org/fhir/StructureDefinition/passportNumber',
            'value': 'X14078167X'
        }
    ],
    'name': [{'use': 'official', 'family': 'Brekke496', 'given': ['Aaron697'], 'prefix': ['Mr.']}],
    'telecom': [{'system': 'phone', 'value': '555-677-3119', 'use': 'home'}],
    'gender': 'male',
    'birthDate': '1945-12-10',
    'address': [
        {
            'extension': [
                {
                    'url': 'http://hl7.org/fhir/StructureDefinition/geolocation',
                    'extension': [
                        {'url': 'latitude', 'valueDecimal': 41.93879298871088},
                        {'url': 'longitude', 'valueDecimal': -71.06682353144593}
                    ]
                }
            ],
            'line': ['894 Brakus Bypass'],
            'city': 'Taunton',
            'state': 'Massachusetts',
            'postalCode': '02718',
            'country': 'US'
        }
    ],
    'maritalStatus': {
        'coding': [
            {'system': 'http://terminology.hl7.org/CodeSystem/v3-MaritalStatus', 'code': 'S', 'display': 'S'}
        ],
        'text': 'S'
    },
    'multipleBirthBoolean': False,
    'communication': [
        {
            'language': {
                'coding': [{'system': 'urn:ietf:bcp:47', 'code': 'en-US', 'display': 'English'}],
                'text': 'English'
            }
        }
    ]
}
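
Much of the useful demographic data in this resource sits inside nested `extension` entries that are identified by their `url`. As a rough illustration of how that nesting is navigated, the sketch below walks a trimmed copy of the Patient above in plain Python. The helper name `get_omb_race` and the trimmed `example_patient` dict are made up for this example and are not part of the notebook's classes; a FHIRPath expression using `where()` on the extension URL would express the same lookup.

```python
RACE_URL = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-race'


def get_omb_race(patient):
    """Return the display text of the OMB race category, or None if absent.

    Hypothetical helper for illustration only.
    """
    for ext in patient.get('extension', []):
        # Top-level extensions are distinguished by their URL.
        if ext.get('url') != RACE_URL:
            continue
        # The race extension holds nested extensions; 'ombCategory' carries the coding.
        for sub in ext.get('extension', []):
            if sub.get('url') == 'ombCategory':
                return sub.get('valueCoding', {}).get('display')
    return None


# Trimmed copy of the example Patient printed above (most fields omitted).
example_patient = {
    'resourceType': 'Patient',
    'id': '5cbc121b-cd71-4428-b8b7-31e53eba8184',
    'extension': [
        {
            'url': RACE_URL,
            'extension': [
                {
                    'url': 'ombCategory',
                    'valueCoding': {
                        'system': 'urn:oid:2.16.840.1.113883.6.238',
                        'code': '2106-3',
                        'display': 'White',
                    },
                },
                {'url': 'text', 'valueString': 'White'},
            ],
        },
        {
            'url': 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex',
            'valueCode': 'M',
        },
    ],
}

print(get_omb_race(example_patient))  # prints: White
```

The helper returns `None` rather than raising when the extension is missing, since not every Patient resource is guaranteed to carry it.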

Try it yourself

Using FHIRPath, create the necessary dataframes to answer the following questions:

  1. How many patients in the dataset have ever received a flu vaccine?
  2. What are the five most common conditions that patients have been diagnosed with? (Use only the first diagnosis of a given condition for each patient.)
  3. What is the most common medication (in MedicationRequest), and what are the five most common encounter types associated with this medication?

Remember that you can look at the FHIR resource documentation to see what data elements are in each resource. You can also use synthea_fetcher.get_example_resource('ResourceTypeHere') with https://hl7.github.io/fhirpath.js/ for testing out FHIRPaths if needed.
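
For question 2, the "first diagnosis only" rule amounts to counting each (patient, condition) pair once. The sketch below shows that logic in plain Python on a few made-up rows; the keys `patient`, `condition`, and `onset`, the sample values, and the function `top_conditions` are assumptions for illustration, not columns or helpers the notebook produces. With a DataFrame, pandas `drop_duplicates` followed by `value_counts` would be one way to get the same effect.

```python
from collections import Counter

# Hypothetical rows, shaped like records from a Condition dataframe.
condition_rows = [
    {'patient': 'p1', 'condition': 'Viral sinusitis', 'onset': '2015-03-01'},
    {'patient': 'p1', 'condition': 'Viral sinusitis', 'onset': '2017-11-20'},
    {'patient': 'p2', 'condition': 'Viral sinusitis', 'onset': '2016-06-05'},
    {'patient': 'p1', 'condition': 'Hypertension', 'onset': '2012-01-15'},
]


def top_conditions(rows, n=5):
    """Count each condition at most once per patient and return the n most common."""
    seen = set()
    counts = Counter()
    for row in rows:
        key = (row['patient'], row['condition'])
        if key in seen:
            # A repeat diagnosis of the same condition for the same patient is skipped.
            continue
        seen.add(key)
        counts[row['condition']] += 1
    return counts.most_common(n)


print(top_conditions(condition_rows))
```

Because only the count per condition matters here, the order of the rows does not affect the result; sorting by onset date would only be needed if you also wanted to keep the date of the first diagnosis.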

Summary

Through this exercise we built a reusable tool to connect to a FHIR server with Bulk Data capabilities, export a set of resource types, and convert that data into DataFrames for analysis.