# The default style for rendering JSON parsed as Python dicts isn't the best.
# Use this import and call `print(json)` when we want a cleaner view.
from rich import print
# Status bars for long-running cells
from tqdm.notebook import trange, tqdm
FHIR for Research Workshop - Bulk Data
Introduction
Learning Objectives and Key Concepts
The goal of this workshop is to connect to the SMART Bulk Data Server and fetch a set of sample patient data.
In this exercise, you will:
- Connect to an authorization server using a provided key, and retrieve an access token
- Make a Bulk Data Export Request with that access token
- Download the exported Bulk Data
- Convert the downloaded data into DataFrames
While libraries like FHIR-PYrate allow you to fetch data from a server and parse it directly into a DataFrame, these libraries generally do not support FHIR Bulk Data. This workshop will step through the process of building up a tool to fetch Bulk Data and convert it into DataFrames.
This notebook is best experienced interactively. If the notebook already has output in it, you may clear that prior to starting via the menu: Cell -> All Output -> Clear.
Setup
If you are not using a JupyterHub instance with dependencies already installed, you will need to:
- Clone this repository
- Install dependencies with
pip install -r requirements.txt
- Run
jupyter notebook workshops/fhir-bulk-data
This should open the Jupyter environment in your browser window. You should see notebook.ipynb listed in the interface. Open this notebook in Jupyter, and you should be able to run the code.
Background
The Bulk Data Access standard enables researchers to retrieve large volumes of data from a patient population in an EHR. The Bulk Data Access standard is part of the SMART ecosystem, and SMART on FHIR can be used to authenticate and authorize applications that retrieve bulk data automatically.
Clients of FHIR Bulk Data servers use SMART Backend Authorization to connect to the server. With SMART Backend Authorization, registered clients make a signed request to a token endpoint to receive a Bearer token, which they use for subsequent calls to the FHIR server.
Client registration often happens manually as a separate one-time event. The SMART Backend Authorization specification does not define an API for registration.
For this workshop, we connect to the SMART Bulk Data Server (https://bulk-data.smarthealthit.org). This is a developer tool provided by SMART Health IT to facilitate development with Bulk Data Access. This test server allows clients to “register” on the launch page by providing either a URL for a JSON Web Key Set (JWKS) or a raw JWKS. In this case, “registration” is not stored on the server. Instead, the “registration” information is encoded as state in the FHIR Server URL and the client ID. Production servers will usually have a more standard registration process rather than taking this approach.
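To see what “state in the URL” means here, the opaque path segment of the server URL used later in this workshop can be decoded by hand. The sketch below assumes the segment is base64url-encoded JSON, which holds for the URL in this workshop but is an implementation detail of this test server rather than part of the Bulk Data specification. decode_url_state is a hypothetical helper name.

```python
import base64
import json

def decode_url_state(segment):
    """Decode a base64url-encoded JSON path segment (assumed format)."""
    # base64url input may omit '=' padding, so restore it before decoding
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# Path segment copied from the server URL used in this workshop
segment = "eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9"
state = decode_url_state(segment)
print(state)
```

The decoded dict holds the options chosen on the launch page, so the server does not need to store anything about the client.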
For convenience, the SMART Bulk Data Server launch page allows users to generate a one-off JWKS to use for testing. For production usage, clients must generate their own certificates and JWKS and keep the private key private. In this workshop, we will use a JWKS generated by the launch page.
IMPORTANT: this workshop is not meant to be formal documentation of the specification. It largely skips error handling and stays on the “happy path” for brevity and readability. We strongly recommend reviewing the specifications and adding error handling before using any of this code in a production environment.
Getting our Access Token
The first step in obtaining data from a FHIR server that supports Bulk Data Access is to obtain an access token. That access token identifies and authorizes the client on requests made to the FHIR resource server.
Obtaining an access token is itself a two-step process:
1. Make a discovery request to the FHIR resource server to get the address of the authorization server.
2. Post a token request, signed by the client’s private key, to the authorization server.
To keep the focus of this workshop on the Bulk Data process rather than the details of generating keys, we will use a JWKS pre-generated by the SMART Bulk Data server launch page.
For reference, the steps followed to generate the keys used here were:
- Visit the SMART Bulk Data Server launch page
- In the upper left, click the JWKS button for Authentication
- Click the Generate button and choose Generate RS384
- Choose R4 for the FHIR Version
- The associated text box now contains a JWKS with both a public and private key, and the Launch Configuration contains a FHIR Server URL and Client ID
- Convert the private key from the JWKS to “PEM” format so it can be used by Python (this is not easy to do natively in Python, so we have done it with JavaScript out of band)
Let’s start by defining our credentials. In practice, real credentials must always be stored and loaded securely, but for simplicity in this workshop we will define them as local variables.
client_id = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6InJlZ2lzdHJhdGlvbi10b2tlbiJ9.eyJqd2tzIjp7ImtleXMiOlt7Imt0eSI6IlJTQSIsImFsZyI6IlJTMzg0IiwibiI6IngzMDc2RTJNaUpMR3JPbXJXRjZXSWZ1RjFSZDBlTjBSdEhUSVRuMlNGVWhMYTFQWE5Ia0xBR2xSSmtJWk1QMUk5SEhxdTRERy02d2JraFMweU9GbEZhZE1iaGgzcHkySHoybDctRmg1M3Y3bmpwb3dxUGV2eEpqMlpEQU5BanFWeHRLOGdvMm1BZmZFSnJ2ZkVHbm5oUGkzdGE1U2U5UTBkS29la2hJRVRCaVJTa0ozN0pobEZGSDh3S2hFLXVwaXBQU3VycTBrQ0JkNlNaS3NOVHpHNzJmLVJoNENiREZWTVdfRm5zcTh5LWRJMTdMSDJZcHBBLWc0eGlUZnMwMGZOUG9FUEdoWFU2bHFKMHMwclp4Um9zYnVuV0NTYi1UaEtWV0RyeUFudE83S3dWN1BxVG1NMmVrVS1yenZFaWprVjZfUUlnVTJxRTd6X1k1N1l4aW8zUSIsImUiOiJBUUFCIiwia2V5X29wcyI6WyJ2ZXJpZnkiXSwiZXh0Ijp0cnVlLCJraWQiOiI0ZDc3OTJjZTQyMDU0ZDVkZjhkZDg1ZjhiNTI3ZGQ4OCJ9LHsia3R5IjoiUlNBIiwiYWxnIjoiUlMzODQiLCJuIjoieDMwNzZFMk1pSkxHck9tcldGNldJZnVGMVJkMGVOMFJ0SFRJVG4yU0ZVaExhMVBYTkhrTEFHbFJKa0laTVAxSTlISHF1NERHLTZ3YmtoUzB5T0ZsRmFkTWJoaDNweTJIejJsNy1GaDUzdjduanBvd3FQZXZ4SmoyWkRBTkFqcVZ4dEs4Z28ybUFmZkVKcnZmRUdubmhQaTN0YTVTZTlRMGRLb2VraElFVEJpUlNrSjM3SmhsRkZIOHdLaEUtdXBpcFBTdXJxMGtDQmQ2U1pLc05Uekc3MmYtUmg0Q2JERlZNV19GbnNxOHktZEkxN0xIMllwcEEtZzR4aVRmczAwZk5Qb0VQR2hYVTZscUowczByWnhSb3NidW5XQ1NiLVRoS1ZXRHJ5QW50TzdLd1Y3UHFUbU0yZWtVLXJ6dkVpamtWNl9RSWdVMnFFN3pfWTU3WXhpbzNRIiwiZSI6IkFRQUIiLCJkIjoiUnptQWRTMlMtb1FsS1VGNHF1R0Npdm1KekE1R3lJeHRzTmR0V1JEZVluamdiSjZQbksyRzd3dXJMSlMyOTlYSEFYZld6a0ZwU2h3bDc5OHl1UEk0ckNXQ1ZXQ29fLWh5ci14Q2xlWEpCWVJQV292VXljODlVMTBsdzVtZ1cyWmRhWkotT2NLblBkYWZreERLME1wdkhmdkxZN09zd1lkX2Z4UHFQRTd3ZDlaQU5XLUIyWmNURUVmd2taNWdlcmtDdnFHQ1lEUTdVcVJqR3k1dWRjTkRiQ01ITFdGaEZZMTVqMDVMMFpJV0RwUDY2cmN6UWZEdnduR0pIbWxJbnJMbTl5WkowUTNkVlpHSmo2Y2dMeWI4WHhkNHpWRjZGSy1NX2VKbnFzZFRveHRPMDNUOVotSWlrN1BfbFBheWRvMWRycXRZdUxmZXpvU1lnUGp0V0NnV0JRIiwicCI6IjZwNlV5aGZiQ0JjQlEzcGttMHZEb1lqSDZsc1FCeS1PTzlEYlpfZnFfSHpzZl96UWhENDdua0dZZngxbGVTUFlQU0ZSeDlRTUR3cTlvYWxjYmEwNmE3QTVmMUxQNVpaRnNvSDVCTElHTUcxNmhDbW1mTEdRMURkZ3pMb2s3Q3RldDRnNGhUTlpseFZOYV9uYVNmZGJSdmQycF8zNTM1RGpaOXoyMEpSNllDYyIsInEiOiIyYXNhQ0RCTmY3NTQ1ajdOcXI2TTZiUW8wVGZEWGNlb2FxcGVtNGhpNE1pYUtBOEcydVFvdXNTOGcyUTlZOFZiZmxjX3I2WmxPVjIxSmJhYW5WN253MDRxbVpqMG5Xdkk0a19yX2lKWTVuSDNUMHk0Y0lGV21tLUhPY1dzazJXWl9QQ1NSc1piOU1qOUs4UXh6b1h5WEo0ck9aLUw4OTNZbDZ5bVdKa2xqVnMiLCJkcCI6Ik9LeWI5b0Z5dUc2T01KV2xMZHBNWkgzZEJPQ0FhNnZ5S01MWDdUSjNBZ3pQT0UtQ3N4OHhXWll3MXl2cnNpcVZkcGJRNFh0NGVqMjI5eEVwTVpreHpvZWdMQUItRmRDSl80Zmo5bDFtbjFZaXpVQWVabXFpT0pFMEFlQkpRUDlzX3RxYUJKc1YzaWdZTHFnSk1lcmRrclAtWnJBMEp1d2g4cG51eVEzRXplcyIsImRxIjoib2I2R0FvMjZHUEcxcnduLUZDR3lYanMwbFhzRlhwdHRaNDJmN1owa05IcDhLc1kzeHRJQl9mOFJRZVZyeE1hem5TZENPTWpCc1NZVDVLbFRMUnVIeHRZX3k1RWdQQllLMlRpZ1dXQzJoTTh0QWEwMTVNd0hTWTBVZ19hQ3JhaXpDNFRNZlhFS2hkUVFaTVJPYW5PWVRBQndpRW9wV2hhQXl2eE5ROHJSWDc4IiwicWkiOiJLSjhJU0RKaHVyUmEyTVRHdG4zWjR3NU9ob3o2N29OcE10MG1TakxGUEt0QjFWbjRaZ3VkTUxfWTZ4V2lWTnBOR1hQa3hoMEJjRmNKakNKcC0yeUZLV0d4Si14M2JMWVllbkVUaGRFSGRRR0xuUUszMHlEdHFTY2NDUVY5U2xGc281NUdnUmxhODNaY2NBZTdBMXBWN2sxRGE4dFVFNkE4TXNlQ1ZXamRLbFUiLCJrZXlfb3BzIjpbInNpZ24iXSwiZXh0Ijp0cnVlLCJraWQiOiI0ZDc3OTJjZTQyMDU0ZDVkZjhkZDg1ZjhiNTI3ZGQ4OCJ9XX0sImFjY2Vzc1Rva2Vuc0V4cGlyZUluIjoxNSwiaWF0IjoxNjg2NjUyNzM4fQ.j1urst068-21CxiH0Nqml7XoE9v6hWJ_vfqAK4W22vg'
# Don't worry! This is not anybody's real private key. It was generated specifically and only for this exercise.
private_key = """-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEAx3076E2MiJLGrOmrWF6WIfuF1Rd0eN0RtHTITn2SFUhLa1PX
NHkLAGlRJkIZMP1I9HHqu4DG+6wbkhS0yOFlFadMbhh3py2Hz2l7+Fh53v7njpow
qPevxJj2ZDANAjqVxtK8go2mAffEJrvfEGnnhPi3ta5Se9Q0dKoekhIETBiRSkJ3
7JhlFFH8wKhE+upipPSurq0kCBd6SZKsNTzG72f+Rh4CbDFVMW/Fnsq8y+dI17LH
2YppA+g4xiTfs00fNPoEPGhXU6lqJ0s0rZxRosbunWCSb+ThKVWDryAntO7KwV7P
qTmM2ekU+rzvEijkV6/QIgU2qE7z/Y57Yxio3QIDAQABAoIBAEc5gHUtkvqEJSlB
eKrhgor5icwORsiMbbDXbVkQ3mJ44Gyej5ythu8LqyyUtvfVxwF31s5BaUocJe/f
MrjyOKwlglVgqP/ocq/sQpXlyQWET1qL1MnPPVNdJcOZoFtmXWmSfjnCpz3Wn5MQ
ytDKbx37y2OzrMGHf38T6jxO8HfWQDVvgdmXExBH8JGeYHq5Ar6hgmA0O1KkYxsu
bnXDQ2wjBy1hYRWNeY9OS9GSFg6T+uq3M0Hw78JxiR5pSJ6y5vcmSdEN3VWRiY+n
IC8m/F8XeM1RehSvjP3iZ6rHU6MbTtN0/WfiIpOz/5T2snaNXa6rWLi33s6EmID4
7VgoFgUCgYEA6p6UyhfbCBcBQ3pkm0vDoYjH6lsQBy+OO9DbZ/fq/Hzsf/zQhD47
nkGYfx1leSPYPSFRx9QMDwq9oalcba06a7A5f1LP5ZZFsoH5BLIGMG16hCmmfLGQ
1DdgzLok7Ctet4g4hTNZlxVNa/naSfdbRvd2p/3535DjZ9z20JR6YCcCgYEA2asa
CDBNf7545j7Nqr6M6bQo0TfDXceoaqpem4hi4MiaKA8G2uQousS8g2Q9Y8Vbflc/
r6ZlOV21JbaanV7nw04qmZj0nWvI4k/r/iJY5nH3T0y4cIFWmm+HOcWsk2WZ/PCS
RsZb9Mj9K8QxzoXyXJ4rOZ+L893Yl6ymWJkljVsCgYA4rJv2gXK4bo4wlaUt2kxk
fd0E4IBrq/IowtftMncCDM84T4KzHzFZljDXK+uyKpV2ltDhe3h6Pbb3ESkxmTHO
h6AsAH4V0In/h+P2XWafViLNQB5maqI4kTQB4ElA/2z+2poEmxXeKBguqAkx6t2S
s/5msDQm7CHyme7JDcTN6wKBgQChvoYCjboY8bWvCf4UIbJeOzSVewVem21njZ/t
nSQ0enwqxjfG0gH9/xFB5WvExrOdJ0I4yMGxJhPkqVMtG4fG1j/LkSA8FgrZOKBZ
YLaEzy0BrTXkzAdJjRSD9oKtqLMLhMx9cQqF1BBkxE5qc5hMAHCISilaFoDK/E1D
ytFfvwKBgCifCEgyYbq0WtjExrZ92eMOToaM+u6DaTLdJkoyxTyrQdVZ+GYLnTC/
2OsVolTaTRlz5MYdAXBXCYwiaftshSlhsSfsd2y2GHpxE4XRB3UBi50Ct9Mg7akn
HAkFfUpRbKOeRoEZWvN2XHAHuwNaVe5NQ2vLVBOgPDLHglVo3SpV
-----END RSA PRIVATE KEY-----"""
# Note: the key ID is the "kid" field from the JWKS. It is the same for both entries in `keys`.
key_id = "4d7792ce42054d5df8dd85f8b527dd88"
server_url = 'https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir'
We will use the Requests library for making all HTTP requests, and use a Session in case we need to persist common settings such as proxy or SSL configuration.
import requests

session = requests.Session()

# Optional: Turn off SSL verification. Useful when dealing with a corporate proxy with self-signed certificates.
from urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)
session.verify = False
Let’s start by confirming we can hit the server via the /metadata endpoint. When connecting to a server for the first time it is generally a good idea to review the metadata to see what the server supports, and that it matches your expectations. In this case, expect to see the name “SMART Sample Bulk Data Server”, and references to “export” operations.
r = session.get(f'{server_url}/metadata')
metadata = r.json()
print(metadata)
{ 'resourceType': 'CapabilityStatement', 'status': 'active', 'date': '2023-06-13T01:26:34+00:00', 'publisher': "Boston Children's Hospital", 'kind': 'instance', 'instantiates': ['http://hl7.org/fhir/uv/bulkdata/CapabilityStatement/bulk-data'], 'software': {'name': 'SMART Sample Bulk Data Server', 'version': '2.1.1'}, 'implementation': {'description': 'SMART Sample Bulk Data Server'}, 'fhirVersion': '4.0.1', 'acceptUnknown': 'extensions', 'format': ['json'], 'rest': [ { 'mode': 'server', 'security': { 'extension': [ { 'url': 'http://fhir-registry.smarthealthit.org/StructureDefinition/oauth-uris', 'extension': [ {'url': 'token', 'valueUri': 'https://bulk-data.smarthealthit.org/auth/token'}, {'url': 'register', 'valueUri': 'https://bulk-data.smarthealthit.org/auth/register'} ] } ], 'service': [ { 'coding': [ { 'system': 'http://hl7.org/fhir/restful-security-service', 'code': 'SMART-on-FHIR', 'display': 'SMART-on-FHIR' } ], 'text': 'OAuth2 using SMART-on-FHIR profile (see http://docs.smarthealthit.org)' } ] }, 'resource': [ { 'type': 'Patient', 'operation': [ { 'extension': [ { 'url': 'http://hl7.org/fhir/StructureDefinition/capabilitystatement-expectation', 'valueCode': 'SHOULD' } ], 'name': 'patient-export', 'definition': 'http://hl7.org/fhir/uv/bulkdata/OperationDefinition/patient-export' } ] }, { 'type': 'Group', 'operation': [ { 'extension': [ { 'url': 'http://hl7.org/fhir/StructureDefinition/capabilitystatement-expectation', 'valueCode': 'SHOULD' } ], 'name': 'group-export', 'definition': 'http://hl7.org/fhir/uv/bulkdata/OperationDefinition/group-export' } ] }, { 'type': 'OperationDefinition', 'profile': {'reference': 'http://hl7.org/fhir/Profile/OperationDefinition'}, 'interaction': [{'code': 'read'}], 'searchParam': [] } ], 'operation': [ {'name': 'get-resource-counts', 'definition': 'OperationDefinition/-s-get-resource-counts'}, { 'extension': [ { 'url': 'http://hl7.org/fhir/StructureDefinition/capabilitystatement-expectation', 'valueCode': 'SHOULD' } ], 
'name': 'export', 'definition': 'http://hl7.org/fhir/uv/bulkdata/OperationDefinition/export' } ] } ] }
The SMART Backend Authorization specification defines that the token endpoint will be published as part of the FHIR resource server’s SMART metadata, at .well-known/smart-configuration. Let’s fetch that endpoint and review the contents.
r = session.get(f'{server_url}/.well-known/smart-configuration')
smart_config = r.json()
print(smart_config)
{ 'token_endpoint': 'https://bulk-data.smarthealthit.org/auth/token', 'registration_endpoint': 'https://bulk-data.smarthealthit.org/auth/register', 'token_endpoint_auth_methods_supported': ['private_key_jwt'], 'token_endpoint_auth_signing_alg_values_supported': [ 'HS256', 'HS384', 'HS512', 'RS256', 'RS384', 'RS512', 'ES256', 'ES384', 'ES512', 'PS256', 'PS384', 'PS512' ], 'scopes_supported': [ 'system/*.rs', 'system/Patient.rs', 'system/Encounter.rs', 'system/Condition.rs', 'system/Claim.rs', 'system/ExplanationOfBenefit.rs', 'system/Observation.rs', 'system/Immunization.rs', 'system/DiagnosticReport.rs', 'system/Procedure.rs', 'system/CareTeam.rs', 'system/CarePlan.rs', 'system/MedicationRequest.rs', 'system/AllergyIntolerance.rs', 'system/Device.rs', 'system/ImagingStudy.rs', 'system/Organization.rs', 'system/Practitioner.rs', 'system/DocumentReference.rs', 'system/Group.rs', 'system/*.read', 'system/Patient.read', 'system/Encounter.read', 'system/Condition.read', 'system/Claim.read', 'system/ExplanationOfBenefit.read', 'system/Observation.read', 'system/Immunization.read', 'system/DiagnosticReport.read', 'system/Procedure.read', 'system/CareTeam.read', 'system/CarePlan.read', 'system/MedicationRequest.read', 'system/AllergyIntolerance.read', 'system/Device.read', 'system/ImagingStudy.read', 'system/Organization.read', 'system/Practitioner.read', 'system/DocumentReference.read', 'system/Group.read' ], 'capabilities': ['permission-v2', 'permission-v1', 'client-confidential-asymmetric'] }
We care most about the token_endpoint field, which we need in order to request our access token. For more information about the other fields, see the SMART App Launch specification.
token_endpoint = smart_config['token_endpoint']
Now we have our token endpoint, so we can make a request to it to get a token. The request follows the OAuth 2.0 “Client Credentials” flow, using a JSON Web Token (JWT) assertion containing our client ID and signed with our private key.
📘 Read more about the access token request specification
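The claims carried by the assertion can be sketched on their own, without any signing. The example below uses only the standard library. build_assertion_claims is a hypothetical helper, and the jti claim is included on the understanding that the SMART Backend Services specification lists it as a required unique identifier for each assertion; the workshop code omits it.

```python
import time
import uuid

def build_assertion_claims(client_id, token_endpoint, lifetime_seconds=300):
    """Return the claims for a client assertion JWT (unsigned sketch)."""
    now = int(time.time())
    return {
        "iss": client_id,               # issuer: the client creating the JWT
        "sub": client_id,               # subject: the same client ID
        "aud": token_endpoint,          # audience: the token endpoint URL
        "exp": now + lifetime_seconds,  # expiry, kept short (5 minutes here)
        "jti": str(uuid.uuid4()),       # unique ID so the JWT cannot be replayed
    }

# "example-client-id" is a placeholder, not a registered client
claims = build_assertion_claims("example-client-id",
                                "https://bulk-data.smarthealthit.org/auth/token")
print(sorted(claims))
```

A dict like this could be passed to jwt.encode() in place of an inline dict, together with the private key, the RS384 algorithm, and the kid header.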
# Create a JWT client assertion as follows:
import jwt
import datetime

assertion = jwt.encode({
        'iss': client_id, # "iss" == "issuer", the client that created this JWT
        'sub': client_id, # "sub" == "subject", the client that will use the access token
        'aud': token_endpoint, # "aud" == "audience", the receiver of this request
        'exp': int((datetime.datetime.now() + datetime.timedelta(minutes=5)).timestamp())
    },
    private_key, # signed with the private key
    algorithm='RS384', # algorithm for the key
    headers={"kid": key_id}) # kid is required for smart bulk data server
# And then POST it to the token endpoint
r = session.post(token_endpoint, data={
    'scope': 'system/*.read',
    'grant_type': 'client_credentials',
    'client_assertion_type': 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
    'client_assertion': assertion
})
token_response = r.json()

# And inspect the response:
token_response
{'token_type': 'bearer',
'scope': 'system/*.read',
'expires_in': 300,
'access_token': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbl90eXBlIjoiYmVhcmVyIiwic2NvcGUiOiJzeXN0ZW0vKi5yZWFkIiwiZXhwaXJlc19pbiI6MzAwLCJpYXQiOjE2ODY2NTQzNTIsImV4cCI6MTY4NjY1NDY1Mn0.6VcZfI7YBkrGV7IoIBKmQo2usjrpCkIgmJHx8jFir3g'}
Two important fields we need to keep track of are the token itself and its expiration time. Tokens are only valid for a certain amount of time, and once they expire we will need to fetch a new one via the same process as above. expires_in is in seconds from the current time, so we’ll add that to the current time to get a timestamp we can compare against.
Note that for this example we requested and received 'scope': 'system/*.read', which allows read access to all resource types. In practice, requesting access to all resource types is generally not recommended, and servers do not always support asking for * scopes. Generally it is recommended to request only the minimal level of access necessary.
token = token_response['access_token']
expire_time = datetime.datetime.now() + datetime.timedelta(seconds=token_response['expires_in'])
To make this easier for ourselves, let’s package this up into a get_token() function that we can call anytime we need to use a token. If the current token is still valid, use that, or if it has expired, fetch a new one. The logic is exactly the same as the previous steps we just ran:
def get_token():
    global token, expire_time
    if datetime.datetime.now() < expire_time:
        # the existing token is still valid so return it
        return token

    # otherwise build a fresh signed assertion and request a new token
    assertion = jwt.encode({
            'iss': client_id,
            'sub': client_id,
            'aud': token_endpoint,
            'exp': int((datetime.datetime.now() + datetime.timedelta(minutes=5)).timestamp())
        }, private_key, algorithm='RS384', headers={"kid": key_id})

    r = session.post(token_endpoint, data={
        'scope': 'system/*.read',
        'grant_type': 'client_credentials',
        'client_assertion_type': 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
        'client_assertion': assertion
    })
    token_response = r.json()
    token = token_response['access_token']
    expire_time = datetime.datetime.now() + datetime.timedelta(seconds=token_response['expires_in'])

    return token
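One possible refinement, not part of the workshop code, is to wrap the caching logic in a small class instead of module-level globals, and to refresh slightly before the stated expiry so a token does not expire mid-request. TokenCache, fetch, and margin_seconds are hypothetical names. fetch stands in for the token request above and is assumed to return a dict with access_token and expires_in.

```python
import time

class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch, margin_seconds=30, clock=time.monotonic):
        self._fetch = fetch            # callable returning the token response dict
        self._margin = margin_seconds  # refresh this many seconds early
        self._clock = clock            # injectable clock, useful for testing
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            response = self._fetch()
            self._token = response["access_token"]
            self._expires_at = self._clock() + response["expires_in"]
        return self._token

# Demonstration with a fake fetch function instead of a real HTTP call
calls = []

def fake_fetch():
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 300}

fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # fetches a new token
second = cache.get()   # still valid, served from the cache
fake_now[0] = 280.0    # now within 30 seconds of expiry
third = cache.get()    # triggers a refresh
```

Injecting the clock keeps the expiry logic testable without sleeping, and a monotonic clock avoids surprises if the system time is adjusted.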
Starting, Checking, and Downloading the Export
Now that we have an access token, the next step in using Bulk Data is to request the export of data, via a “kick-off request”. This is an asynchronous request – once the request is accepted, instead of returning the results directly, the server response will point to a URL where the client can check the status.
There are three levels of export:
- Patient, to obtain resources related to all Patients
- Group, to obtain resources associated with a particular Group
- System, to obtain all resources, whether or not they are associated with a patient
For this exercise we will initially only request Patient-level data, but the general process for Groups and System-level data is exactly the same - there is just a different endpoint to hit, and a different set of data will be returned.
There are also a number of parameters that may be set, but to keep things simple we will only use the _type parameter, to request only Patient and Condition resource types.
📘 Read more about the Bulk Data Kick-off Request
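The three export levels differ only in the path of the kick-off URL, which can be sketched as a small helper. build_export_url and its arguments are hypothetical names, and https://example.org/fhir and example-group are placeholders. The _since parameter is another kick-off parameter from the Bulk Data specification that this workshop does not otherwise use.

```python
from urllib.parse import urlencode

def build_export_url(base_url, level="patient", group_id=None, types=None, since=None):
    """Build a Bulk Data kick-off URL for the given export level."""
    base_url = base_url.rstrip("/")
    if level == "patient":
        path = "/Patient/$export"
    elif level == "group":
        if not group_id:
            raise ValueError("group_id is required for a group-level export")
        path = f"/Group/{group_id}/$export"
    elif level == "system":
        path = "/$export"
    else:
        raise ValueError(f"unknown export level: {level!r}")

    params = {}
    if types:
        params["_type"] = ",".join(types)  # comma-separated resource types
    if since:
        params["_since"] = since           # only resources updated after this instant
    query = urlencode(params, safe=",")    # keep commas readable in the URL
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"

print(build_export_url("https://example.org/fhir", types=["Patient", "Condition"]))
print(build_export_url("https://example.org/fhir", level="group", group_id="example-group"))
print(build_export_url("https://example.org/fhir", level="system"))
```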
Let’s make the export request and inspect the response headers. For “Patient” level data, the URL we want to hit is {server}/Patient/$export. Our token is used in the “Authorization” header in the format "Bearer {token}".
r = session.get(f'{server_url}/Patient/$export?_type=Patient,Condition',
                headers={'Authorization': f'Bearer {get_token()}',
                         'Accept': 'application/fhir+json',
                         'Prefer': 'respond-async'})
print(r.headers)
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'Content-Location': 'https://bulk-data.smarthealthit.org/fhir/bulkstatus/55e2770a4b9c3f861f002634bd44ac62', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '644', 'Etag': 'W/"284-G8JHR+JPFTg+y5JhfrRbOzc4ZMI"', 'Date': 'Tue, 13 Jun 2023 11:05:52 GMT', 'Via': '1.1 vegur'}
We see the status URL in the Content-Location header, so let’s save that into a variable.
check_url = r.headers['Content-Location']
We can now check the status by getting that URL, and the HTTP status code of the response will indicate the export status:
- Code 200 means the export is complete, and the response body will indicate the location of the exported files
- Code 202 means the export is still in progress
- Codes in the range 4xx-5xx indicate an error has occurred. 4xx codes generally indicate an error in the request, and 5xx codes generally indicate a server error.
Note that in production environments it is recommended to check the status as infrequently as possible, to minimize the load on the server. In this case we expect the export to complete in just a few seconds so the impact of checking every two seconds is minimal. The server will also include a “Retry-After” header which will give us a hint on how long to wait before trying again. We’ll check that status in a loop, and break out of the loop when we get a complete or error response. We’ll print status each time through the loop, and the response body when the export is complete.
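As an aside, the Retry-After header is defined by HTTP as either a number of seconds or an HTTP date, and it may be absent altogether. The loop in this workshop assumes an integer number of seconds, which is what this test server sends. A more defensive parser might look like the sketch below. retry_after_seconds, default, and maximum are hypothetical names and values, not part of any specification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, default=5.0, maximum=60.0, now=None):
    """Convert a Retry-After header value into a bounded delay in seconds."""
    if value is None:
        return default                    # header missing: fall back to a default
    value = value.strip()
    if value.isdigit():
        delay = float(value)              # delta-seconds form, e.g. "2"
    else:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default                # unparseable value: fall back
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()
    # never sleep a negative amount, and cap very long waits
    return min(max(delay, 0.0), maximum)

print(retry_after_seconds("2"))
print(retry_after_seconds(None))
```

Inside a polling loop, the result could be passed straight to sleep() in place of int(delay).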
# Now we check the status in a loop
from time import sleep

while True:
    r = session.get(check_url, headers={'Authorization': f'Bearer {get_token()}', 'Accept': 'application/fhir+json'})

    if r.status_code == 200:
        # complete
        response = r.json()
        print(response)
        break
    elif r.status_code == 202:
        # in progress
        print(r.headers)
        delay = r.headers['Retry-After']
        print(f"Sleeping {delay} seconds before retrying")
        sleep(int(delay))
    else:
        # error
        print(r.text)
        break
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '1% complete, currenly processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:52 GMT', 'Content-Length': '0', 'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '21% complete, currenly processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:54 GMT', 'Content-Length': '0', 'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '42% complete, currenly processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:56 GMT', 'Content-Length': '0', 'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '63% complete, currenly processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:05:58 GMT', 'Content-Length': '0', 'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{'Server': 'Cowboy', 'Connection': 'keep-alive', 'X-Powered-By': 'Express', 'X-Progress': '83% complete, currenly processing Patient resources', 'Retry-After': '2', 'Date': 'Tue, 13 Jun 2023 11:06:00 GMT', 'Content-Length': '0', 'Via': '1.1 vegur'}
Sleeping 2 seconds before retrying
{ 'transactionTime': '1686654352606', 'request': 'https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZW wiOjB9/fhir/Patient/$export?_type=Patient,Condition', 'requiresAccessToken': True, 'output': [ { 'type': 'Condition', 'count': 639, 'url': 'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdC I6NjM5LCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Condition.ndjson' }, { 'type': 'Patient', 'count': 100, 'url': 'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdC I6MTAwLCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Patient.ndjson' } ], 'deleted': [], 'error': [] }
We can see that the response points us to one or more NDJSON (Newline Delimited JSON) files per resource type, in the output field of the response.
Note that in this case the volume of data is relatively small, and there is only one entry in the list per resource type, but for large datasets it is possible that there could be multiple files (and therefore multiple entries in this list) per resource type.
Let’s save that list to a variable.
output_files = response['output']
output_files
[{'type': 'Condition',
'count': 639,
'url': 'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdCI6NjM5LCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Condition.ndjson'},
{'type': 'Patient',
'count': 100,
'url': 'https://bulk-data.smarthealthit.org/eyJpZCI6IjU1ZTI3NzBhNGI5YzNmODYxZjAwMjYzNGJkNDRhYzYyIiwib2Zmc2V0IjowLCJsaW1pdCI6MTAwLCJzZWN1cmUiOnRydWV9/fhir/bulkfiles/1.Patient.ndjson'}]
Now we can loop through the list and download each one. Each file is an NDJSON, so that means we’ll see one resource per line.
To make each step clear and distinct, we’ll keep a dict of { resourceType: [resources,...]} which we can process later.
Note: for this exercise we are only reading the NDJSON files into a dict in memory, but in practice you may want to save the file locally first in case there are errors in processing, especially if the files are large.
import json

resources_by_type = {}

for output_file in tqdm(output_files):
    download_url = output_file['url']
    resource_type = output_file['type']

    r = session.get(download_url, headers={'Authorization': f'Bearer {get_token()}',
                                           'Accept': 'application/fhir+json'})
    ndjson = r.text.strip() # remove any whitespace, in particular trailing newlines

    if resource_type not in resources_by_type:
        resources_by_type[resource_type] = []

    # NDJSON can't be parsed as a whole, we have to process it line-by-line
    for line in ndjson.split('\n'):
        resource = json.loads(line)
        resources_by_type[resource_type].append(resource)

# This is a large amount of JSON data, only uncomment this line if you care to review
# print(resources_by_type)
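For larger files, the line-by-line parsing can be pulled out into a generator that never holds the whole file as one string. This is a minimal sketch using only the standard library. iter_ndjson is a hypothetical helper, and the two sample resources are made up for illustration.

```python
import io
import json

def iter_ndjson(lines):
    """Yield one parsed resource per non-empty line of NDJSON."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            # report which line failed so a partial download is easier to diagnose
            raise ValueError(f"invalid JSON on line {line_number}: {err}") from err

# Any iterable of lines works: an open file, or an in-memory buffer as here
sample = io.StringIO(
    '{"resourceType": "Patient", "id": "a"}\n'
    '\n'
    '{"resourceType": "Patient", "id": "b"}\n'
)
resources = list(iter_ndjson(sample))
print([resource["id"] for resource in resources])
```

The same generator could be fed from a file saved to disk, or from a streamed Requests response via iter_lines(decode_unicode=True).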
Converting to DataFrames
Finally, let’s convert these into DataFrames.
The quick-and-dirty option is to use the Pandas json_normalize() function to parse a list of dicts into a DataFrame.
📘 Read more about pandas.json_normalize
import pandas as pd

resource_dfs = {}

for resource_type, resources in resources_by_type.items():
    resource_dfs[resource_type] = pd.json_normalize(resources)

# Now we can work with them by type:
resource_dfs['Patient']
| | resourceType | id | extension | identifier | name | telecom | gender | birthDate | address | multipleBirthBoolean | communication | text.status | text.div | maritalStatus.coding | maritalStatus.text | multipleBirthInteger |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Patient | 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Lemke', 'given... | [{'system': 'phone', 'value': '555-532-1156', ... | male | 1965-01-13 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | M | NaN |
1 | Patient | 58c297c4-d684-4677-8024-01131d93835e | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Wintheiser', '... | [{'system': 'phone', 'value': '555-712-4709', ... | female | 1971-04-05 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | M | NaN |
2 | Patient | 538a9a4e-8437-47d3-8c01-1a17dca8f0be | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Alaniz', 'give... | [{'system': 'phone', 'value': '555-446-6900', ... | male | 1923-03-24 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | M | NaN |
3 | Patient | c6c60742-8694-46e4-bb42-b00bf6d8b536 | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Walsh', 'given... | [{'system': 'phone', 'value': '555-436-4287', ... | female | 1965-10-27 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | M | NaN |
4 | Patient | fbfec681-d357-4b28-b1d2-5db6434c7846 | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Bednar', 'give... | [{'system': 'phone', 'value': '555-405-4909', ... | female | 1942-07-04 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | M | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | Patient | 5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7 | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Schmeler', 'gi... | [{'system': 'phone', 'value': '555-971-6300', ... | male | 1995-10-19 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | Never Married | NaN |
96 | Patient | c1981741-f90e-4077-9156-429a3c4c5ded | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Lubowitz', 'gi... | [{'system': 'phone', 'value': '555-328-5229', ... | male | 1956-05-06 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | M | NaN |
97 | Patient | f98b23bf-4443-46d0-9eaf-563e767cf948 | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Funk', 'given'... | [{'system': 'phone', 'value': '555-497-7639', ... | male | 1966-02-07 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | M | NaN |
98 | Patient | c536dee9-9ef6-4807-ae20-9f1045c9c7d6 | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Bergstrom', 'g... | [{'system': 'phone', 'value': '555-845-1730', ... | male | 1990-11-18 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | S | NaN |
99 | Patient | a845ead4-d9de-42eb-b4b5-eb21a8963578 | [{'url': 'http://hl7.org/fhir/StructureDefinit... | [{'system': 'https://github.com/synthetichealt... | [{'use': 'official', 'family': 'Pagac', 'given... | [{'system': 'phone', 'value': '555-504-1379', ... | female | 1968-04-20 | [{'extension': [{'url': 'http://hl7.org/fhir/S... | False | [{'language': {'coding': [{'system': 'urn:ietf... | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | [{'system': 'http://terminology.hl7.org/CodeSy... | S | NaN |
100 rows × 16 columns
This works, but it’s clearly not ideal in how it handles nested fields, such as the nested lists of the name field. One way we can do a little better is with the flatten_json library: https://github.com/amirziai/flatten
from flatten_json import flatten
for resource_type, resources in resources_by_type.items():
    resource_dfs[resource_type] = pd.json_normalize(list(map(lambda r: flatten(r), resources)))

# Now let's take another look
resource_dfs['Patient']
| | resourceType | id | text_status | text_div | extension_0_url | extension_0_valueString | extension_1_url | extension_1_valueAddress_city | extension_1_valueAddress_state | extension_1_valueAddress_country | ... | multipleBirthBoolean | communication_0_language_coding_0_system | communication_0_language_coding_0_code | communication_0_language_coding_0_display | communication_0_language_text | name_1_use | name_1_family | name_1_given_0 | name_1_prefix_0 | multipleBirthInteger |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Patient | 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Lettie Boyle | http://hl7.org/fhir/StructureDefinition/patien... | Boston | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | NaN | NaN | NaN | NaN | NaN |
1 | Patient | 58c297c4-d684-4677-8024-01131d93835e | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Marquetta Schamberger | http://hl7.org/fhir/StructureDefinition/patien... | Macau | Macao Special Administrative Region of the Peo... | CN | ... | False | urn:ietf:bcp:47 | zh | Chinese | Chinese | maiden | Heathcote | Aleta | Mrs. | NaN |
2 | Patient | 538a9a4e-8437-47d3-8c01-1a17dca8f0be | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Pilar Orta | http://hl7.org/fhir/StructureDefinition/patien... | San Jose | San Jose | CR | ... | False | urn:ietf:bcp:47 | es | Spanish | Spanish | NaN | NaN | NaN | NaN | NaN |
3 | Patient | c6c60742-8694-46e4-bb42-b00bf6d8b536 | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Arvilla Haag | http://hl7.org/fhir/StructureDefinition/patien... | Norton | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | maiden | Kuphal | Alyce | Mrs. | NaN |
4 | Patient | fbfec681-d357-4b28-b1d2-5db6434c7846 | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Marcelina Harber | http://hl7.org/fhir/StructureDefinition/patien... | Brockton | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | maiden | Runolfsson | Arnette | Mrs. | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | Patient | 5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7 | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Allison Daugherty | http://hl7.org/fhir/StructureDefinition/patien... | Boston | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | NaN | NaN | NaN | NaN | NaN |
96 | Patient | c1981741-f90e-4077-9156-429a3c4c5ded | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Antoinette Parker | http://hl7.org/fhir/StructureDefinition/patien... | Mansfield | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | NaN | NaN | NaN | NaN | NaN |
97 | Patient | f98b23bf-4443-46d0-9eaf-563e767cf948 | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Barbar Windler | http://hl7.org/fhir/StructureDefinition/patien... | Randolph | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | NaN | NaN | NaN | NaN | NaN |
98 | Patient | c536dee9-9ef6-4807-ae20-9f1045c9c7d6 | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Juli Johns | http://hl7.org/fhir/StructureDefinition/patien... | Holyoke | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | NaN | NaN | NaN | NaN | NaN |
99 | Patient | a845ead4-d9de-42eb-b4b5-eb21a8963578 | generated | <div xmlns="http://www.w3.org/1999/xhtml">Gene... | http://hl7.org/fhir/StructureDefinition/patien... | Lanie Hyatt | http://hl7.org/fhir/StructureDefinition/patien... | Millis | Massachusetts | US | ... | False | urn:ietf:bcp:47 | en-US | English | English | NaN | NaN | NaN | NaN | NaN |
100 rows × 73 columns
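To make the column names above less mysterious, here is a minimal sketch of the flattening idea in plain Python. It is not the flatten_json implementation; `flatten_sketch` is a hypothetical helper written only for illustration, and it assumes the same naming rule seen in the output above: nested keys and list indexes joined with underscores.

```python
def flatten_sketch(value, prefix="", sep="_"):
    """Toy flattener: joins nested dict keys and list indexes with `sep`."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}{sep}{key}" if prefix else key
            flat.update(flatten_sketch(child, child_prefix, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            child_prefix = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten_sketch(child, child_prefix, sep))
    else:
        # Leaf value: store it under the accumulated key
        flat[prefix] = value
    return flat


# A trimmed-down Patient resource (illustrative, not a full FHIR instance)
patient = {
    "resourceType": "Patient",
    "name": [{"use": "official", "family": "Lemke", "given": ["Abram"]}],
    "gender": "male",
}

flat_patient = flatten_sketch(patient)
print(flat_patient)
```

Each list index becomes part of the column name, which is why a patient with a second name entry produces extra `name_1_*` columns that are NaN for everyone else.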
Let’s look at just one row so it’s easier to see all the columns and an example value:
with pd.option_context('display.max_rows', 1000, 'display.max_columns', 10):
    print(resource_dfs['Patient'].loc[0].T)
resourceType Patient id 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 text_status generated text_div <div xmlns="http://www.w3.org/1999/xhtml">Gene... extension_0_url http://hl7.org/fhir/StructureDefinition/patien... extension_0_valueString Lettie Boyle extension_1_url http://hl7.org/fhir/StructureDefinition/patien... extension_1_valueAddress_city Boston extension_1_valueAddress_state Massachusetts extension_1_valueAddress_country US extension_2_url http://synthetichealth.github.io/synthea/disab... extension_2_valueDecimal 0.305628 extension_3_url http://synthetichealth.github.io/synthea/quali... extension_3_valueDecimal 53.694372 identifier_0_system https://github.com/synthetichealth/synthea identifier_0_value 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 identifier_1_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203 identifier_1_type_coding_0_code MR identifier_1_type_coding_0_display Medical Record Number identifier_1_type_text Medical Record Number identifier_1_system http://hospital.smarthealthit.org identifier_1_value 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 identifier_2_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203 identifier_2_type_coding_0_code SS identifier_2_type_coding_0_display Social Security Number identifier_2_type_text Social Security Number identifier_2_system http://hl7.org/fhir/sid/us-ssn identifier_2_value 999-18-8203 identifier_3_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203 identifier_3_type_coding_0_code DL identifier_3_type_coding_0_display Driver's License identifier_3_type_text Driver's License identifier_3_system urn:oid:2.16.840.1.113883.4.3.25 identifier_3_value S99914534 identifier_4_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203 identifier_4_type_coding_0_code PPN identifier_4_type_coding_0_display Passport Number identifier_4_type_text Passport Number identifier_4_system http://standardhealthrecord.org/fhir/Structure... 
identifier_4_value X41457228X name_0_use official name_0_family Lemke name_0_given_0 Abram name_0_prefix_0 Mr. telecom_0_system phone telecom_0_value 555-532-1156 telecom_0_use home gender male birthDate 1965-01-13 address_0_extension_0_url http://hl7.org/fhir/StructureDefinition/geoloc... address_0_extension_0_extension_0_url latitude address_0_extension_0_extension_0_valueDecimal 42.264144 address_0_extension_0_extension_1_url longitude address_0_extension_0_extension_1_valueDecimal -72.642902 address_0_line_0 167 Nikolaus Gate address_0_city Easthampton address_0_state Massachusetts address_0_postalCode 01027 address_0_country US maritalStatus_coding_0_system http://terminology.hl7.org/CodeSystem/v3-Marit... maritalStatus_coding_0_code M maritalStatus_coding_0_display M maritalStatus_text M multipleBirthBoolean False communication_0_language_coding_0_system urn:ietf:bcp:47 communication_0_language_coding_0_code en-US communication_0_language_coding_0_display English communication_0_language_text English name_1_use NaN name_1_family NaN name_1_given_0 NaN name_1_prefix_0 NaN multipleBirthInteger NaN Name: 0, dtype: object
Next, what if we know in advance we will only want certain fields?
Let’s follow the same pattern the FHIR-PYrate library uses, and use FHIRPath to define the fields we want to extract, along with a friendly name. For this we’ll use the fhirpathpy library.
FHIRPath is a path based navigation and extraction language, somewhat like XPath. Operations are expressed in terms of the logical content of hierarchical data models, and support traversal, selection and filtering of data.
If you are not familiar with FHIRPath, Section 3 of the FHIRPath spec describes some of the basics.
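As a rough intuition for how such a path is evaluated, the sketch below walks a resource in plain Python. It is a toy, not fhirpathpy and not a conforming FHIRPath engine: `select` is a hypothetical helper that supports only dotted field names, `[n]` indexing, and `first()`. It does mirror two behaviors worth knowing: navigating a field over a collection gathers the values from every item, and the result is always a collection (a list).

```python
import re


def select(resource, path):
    """Toy evaluator for a tiny FHIRPath-like subset. Always returns a list."""
    items = [resource]
    for segment in path.split("."):
        if segment == "first()":
            items = items[:1]
            continue
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment}")
        name, index = match.group(1), match.group(2)
        selected = []
        for item in items:
            child = item.get(name) if isinstance(item, dict) else None
            if child is None:
                continue
            # Repeating elements (lists) are flattened into the result collection
            selected.extend(child if isinstance(child, list) else [child])
        if index is not None:
            # Out-of-range indexes yield an empty collection rather than an error
            selected = selected[int(index):int(index) + 1]
        items = selected
    return items


# A trimmed-down Patient resource (illustrative values)
patient = {
    "resourceType": "Patient",
    "identifier": [{"value": "6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2"}],
    "gender": "male",
    "maritalStatus": {"coding": [{"code": "M"}]},
}

print(select(patient, "identifier[0].value"))
print(select(patient, "maritalStatus.coding.first().code"))
print(select(patient, "deceasedDateTime"))  # missing field gives an empty list
```

The empty-list result for a missing field is the reason the extraction code has to decide how to represent "no value" in a DataFrame cell.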
import fhirpathpy
fhir_paths = [
    ["id", "identifier[0].value"],
    ["gender", "gender"],
    ["date_of_birth", "birthDate"],
    ["marital_status", "maritalStatus.coding.first().code"]
]

# compile the FHIRPath expressions so they can be reused. this will result in better performance on large datasets
for f in fhir_paths:
    f[1] = fhirpathpy.compile(f[1])

for resource_type, resources in resources_by_type.items():
    filtered_resources = []

    for resource in resources:
        filtered_resource = {}
        for f in fhir_paths:
            fieldname = f[0]
            func = f[1]
            filtered_resource[fieldname] = func(resource)

            # fhirpathpy always returns a list, which can make the DataFrame messy
            # if it's a list with only one item, extract the item from the list
            if isinstance(filtered_resource[fieldname], list) and len(filtered_resource[fieldname]) == 1:
                filtered_resource[fieldname] = filtered_resource[fieldname][0]

        filtered_resources.append(filtered_resource)

    resource_dfs[resource_type] = pd.json_normalize(list(map(lambda r: flatten(r), filtered_resources)))

resource_dfs['Patient']
| | id | gender | date_of_birth | marital_status |
---|---|---|---|---|
0 | 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | male | 1965-01-13 | M |
1 | 58c297c4-d684-4677-8024-01131d93835e | female | 1971-04-05 | M |
2 | 538a9a4e-8437-47d3-8c01-1a17dca8f0be | male | 1923-03-24 | M |
3 | c6c60742-8694-46e4-bb42-b00bf6d8b536 | female | 1965-10-27 | M |
4 | fbfec681-d357-4b28-b1d2-5db6434c7846 | female | 1942-07-04 | M |
... | ... | ... | ... | ... |
95 | 5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7 | male | 1995-10-19 | S |
96 | c1981741-f90e-4077-9156-429a3c4c5ded | male | 1956-05-06 | M |
97 | f98b23bf-4443-46d0-9eaf-563e767cf948 | male | 1966-02-07 | M |
98 | c536dee9-9ef6-4807-ae20-9f1045c9c7d6 | male | 1990-11-18 | S |
99 | a845ead4-d9de-42eb-b4b5-eb21a8963578 | female | 1968-04-20 | S |
100 rows × 4 columns
Bringing it all together
Now we have everything we need to connect to a FHIR server that supports Bulk Data, request and download exported data, and convert it into a DataFrame. Let’s bring everything together from the previous steps into one class with a clear entrypoint.
import requests
import jwt
import datetime
import json
import fhirpathpy
from flatten_json import flatten
from typing import Optional
from collections import defaultdict
from time import sleep  # used while polling the export status
import pandas as pd
class BulkDataFetcher:
    def __init__(
        self,
        base_url: str,
        client_id: str,
        private_key: str,
        key_id: str,
        endpoint: Optional[str] = None,
        session: Optional[requests.Session] = None
    ):
        self.base_url = base_url
        self.client_id = client_id
        self.private_key = private_key
        self.key_id = key_id
        self.token = None
        self.token_expire_time = None
        if endpoint is None:
            self.endpoint = "Patient"
        else:
            self.endpoint = endpoint
        if session is None:
            self.session = requests.Session()
        else:
            self.session = session

        # Discover the token endpoint from the server's SMART configuration
        r = self.session.get(f'{base_url}/.well-known/smart-configuration')
        smart_config = r.json()
        self.token_endpoint = smart_config['token_endpoint']
        self.resource_types = []
        self.fhir_paths = {}
        # Store raw FHIR resource instances; populated as part of get_dataframes()
        self.resources_by_type = {}

    def get_token(self):
        if self.token and datetime.datetime.now() < self.token_expire_time:
            # the existing token is still valid so use it
            return self.token

        # Sign the client assertion with the key this fetcher was created with
        assertion = jwt.encode({
                'iss': self.client_id,
                'sub': self.client_id,
                'aud': self.token_endpoint,
                'exp': int((datetime.datetime.now() + datetime.timedelta(minutes=5)).timestamp())
        }, self.private_key, algorithm='RS384',
        headers={"kid": self.key_id})

        r = self.session.post(self.token_endpoint, data={
            'scope': 'system/*.read',
            'grant_type': 'client_credentials',
            'client_assertion_type': 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
            'client_assertion': assertion
        })

        token_response = r.json()
        self.token = token_response['access_token']
        self.token_expire_time = datetime.datetime.now() + datetime.timedelta(seconds=token_response['expires_in'])
        return self.token

    def add_resource_type(self, resource_type: str, fhir_paths=None):
        self.resource_types.append(resource_type)
        if fhir_paths:
            # fhir_paths=[
            #     ("id", "identifier[0].value"),
            #     ("marital_status", "maritalStatus.coding[0].code")
            # ]
            compiled_fhir_paths = [(f[0], fhirpathpy.compile(f[1])) for f in fhir_paths]
            self.fhir_paths[resource_type] = compiled_fhir_paths

    def _invoke_request(self):
        types = ','.join(self.resource_types)
        url = f'{self.base_url}/{self.endpoint}/$export?_type={types}'
        print(f'Fetching from {url}')
        r = self.session.get(url, headers={'Authorization': f'Bearer {self.get_token()}', 'Accept': 'application/fhir+json', 'Prefer': 'respond-async'})

        # The server replies with the URL to poll for export status
        self.check_url = r.headers['Content-Location']
        return self.check_url

    def _wait_until_ready(self):
        while True:
            r = self.session.get(self.check_url, headers={'Authorization': f'Bearer {self.get_token()}', 'Accept': 'application/fhir+json'})

            # There are three possible options here: http://hl7.org/fhir/uv/bulkdata/export.html#bulk-data-status-request
            # Error = 4xx or 5xx status code
            # In-Progress = 202
            # Complete = 200
            if r.status_code == 200:
                # complete
                response = r.json()
                self.output_files = response['output']
                return self.output_files
            elif r.status_code == 202:
                # in progress
                delay = r.headers['Retry-After']
                sleep(int(delay))
            else:
                raise RuntimeError(r.text)

    def get_dataframes(self):
        self._invoke_request()
        self._wait_until_ready()

        resources_by_type = {}
        self.resources_by_type = {}  # Reset store of raw FHIR resources each time this is run

        for output_file in self.output_files:
            download_url = output_file['url']
            resource_type = output_file['type']

            r = self.session.get(download_url, headers={'Authorization': f'Bearer {self.get_token()}', 'Accept': 'application/fhir+json'})

            ndjson = r.text.strip()

            if resource_type not in resources_by_type:
                resources_by_type[resource_type] = []
                self.resources_by_type[resource_type] = []

            for line in ndjson.split('\n'):
                resource = json.loads(line)

                # Make raw resource instances available for future use
                self.resources_by_type[resource_type].append(resource)

                if resource_type in self.fhir_paths:
                    fhir_paths = self.fhir_paths[resource_type]
                    filtered_resource = {}
                    for f in fhir_paths:
                        fieldname = f[0]
                        func = f[1]
                        filtered_resource[fieldname] = func(resource)

                        # unwrap single-item lists returned by fhirpathpy
                        if isinstance(filtered_resource[fieldname], list) and len(filtered_resource[fieldname]) == 1:
                            filtered_resource[fieldname] = filtered_resource[fieldname][0]

                    resource = filtered_resource

                resources_by_type[resource_type].append(resource)

        dfs = {}

        for resource_type, resources in resources_by_type.items():
            dfs[resource_type] = pd.json_normalize(list(map(lambda r: flatten(r), resources)))

        return dfs

    def get_example_resource(self, resource_type: str, resource_id: Optional[str] = None):
        # resources_by_type starts out empty and is only filled by get_dataframes()
        if not self.resources_by_type:
            print("You need to run get_dataframes() first")
            return None

        if resource_type not in self.resources_by_type:
            print(f"{resource_type} not available. Try one of these: {', '.join(self.resources_by_type.keys())}")
            return None

        if resource_id is None:
            return self.resources_by_type[resource_type][0]

        resource = [r for r in self.resources_by_type[resource_type] if r['id'] == resource_id]

        if len(resource) > 0:
            return resource[0]

        print(f"No {resource_type} with id={resource_id} was found.")
        return None

    def reprocess_dataframes(self, fhir_paths):
        return BulkDataFetcher._reprocess_dataframes(self.resources_by_type, fhir_paths)

    @classmethod
    def _reprocess_dataframes(cls, obj_resources_by_type, user_fhir_paths):
        parsed_resources_by_type = defaultdict(list)

        for this_resource_type in obj_resources_by_type.keys():
            compiled_paths = None
            if this_resource_type in user_fhir_paths:
                # compile into a local list so the caller's dict is left untouched and can be reused
                compiled_paths = [(f[0], fhirpathpy.compile(f[1])) for f in user_fhir_paths[this_resource_type]]

            for resource in obj_resources_by_type[this_resource_type]:
                if compiled_paths is not None:
                    filtered_resource = {}
                    for f in compiled_paths:
                        fieldname = f[0]
                        func = f[1]
                        filtered_resource[fieldname] = func(resource)

                        if isinstance(filtered_resource[fieldname], list) and len(filtered_resource[fieldname]) == 1:
                            filtered_resource[fieldname] = filtered_resource[fieldname][0]

                    parsed_resources_by_type[this_resource_type].append(filtered_resource)
                else:
                    parsed_resources_by_type[this_resource_type].append(resource)

        dfs = {}

        for t, res in parsed_resources_by_type.items():
            dfs[t] = pd.json_normalize(list(map(lambda r: flatten(r), res)))

        return dfs

# And then to invoke it:
# create a BulkDataFetcher with our credentials
fetcher = BulkDataFetcher(
    base_url=server_url, client_id=client_id, private_key=private_key, key_id=key_id, session=session
)

# add a resource type of interest, with some FHIRPath field mappings
fetcher.add_resource_type('Patient', [
    ("id", "identifier[0].value"),
    ("gender", "gender"),
    ("date_of_birth", "birthDate"),
    ("marital_status", "maritalStatus.coding.first().code")
])

# add another resource type, with no FHIRPath mappings (load the entire resource)
fetcher.add_resource_type('Condition')

dfs = fetcher.get_dataframes()

dfs['Patient']
Fetching from https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir/Patient/$export?_type=Patient,Condition
| | id | gender | date_of_birth | marital_status |
---|---|---|---|---|
0 | 6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | male | 1965-01-13 | M |
1 | 58c297c4-d684-4677-8024-01131d93835e | female | 1971-04-05 | M |
2 | 538a9a4e-8437-47d3-8c01-1a17dca8f0be | male | 1923-03-24 | M |
3 | c6c60742-8694-46e4-bb42-b00bf6d8b536 | female | 1965-10-27 | M |
4 | fbfec681-d357-4b28-b1d2-5db6434c7846 | female | 1942-07-04 | M |
... | ... | ... | ... | ... |
95 | 5efb1ac1-d29b-40a5-a3d1-2d682f10bfa7 | male | 1995-10-19 | S |
96 | c1981741-f90e-4077-9156-429a3c4c5ded | male | 1956-05-06 | M |
97 | f98b23bf-4443-46d0-9eaf-563e767cf948 | male | 1966-02-07 | M |
98 | c536dee9-9ef6-4807-ae20-9f1045c9c7d6 | male | 1990-11-18 | S |
99 | a845ead4-d9de-42eb-b4b5-eb21a8963578 | female | 1968-04-20 | S |
100 rows × 4 columns
dfs['Condition']
| | resourceType | id | clinicalStatus_coding_0_system | clinicalStatus_coding_0_code | verificationStatus_coding_0_system | verificationStatus_coding_0_code | code_coding_0_system | code_coding_0_code | code_coding_0_display | code_text | subject_reference | encounter_reference | onsetDateTime | recordedDate | abatementDateTime |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Condition | a5a38601-b6fe-46b4-a67e-cde9d5957dde | http://terminology.hl7.org/CodeSystem/conditio... | active | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 40055000 | Chronic sinusitis (disorder) | Chronic sinusitis (disorder) | Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | Encounter/17b801ac-58e3-4f6b-8b48-8e33f3a36086 | 1985-06-18T17:30:49-04:00 | 1985-06-18T17:30:49-04:00 | NaN |
1 | Condition | 8f818ad4-c292-47e8-8d99-c4c54174b671 | http://terminology.hl7.org/CodeSystem/conditio... | active | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 162864005 | Body mass index 30+ - obesity (finding) | Body mass index 30+ - obesity (finding) | Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | Encounter/0953dd44-90bb-4805-badd-169a761a6ab3 | 2005-01-19T16:30:49-05:00 | 2005-01-19T16:30:49-05:00 | NaN |
2 | Condition | 65d9d5f2-a772-4586-932f-df1f2ce1a863 | http://terminology.hl7.org/CodeSystem/conditio... | active | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 15777000 | Prediabetes | Prediabetes | Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | Encounter/d4e1370a-a679-4570-a3dc-e4f7ac847512 | 2013-02-06T16:30:49-05:00 | 2013-02-06T16:30:49-05:00 | NaN |
3 | Condition | 77ac8342-6950-4302-a303-efba12e06785 | http://terminology.hl7.org/CodeSystem/conditio... | resolved | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 68496003 | Polyp of colon | Polyp of colon | Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | Encounter/58ad433b-3707-4d40-9b63-2a803b4913bd | 2015-01-14T16:30:49-05:00 | 2015-01-14T16:30:49-05:00 | 2017-05-03T17:30:49-04:00 |
4 | Condition | 6514ab0c-bc64-4e1b-aa61-b97d27d72bc7 | http://terminology.hl7.org/CodeSystem/conditio... | active | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 271737000 | Anemia (disorder) | Anemia (disorder) | Patient/6c5d9ca9-54d7-42f5-bfae-a7c19cd217f2 | Encounter/58ad433b-3707-4d40-9b63-2a803b4913bd | 2015-01-14T16:30:49-05:00 | 2015-01-14T16:30:49-05:00 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
634 | Condition | ab051f6c-4298-407b-9315-2322ce913539 | http://terminology.hl7.org/CodeSystem/conditio... | active | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 162864005 | Body mass index 30+ - obesity (finding) | Body mass index 30+ - obesity (finding) | Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 | Encounter/c5ed8aed-2b7e-4630-bd1d-ac5090967edc | 2014-11-22T15:43:42-05:00 | 2014-11-22T15:43:42-05:00 | NaN |
635 | Condition | 76c1f07a-f8f2-4705-aa80-5f7a25d7c651 | http://terminology.hl7.org/CodeSystem/conditio... | resolved | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 39848009 | Whiplash injury to neck | Whiplash injury to neck | Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 | Encounter/9c8b41dd-d6fd-4691-ae46-01b47992dd8d | 2015-07-13T16:43:42-04:00 | 2015-07-13T16:43:42-04:00 | 2015-08-10T16:43:42-04:00 |
636 | Condition | b9a078eb-bb83-49ed-b4ed-633d1445356d | http://terminology.hl7.org/CodeSystem/conditio... | resolved | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 70704007 | Sprain of wrist | Sprain of wrist | Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 | Encounter/f044f05a-8433-4952-926d-dd8e2b4ee44e | 2018-07-25T16:43:42-04:00 | 2018-07-25T16:43:42-04:00 | 2018-08-15T16:43:42-04:00 |
637 | Condition | 0fe427ce-7ea1-4409-8de1-3879f9dc56bb | http://terminology.hl7.org/CodeSystem/conditio... | resolved | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 444814009 | Viral sinusitis (disorder) | Viral sinusitis (disorder) | Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 | Encounter/9100e9aa-1206-403b-b2bf-b75ac23991bd | 2018-09-26T16:43:42-04:00 | 2018-09-26T16:43:42-04:00 | 2018-10-17T16:43:42-04:00 |
638 | Condition | 88f3f41a-68ec-46dd-8d44-9178d3872220 | http://terminology.hl7.org/CodeSystem/conditio... | resolved | http://terminology.hl7.org/CodeSystem/conditio... | confirmed | http://snomed.info/sct | 72892002 | Normal pregnancy | Normal pregnancy | Patient/a845ead4-d9de-42eb-b4b5-eb21a8963578 | Encounter/2d975caf-e6bf-43c2-8778-ea293df1f255 | 2018-12-22T15:43:42-05:00 | 2018-12-22T15:43:42-05:00 | 2019-07-27T16:43:42-04:00 |
639 rows × 15 columns
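Each exported file is NDJSON: one complete JSON resource per line. The download step in get_dataframes() strips the response text and splits it on newlines. The sketch below shows a slightly more defensive variant of that parsing step in isolation; `parse_ndjson` is a hypothetical helper, not part of the class above, and it additionally tolerates blank lines and an empty file.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two resources followed by a stray blank line (illustrative data)
sample = (
    '{"resourceType": "Patient", "id": "a"}\n'
    '{"resourceType": "Patient", "id": "b"}\n'
    '\n'
)

resources = parse_ndjson(sample)
print([r["id"] for r in resources])

# An empty export file yields an empty list instead of a JSON decoding error
print(parse_ndjson(""))
```

Skipping blank lines matters because `json.loads('')` raises an error, which is what a plain `split('\n')` would hand it for an empty file.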
Group export
The ONC certification criterion §170.315(g)(10), Standardized API for patient and population services, has required support for group-export since December 2022. This is therefore the FHIR Bulk Data endpoint you are most likely to find in EHRs.
To use this endpoint, you will need the ID of the group of patients you want to export. In a production setting, this would typically be provided by the administrators of the EHR.
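The patient-level and group-level export requests differ only in the path segment that precedes `$export`. The sketch below makes that concrete; `build_export_url` is a hypothetical helper, and the base URL and group ID are placeholders rather than values from the test server.

```python
def build_export_url(base_url, resource_types, group_id=None):
    """Build a Bulk Data kick-off URL for all patients or for a single group."""
    endpoint = f"Group/{group_id}" if group_id else "Patient"
    url = f"{base_url.rstrip('/')}/{endpoint}/$export"
    if resource_types:
        # _type limits the export to the listed resource types
        url += "?_type=" + ",".join(resource_types)
    return url


base = "https://example.org/fhir"  # hypothetical FHIR base URL

patient_url = build_export_url(base, ["Patient", "Condition"])
group_url = build_export_url(base, ["Patient", "Condition"], group_id="example-group")

print(patient_url)
print(group_url)
```

This is the same choice the `endpoint` argument of BulkDataFetcher controls: it defaults to `Patient` and can be set to `Group/<id>` instead.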
For the bulk-data.smarthealthit.org testing server, we can ask it for a list of groups via the FHIR API:
r = session.get(f'{server_url}/Group', headers={'Authorization': f'Bearer {get_token()}', 'Accept': 'application/fhir+json'})
r.json()
{'resourceType': 'Bundle',
'id': 'e21c6a557591d81d35d3f3bae22e6b490c71ad36b1b75b163393ea102e47eae8',
'meta': {'lastUpdated': '2023-06-13 01:26:34'},
'type': 'searchset',
'total': 8,
'link': [{'relation': 'self',
'url': 'https://bulk-data.smarthealthit.org/fhir/Group'}],
'entry': [{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/1f76e2b7-a222-4765-9097-a71b86e90d07',
'resource': {'resourceType': 'Group',
'id': '1f76e2b7-a222-4765-9097-a71b86e90d07',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': '1f76e2b7-a222-4765-9097-a71b86e90d07'}],
'quantity': 25,
'name': 'Health New England',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">Health New England</div>'},
'type': 'person',
'actual': True}},
{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b',
'resource': {'resourceType': 'Group',
'id': '84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': '84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b'}],
'quantity': 3,
'name': 'Minuteman Health',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">Minuteman Health</div>'},
'type': 'person',
'actual': True}},
{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/a1f090cb-ffd1-436d-a815-fb047d9a1903',
'resource': {'resourceType': 'Group',
'id': 'a1f090cb-ffd1-436d-a815-fb047d9a1903',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': 'a1f090cb-ffd1-436d-a815-fb047d9a1903'}],
'quantity': 10,
'name': 'BMC HealthNet',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">BMC HealthNet</div>'},
'type': 'person',
'actual': True}},
{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/a95907b4-0c41-462a-bfcf-cb822075eb39',
'resource': {'resourceType': 'Group',
'id': 'a95907b4-0c41-462a-bfcf-cb822075eb39',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': 'a95907b4-0c41-462a-bfcf-cb822075eb39'}],
'quantity': 3,
'name': 'Harvard Pilgrim Health Care',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">Harvard Pilgrim Health Care</div>'},
'type': 'person',
'actual': True}},
{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7',
'resource': {'resourceType': 'Group',
'id': 'ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': 'ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7'}],
'quantity': 22,
'name': 'Tufts Health Plan',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">Tufts Health Plan</div>'},
'type': 'person',
'actual': True}},
{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/b058c5e7-209c-4162-9289-0ff703347c0f',
'resource': {'resourceType': 'Group',
'id': 'b058c5e7-209c-4162-9289-0ff703347c0f',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': 'b058c5e7-209c-4162-9289-0ff703347c0f'}],
'quantity': 3,
'name': 'Fallon Health',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">Fallon Health</div>'},
'type': 'person',
'actual': True}},
{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/cf04e363-eef4-4653-9650-846bca43f357',
'resource': {'resourceType': 'Group',
'id': 'cf04e363-eef4-4653-9650-846bca43f357',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': 'cf04e363-eef4-4653-9650-846bca43f357'}],
'quantity': 7,
'name': 'Neighborhood Health Plan',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">Neighborhood Health Plan</div>'},
'type': 'person',
'actual': True}},
{'fullUrl': 'https://bulk-data.smarthealthit.org/fhir/Group/ff7dc35f-79e9-47a0-af22-475cf301a085',
'resource': {'resourceType': 'Group',
'id': 'ff7dc35f-79e9-47a0-af22-475cf301a085',
'identifier': [{'system': 'https://bulk-data/db-id',
'value': 'ff7dc35f-79e9-47a0-af22-475cf301a085'}],
'quantity': 27,
'name': 'Blue Cross Blue Shield',
'text': {'status': 'generated',
'div': '<div xmlns="http://www.w3.org/1999/xhtml">Blue Cross Blue Shield</div>'},
'type': 'person',
'actual': True}}]}
Let’s quickly pull this into a Pandas DataFrame to make it easier to read:
groups = pd.json_normalize(r.json()['entry'])[['resource.id', 'resource.name', 'resource.quantity']]
groups
| | resource.id | resource.name | resource.quantity |
---|---|---|---|
0 | 1f76e2b7-a222-4765-9097-a71b86e90d07 | Health New England | 25 |
1 | 84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b | Minuteman Health | 3 |
2 | a1f090cb-ffd1-436d-a815-fb047d9a1903 | BMC HealthNet | 10 |
3 | a95907b4-0c41-462a-bfcf-cb822075eb39 | Harvard Pilgrim Health Care | 3 |
4 | ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7 | Tufts Health Plan | 22 |
5 | b058c5e7-209c-4162-9289-0ff703347c0f | Fallon Health | 3 |
6 | cf04e363-eef4-4653-9650-846bca43f357 | Neighborhood Health Plan | 7 |
7 | ff7dc35f-79e9-47a0-af22-475cf301a085 | Blue Cross Blue Shield | 27 |
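The order of entries in a search result may not be stable between requests, so selecting a group by its name can be less fragile than selecting it by row position. A minimal sketch on the raw Bundle, using plain dictionaries; `find_group_id` is a hypothetical helper and the Bundle below is trimmed to two of the groups listed above.

```python
def find_group_id(bundle, name):
    """Return the id of the first Group in a search Bundle with the given name."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Group" and resource.get("name") == name:
            return resource["id"]
    return None  # no group with that name


# Trimmed copy of the search result shown above
bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Group",
                      "id": "84e2fc85-2b9b-4680-b7df-cfbc2ea7a12b",
                      "name": "Minuteman Health", "quantity": 3}},
        {"resource": {"resourceType": "Group",
                      "id": "ae6ad3d7-f19d-44d7-9e70-fd0b7cf915e7",
                      "name": "Tufts Health Plan", "quantity": 22}},
    ],
}

print(find_group_id(bundle, "Tufts Health Plan"))
print(find_group_id(bundle, "No Such Plan"))
```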
Now we can request the patients and associated data for a specific group:
group_id = groups.loc[0, 'resource.id']

fetcher = BulkDataFetcher(
    base_url=server_url, client_id=client_id, private_key=private_key, key_id=key_id, session=session,
    # Tell the BulkDataFetcher to request data from the specified group rather than all patients
    endpoint=f'Group/{group_id}'
)

# add a resource type of interest, with some FHIRPath field mappings
fetcher.add_resource_type('Patient', [
    ("id", "identifier[0].value"),
    ("gender", "gender"),
    ("date_of_birth", "birthDate"),
    ("marital_status", "maritalStatus.coding.first().code")
])

# add another resource type, with no FHIRPath mappings (load the entire resource)
fetcher.add_resource_type('Condition')

dfs = fetcher.get_dataframes()

dfs['Patient']
Fetching from https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWwiOjB9/fhir/Group/1f76e2b7-a222-4765-9097-a71b86e90d07/$export?_type=Patient,Condition
| | id | gender | date_of_birth | marital_status |
---|---|---|---|---|
0 | fbfec681-d357-4b28-b1d2-5db6434c7846 | female | 1942-07-04 | M |
1 | 0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | female | 1967-10-24 | S |
2 | 62e03ae7-079c-4eda-9b5a-29440d3a015a | male | 2013-05-25 | S |
3 | 7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | female | 1959-07-23 | M |
4 | 84d4bafe-1891-4e6c-b8aa-38d2eafd8193 | female | 2000-09-12 | S |
5 | 4bc3ef6a-65c5-470d-8911-f26194b2a0e3 | male | 2004-08-20 | S |
6 | 3ca9f003-e6dd-4110-b4c2-12c056b880f4 | female | 1991-12-27 | M |
7 | daf4e787-0ea5-45ff-a9a1-c68308e9f6a3 | male | 1995-06-21 | S |
8 | 1ad52ff0-428a-4048-aff8-f7196a2da649 | female | 1996-09-11 | S |
9 | 644d85af-aaf9-4068-ad23-1e55aedd5205 | male | 2003-09-12 | S |
10 | 687eb477-32ae-44ac-a0ef-2912623a14ff | female | 1960-10-18 | M |
11 | 70ac5078-22ef-471d-bed7-cb694775b4ba | female | 2001-08-25 | S |
12 | 7646ecba-4812-452e-88e7-6235f77dabb2 | female | 1997-11-20 | S |
13 | 69071541-e760-4d0c-bf8a-961a061cb0d5 | male | 1952-02-23 | M |
14 | 221fe1ec-a258-4fc4-8cc8-b7c960a8a0a9 | female | 2008-01-04 | S |
15 | 32e46528-35d1-4ed7-9aaa-09ae00f9681c | male | 2007-02-12 | S |
16 | b20c7c80-49ac-4926-8b03-e9c69b40e1f5 | male | 1951-04-26 | S |
17 | 733abdda-2bfa-485f-9c83-ed9b206889b2 | male | 1972-02-27 | M |
18 | 55940999-fd98-4922-b9bc-a6bf0c1855ed | male | 1931-04-18 | M |
19 | 7ba8d35f-3f70-48b9-b711-104374136ac7 | male | 2002-05-10 | S |
20 | ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 | female | 1993-02-09 | S |
21 | 8d3e1155-278a-4824-a7e0-fddb24c7c179 | male | 1991-10-27 | S |
22 | 8c9fea57-6ded-47b0-88c9-75518430b572 | female | 1962-08-01 | M |
23 | d2524ab6-4db9-440d-b588-6dcfcab89270 | male | 1979-11-11 | S |
24 | f98b23bf-4443-46d0-9eaf-563e767cf948 | male | 1966-02-07 | M |
A number of different FHIR resources are available from the test server:
- AllergyIntolerance
- CarePlan
- CareTeam
- Claim
- Condition
- Device
- DiagnosticReport
- DocumentReference
- Encounter
- ExplanationOfBenefit
- ImagingStudy
- Immunization
- MedicationRequest
- Observation
- Patient
- Procedure
Try modifying the request above to pull in resource types other than Patient and Condition. The links above go to the FHIR documentation for each resource type, which can help with constructing FHIRPaths.
# Try adding an additional resource type
fetcher.add_resource_type('Observation')
dfs = fetcher.get_dataframes()
dfs['Observation']
Fetching from https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MSwic3R1Ijo0LCJkZWw iOjB9/fhir/Group/1f76e2b7-a222-4765-9097-a71b86e90d07/$export?_type=Patient,Condition,Observation
| | resourceType | id | status | category_0_coding_0_system | category_0_coding_0_code | category_0_coding_0_display | code_coding_0_system | code_coding_0_code | code_coding_0_display | code_text | ... | component_1_valueQuantity_system | component_1_valueQuantity_code | valueCodeableConcept_coding_0_system | valueCodeableConcept_coding_0_code | valueCodeableConcept_coding_0_display | valueCodeableConcept_text | code_coding_1_system | code_coding_1_code | code_coding_1_display | valueString |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Observation | 7a10d1ef-97f2-468b-b3ab-78dc99c65cf6 | final | http://terminology.hl7.org/CodeSystem/observat... | vital-signs | vital-signs | http://loinc.org | 8302-2 | Body Height | Body Height | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | Observation | 168c33d9-b4c8-4783-b97a-82fee9625f35 | final | http://terminology.hl7.org/CodeSystem/observat... | vital-signs | vital-signs | http://loinc.org | 72514-3 | Pain severity - 0-10 verbal numeric rating [Sc... | Pain severity - 0-10 verbal numeric rating [Sc... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | Observation | 7a6ee07e-72d0-4a74-896f-13c1b2e9b254 | final | http://terminology.hl7.org/CodeSystem/observat... | vital-signs | vital-signs | http://loinc.org | 29463-7 | Body Weight | Body Weight | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | Observation | fe4863c9-ac4e-439d-9d8f-834f1aebd3d8 | final | http://terminology.hl7.org/CodeSystem/observat... | vital-signs | vital-signs | http://loinc.org | 39156-5 | Body Mass Index | Body Mass Index | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | Observation | a7e89580-6be8-4b0a-a528-dfb91f2508bf | final | http://terminology.hl7.org/CodeSystem/observat... | vital-signs | vital-signs | http://loinc.org | 85354-9 | Blood Pressure | Blood Pressure | ... | http://unitsofmeasure.org | mm[Hg] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4024 | Observation | 6f4249dd-abeb-43dd-b388-407fff8ea4f8 | final | http://terminology.hl7.org/CodeSystem/observat... | laboratory | laboratory | http://loinc.org | 1920-8 | Aspartate aminotransferase [Enzymatic activity... | Aspartate aminotransferase [Enzymatic activity... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4025 | Observation | a04bc225-ad1c-4235-b792-1abde0cb5055 | final | http://terminology.hl7.org/CodeSystem/observat... | laboratory | laboratory | http://loinc.org | 2093-3 | Total Cholesterol | Total Cholesterol | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4026 | Observation | 1b26f91e-68d2-4630-a1e0-d2aba21ed4f6 | final | http://terminology.hl7.org/CodeSystem/observat... | laboratory | laboratory | http://loinc.org | 2571-8 | Triglycerides | Triglycerides | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4027 | Observation | e43ed8c1-0535-407a-abd0-41dd09be6d4a | final | http://terminology.hl7.org/CodeSystem/observat... | laboratory | laboratory | http://loinc.org | 18262-6 | Low Density Lipoprotein Cholesterol | Low Density Lipoprotein Cholesterol | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4028 | Observation | b8c6b918-95b6-43f9-9524-78371a931384 | final | http://terminology.hl7.org/CodeSystem/observat... | laboratory | laboratory | http://loinc.org | 2085-9 | High Density Lipoprotein Cholesterol | High Density Lipoprotein Cholesterol | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4029 rows × 42 columns
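The "Fetching from" line above shows that each added resource type ends up in the comma-separated `_type` query parameter of the `$export` request. As a rough illustration of that URL shape, here is a minimal sketch; the helper name `build_export_url` and the example base URL and group id are hypothetical, and this is not necessarily how `BulkDataFetcher` assembles its request.

```python
from urllib.parse import urlencode


def build_export_url(base_url: str, group_id: str, resource_types: list[str]) -> str:
    """Build a group-level $export URL with a comma-separated _type parameter.

    Hypothetical helper for illustration only.
    """
    # Keep commas unescaped so the URL matches the form shown in the notebook output
    query = urlencode({"_type": ",".join(resource_types)}, safe=",")
    return f"{base_url.rstrip('/')}/Group/{group_id}/$export?{query}"


# Placeholder server and group id, not the workshop's real values
url = build_export_url(
    "https://example.org/fhir",
    "example-group",
    ["Patient", "Condition", "Observation"],
)
print(url)
```

Requesting only the resource types you need keeps the export smaller, which matters because every additional type increases both download time and the width of the resulting DataFrames.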
# Try filtering to just Observations of smoking status
# `fetcher.reprocess_dataframes()` does the same thing as `get_dataframes()`,
# but with FHIRPaths and without re-downloading everything
dfs = fetcher.reprocess_dataframes({
    'Patient': [
        ("id", "identifier[0].value"),
        ("gender", "gender"),
        ("date_of_birth", "birthDate"),
        ("marital_status", "maritalStatus.coding.first().code")
    ],
    'Observation': [
        ("id", "id"),
        ("patient", "subject.reference"),
        ("type", "code.coding.first().code"),
        ("type_display", "code.coding.first().display"),
        ("code", "valueCodeableConcept.coding.first().code"),
        ("code_display", "valueCodeableConcept.coding.first().display"),
    ]
})
with pd.option_context('display.max_rows', 100, 'display.min_rows', 100):
    display(dfs['Observation'][dfs['Observation']['type'] == '72166-2'])
| | id | patient | type | type_display | code | code_display |
---|---|---|---|---|---|---|
15 | 51202a40-c62e-4727-bfef-b643869d0951 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
32 | 04cc91b0-4fd0-4244-9738-32c563116308 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
53 | 87cc17d0-3914-4e95-9344-b76e394dfae6 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
81 | b10d3da1-249e-406d-90cb-9ac19314f823 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
98 | a0e5b7d8-30a2-4952-ae7a-c0a8d49bc9f3 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
119 | f843b2e7-6caf-4284-aa90-51c2d55c3c51 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
136 | d070c5c5-2a0a-4fe9-ba57-0b7fde240f3b | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
153 | 64beef69-dd3c-46a1-b970-8a5dc95be927 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
185 | b120c744-e1ec-49b9-b8b0-2a54455f410c | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
202 | 984b970a-b027-4778-a820-7f6c56761125 | Patient/fbfec681-d357-4b28-b1d2-5db6434c7846 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
213 | 17378ae3-f3b7-418b-b8e1-7af2856a5eac | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
221 | ec100afb-6dad-47e9-8a58-2ea28c1a5fb6 | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
233 | bdf0d1b0-d405-4dbe-acc3-855d6c38d0d2 | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
252 | 8b4c3ebb-0f3b-4a9f-aca6-894df17a652d | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
260 | db972319-bfa5-4d88-a562-3e71e94bafc1 | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
273 | 88515b19-a75f-486e-af88-96222bd4ca18 | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
281 | 2da712dc-da92-4f89-a36e-3aa8fc714d73 | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
289 | 219df89c-b801-4d3d-bfc0-7b270ad22542 | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
312 | b99b0288-8c94-4126-a7e0-d1c8717e51a3 | Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
332 | 18e82af6-9bb3-4657-8ee6-b02bdb288476 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
341 | 45e97461-0051-4bf5-82df-a8c4a429f099 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
350 | e84acd03-8ae3-4b82-95cf-25799c7c94b8 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
359 | f3d3151e-9a5b-4227-9b28-89f9cb61ca90 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
368 | c3364946-7957-4a88-8912-7dc8a9a1cfcd | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
377 | cb400c7f-485e-4674-a3d8-4700543235ae | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
398 | 450e1f51-8765-431c-a795-40206ac5c9fa | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
425 | 4ab3823f-27c4-466c-ae57-16e914454522 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
434 | 23c9d6e1-71bf-4766-96df-65a3b0d25d05 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
445 | 187e50b4-ac94-48a1-b052-02b5549be600 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
456 | a0e8bef3-8317-4930-858d-8dff3830ff13 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
468 | 30c32d11-4e9a-4feb-b352-2196792bb2cf | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
479 | 0145b24c-12c5-4ef2-a437-3efa9006f498 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
488 | 2b444657-c019-4434-a9d4-3e0f2f7cdd2f | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
509 | 73b8dc8c-91f8-41dd-a6ba-981c6ed12316 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
518 | c83226df-6f61-4849-bcce-1eec0f56be74 | Patient/62e03ae7-079c-4eda-9b5a-29440d3a015a | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
534 | 683a4889-8ee5-4f5b-befb-35ec22b13107 | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
562 | 2735a667-d2cd-4491-9a32-0758ce0f48e9 | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
583 | f86585e1-ec12-4125-a254-2533ea3b9bba | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
600 | 495b7f39-53b4-4d98-884d-fe14e464a13a | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
637 | 12ea697f-c2b3-4539-80c7-2ed09c84da2b | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
658 | 3a4f53d6-3c82-4efb-b891-d6a866949e58 | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
675 | fb5ed249-16e5-4379-85e7-6b8300ee9548 | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
723 | ee6a13b1-b6d1-49ab-8eff-128dc7f40daf | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
760 | ec72955c-b5ba-40b8-a67e-13b137d95cb7 | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
822 | 71d62e29-789d-4417-95fa-b71292d2e685 | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
859 | 6d191cde-5731-41e5-9197-5827fad6608c | Patient/7cdaae04-4ce7-4a6d-b6a3-5cddba9bc888 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
871 | 566c6a71-d7ee-49fa-88a4-2c5e6e1d1476 | Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
880 | 55ffbd45-6042-4240-8f32-4cd10c1c6d48 | Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
900 | abd678ea-30d0-48b1-9f8e-4b96c952e55e | Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
910 | c00b5394-c828-46d3-b81f-498b768c5c81 | Patient/84d4bafe-1891-4e6c-b8aa-38d2eafd8193 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
... | ... | ... | ... | ... | ... | ... |
3094 | 1448ef79-0efb-49fd-a32e-8e4501d63654 | Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3103 | 70b64209-a5f0-4d9b-9b92-ca8d4e37ad20 | Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3128 | 518b9a78-3ed3-401e-8792-b1f5653da244 | Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3137 | 4a491920-5523-4e12-a178-ad7692aae555 | Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3146 | eec65c1b-e59a-4e65-a311-2c0cd097b078 | Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3166 | 5559cf2a-1386-4a48-b889-29798870a3f4 | Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3190 | 62abe2b1-4b46-41e4-9784-05509fe2439b | Patient/7ba8d35f-3f70-48b9-b711-104374136ac7 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3199 | 9cc9a329-78d7-4b4c-9734-624576028f5f | Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3208 | aab937d5-dc7c-44f2-9328-c58039baeedb | Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3228 | c8321dcc-cba2-454e-a862-b692bf75455e | Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3236 | df310c44-a40d-4596-a16d-22ae9454e2d7 | Patient/ff9d23d8-f3c8-4eee-a5f9-e05e843675b5 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3245 | ebb48186-7b0c-4ea2-a4ee-1bc77bd40926 | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3253 | 837fd4cb-10c3-4d89-b2c1-23be8a8572b6 | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3262 | e9eabd4c-3d45-45e8-b21e-0a479cb0530f | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3282 | 6a594967-b815-4650-82f6-eb404bfc7aaf | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3299 | fe884b8b-04d9-4c3b-a710-f77d89d87bef | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3317 | 22d56cc5-7a0e-49cf-8483-97de08415e1c | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3334 | 6bcef563-1082-4398-85d1-86a8f6603141 | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3351 | 6d9a0171-6e80-464f-8022-5b740af8a932 | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3368 | e5447fff-dee3-4e8a-9756-e11304fa7f82 | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3396 | df8ba6a0-a4e8-4983-b1fb-85a6e68d5d39 | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3424 | 1c7ca98b-18a6-401a-b5f9-218d21455658 | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3441 | 71029c58-cca3-45d6-8f2e-cfa1b225883c | Patient/8d3e1155-278a-4824-a7e0-fddb24c7c179 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3458 | 756e618d-9983-4da5-9023-6103c1d4f3c0 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3479 | f70de159-6f5c-40fb-a3fd-90c722a51aa6 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3496 | 4ea513ae-7aaf-4e61-bfa1-4e82333e6788 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3524 | 21baef36-0caf-452e-a049-d624974a0739 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3545 | 74b3fc01-da0c-4043-b630-38bb679fbd42 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3562 | 2f7f598b-fe86-44c4-88c1-f4951b5f9560 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3580 | 938dcefa-4ffb-4757-b13a-7cd715bcd599 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3598 | 11f8c8fc-ecee-4ec3-9201-8c192f3f8449 | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3619 | 5d0dabae-f225-4ac7-a9e5-ae1103720cde | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3647 | 437d29cc-02b8-48a3-bc00-4e895bcdddae | Patient/8c9fea57-6ded-47b0-88c9-75518430b572 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3667 | 0d4d8892-536d-4e3d-adea-690b9f6582ed | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3675 | a563d054-0081-4432-8977-a5cde30e13de | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3687 | 0f7039d9-1ef2-4aac-8617-96e33f827a10 | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3695 | 6e0600dc-f43a-48f8-87f6-45a0983c6d97 | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3703 | be1d1942-ae59-42db-914a-04cbc61cdcf3 | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3726 | 82de3db7-2921-453d-bbc5-8c56d70d630f | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3734 | 07f06643-6409-404c-89f6-fe1eb0e8c8cc | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3742 | 5ec5a8fd-43e8-4835-bef4-c0a429745a59 | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3754 | 2151a281-9eac-4f98-9a1d-3b30b169e2f2 | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3762 | 889b8074-f24d-4b7d-aba3-474fed898bda | Patient/d2524ab6-4db9-440d-b588-6dcfcab89270 | 72166-2 | Tobacco smoking status NHIS | 266919005 | Never smoker |
3801 | f596e621-6fe7-4acf-a0a7-2a68a2a77b35 | Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3833 | 3e0ed4d8-7e00-42e0-8a7c-d9d6dd57e05d | Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3881 | 4a32ee10-0c5c-4f1c-820a-9155ee04d060 | Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3909 | 8e5f48c0-651c-4f02-9a75-8268fa1c0b1b | Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3952 | 3f033821-1193-4ecf-95eb-9cfc2fa21cc0 | Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
3980 | 5368a223-6972-410f-a253-6be399479303 | Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
4008 | 6d9ff554-d1cf-48bb-9487-7d132c3575af | Patient/f98b23bf-4443-46d0-9eaf-563e767cf948 | 72166-2 | Tobacco smoking status NHIS | 8517006 | Former smoker |
258 rows × 6 columns
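The filtered table holds one row per recorded observation, so each patient appears many times. One possible next step is to reduce it to a single status per patient and attach that to the Patient table. The sketch below uses a few hand-typed rows copied from the output above rather than the real `dfs`, and it assumes the rows are already in chronological order, which the export does not guarantee; in practice you would add a date column (for example from `effectiveDateTime`) and sort on it first.

```python
import pandas as pd

# Toy stand-ins for dfs['Patient'] and the filtered dfs['Observation']
patients = pd.DataFrame({
    "id": [
        "fbfec681-d357-4b28-b1d2-5db6434c7846",
        "0b8a6ef0-07c8-48ca-804d-1e64f6e44b95",
    ],
    "gender": ["female", "female"],
})
smoking = pd.DataFrame({
    "patient": [
        "Patient/fbfec681-d357-4b28-b1d2-5db6434c7846",
        "Patient/fbfec681-d357-4b28-b1d2-5db6434c7846",
        "Patient/0b8a6ef0-07c8-48ca-804d-1e64f6e44b95",
    ],
    "code_display": ["Former smoker", "Former smoker", "Never smoker"],
})

# subject.reference has the form "Patient/<id>"; keep only the id part
smoking["patient_id"] = smoking["patient"].str.split("/").str[-1]

# Keep the last row per patient (assumed here to be the most recent one)
latest = smoking.drop_duplicates("patient_id", keep="last")

# Left join so that patients without a smoking observation are kept
merged = patients.merge(
    latest[["patient_id", "code_display"]],
    left_on="id",
    right_on="patient_id",
    how="left",
)
print(merged[["id", "gender", "code_display"]])
```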
Creating FHIRPaths
It may be helpful to use an online tool like https://hl7.github.io/fhirpath.js/ to assist with creating FHIRPaths for filtering the FHIR resources down for creating DataFrames. (Note that you should not use tools like this with identified patient data.)
We have a convenience method to get an example resource in JSON format from the fetcher object:
print(json.dumps(fetcher.get_example_resource('Observation'), indent=4))
{
    "resourceType": "Observation",
    "id": "7a10d1ef-97f2-468b-b3ab-78dc99c65cf6",
    "status": "final",
    "category": [
        {
            "coding": [
                {
                    "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                    "code": "vital-signs",
                    "display": "vital-signs"
                }
            ]
        }
    ],
    "code": {
        "coding": [
            {
                "system": "http://loinc.org",
                "code": "8302-2",
                "display": "Body Height"
            }
        ],
        "text": "Body Height"
    },
    "subject": {
        "reference": "Patient/fbfec681-d357-4b28-b1d2-5db6434c7846"
    },
    "encounter": {
        "reference": "Encounter/a65fee02-b183-4ae5-a9e2-5edf89a6f327"
    },
    "effectiveDateTime": "2010-11-20T02:52:44-05:00",
    "issued": "2010-11-20T02:52:44.074-05:00",
    "valueQuantity": {
        "value": 162.4,
        "unit": "cm",
        "system": "http://unitsofmeasure.org",
        "code": "cm"
    }
}
This can be copied and pasted into https://hl7.github.io/fhirpath.js/ to experiment with FHIRPaths. Note that the JavaScript library used on this testing website is not the same as the Python library used in this notebook, so there may be some implementation differences.
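To make concrete what a path such as `code.coding.first().code` selects, the sketch below walks the same route by hand through a trimmed copy of the example Observation using plain dictionaries. It is not a FHIRPath engine and does not use the notebook's FHIRPath library; the helper `first_coding` is a hypothetical name introduced only for this illustration.

```python
# Trimmed copy of the example Observation shown above
observation = {
    "resourceType": "Observation",
    "id": "7a10d1ef-97f2-468b-b3ab-78dc99c65cf6",
    "code": {
        "coding": [
            {"system": "http://loinc.org", "code": "8302-2", "display": "Body Height"}
        ],
        "text": "Body Height",
    },
    "subject": {"reference": "Patient/fbfec681-d357-4b28-b1d2-5db6434c7846"},
    "valueQuantity": {"value": 162.4, "unit": "cm"},
}


def first_coding(resource: dict, element: str) -> dict:
    """Return the first entry of resource[element]['coding'], or {} if absent."""
    codings = resource.get(element, {}).get("coding", [])
    return codings[0] if codings else {}


# Hand-written equivalents of the (column, FHIRPath) pairs used earlier
row = {
    "id": observation.get("id"),
    "patient": observation.get("subject", {}).get("reference"),
    "type": first_coding(observation, "code").get("code"),
    "type_display": first_coding(observation, "code").get("display"),
    # This resource has valueQuantity, not valueCodeableConcept, so nothing matches
    "code": first_coding(observation, "valueCodeableConcept").get("code"),
}
print(row)
```

The last entry shows why some cells end up empty in the DataFrames: a path that matches nothing in a given resource simply yields no value for that row.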
Testing with Synthea data
Having test data is very helpful when developing code that uses FHIR Bulk Data. The test data from https://bulk-data.smarthealthit.org may not have all the data elements you need for a specific research use case. Synthea can be used for generating customized synthetic data in FHIR format. Below we’ll look at how to load .ndjson from Synthea into this notebook and use reprocess_dataframes() with FHIRPaths to convert into Pandas DataFrames.
First, we’ll create a short class to mimic the functionality of BulkDataFetcher but with loading the .ndjson directly from disk rather than via a bulk data export.
# These imports are harmless if already run earlier in the notebook
import json
from typing import Optional

class SyntheaDataFetcher:
    def __init__(self, ndjson_file_path):
        self.resources_by_type = {}
        # Count lines first so the progress bar knows its total
        with open(ndjson_file_path, 'r') as file:
            num_lines = sum(1 for line in file)
        with open(ndjson_file_path, 'r') as file:
            for line in tqdm(file, total=num_lines):
                if not line.strip():
                    continue  # skip blank lines, which are not valid JSON
                json_obj = json.loads(line)
                this_resource_type = json_obj['resourceType']
                if this_resource_type not in self.resources_by_type:
                    self.resources_by_type[this_resource_type] = []
                self.resources_by_type[this_resource_type].append(json_obj)
        print("Resources available: ")
        print('\n'.join(['- '+ x for x in self.resources_by_type.keys()]))

    def get_example_resource(self, resource_type: str, resource_id: Optional[str] = None):
        if not self.resources_by_type:
            print("No resources were loaded from the .ndjson file")
            return None
        if resource_type not in self.resources_by_type:
            print(f"{resource_type} not available. Try one of these: {', '.join(self.resources_by_type.keys())}")
            return None
        if resource_id is None:
            # No id requested: return the first resource of this type
            return self.resources_by_type[resource_type][0]
        resource = [r for r in self.resources_by_type[resource_type] if r['id'] == resource_id]

        if len(resource) > 0:
            return resource[0]
        print(f"No {resource_type} with id={resource_id} was found.")
        return None

    def reprocess_dataframes(self, user_fhir_paths):
        # Reuse the FHIRPath-to-DataFrame logic from BulkDataFetcher
        return BulkDataFetcher._reprocess_dataframes(self.resources_by_type, user_fhir_paths)
# Load in 100 patients of Synthea data.
# The original data come from <https://synthea.mitre.org/downloads> > 1K Sample Synthetic Patient Records, FHIR R4
synthea_fetcher = SyntheaDataFetcher('synthea_100.ndjson')
Resources available:
- Patient
- Organization
- Practitioner
- Encounter
- Condition
- Device
- Claim
- ExplanationOfBenefit
- CareTeam
- Goal
- CarePlan
- Observation
- Immunization
- DiagnosticReport
- Procedure
- MedicationRequest
- ImagingStudy
- AllergyIntolerance
- MedicationAdministration
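NDJSON is simply one JSON resource per line, which is why the loader above can read the file line by line and group resources by `resourceType`. If you want to exercise that logic without a Synthea download, a tiny hand-made file is enough. The sketch below writes three invented resources to a temporary file and groups them the same way; the resource contents and the file name `tiny_sample.ndjson` are made up for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict

# Invented, minimal resources (not valid for clinical use, only for testing code paths)
resources = [
    {"resourceType": "Patient", "id": "p1", "gender": "female"},
    {"resourceType": "Patient", "id": "p2", "gender": "male"},
    {"resourceType": "Condition", "id": "c1", "subject": {"reference": "Patient/p1"}},
]

# Write one JSON object per line
path = os.path.join(tempfile.mkdtemp(), "tiny_sample.ndjson")
with open(path, "w") as f:
    for resource in resources:
        f.write(json.dumps(resource) + "\n")

# Read the file back and group resources by type
resources_by_type = defaultdict(list)
with open(path) as f:
    for line in f:
        if line.strip():
            obj = json.loads(line)
            resources_by_type[obj["resourceType"]].append(obj)

print({rtype: len(items) for rtype, items in resources_by_type.items()})
```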
Here is how to apply FHIRPaths to filter the Synthea data:
dfs = synthea_fetcher.reprocess_dataframes({'Patient': [('id', 'id')]})
dfs['Patient']
| | id |
---|---|
0 | 5cbc121b-cd71-4428-b8b7-31e53eba8184 |
1 | adccf2c3-9dc4-4067-ba23-98982c4875da |
2 | 31191928-6acb-4d73-931c-e601cc3a13fa |
3 | 67816396-e325-496d-a6ec-c047756b7ce4 |
4 | b426b062-8273-4b93-a907-de3176c0567d |
... | ... |
95 | ae4c5b55-c704-4406-b353-285f9166a489 |
96 | edb1ebc5-d629-4c43-acf5-b8d1c38d9bd2 |
97 | 2d75e3a4-f0f6-45dd-8b57-75fb2f303c9e |
98 | ea95f498-7929-4d50-be55-9bf7baee3a8d |
99 | 57ca2c16-7008-41e5-b338-4758b2fc46f0 |
100 rows × 1 columns
You can also get a sample resource to look at the raw JSON:
print(synthea_fetcher.get_example_resource('Patient'))
{ 'resourceType': 'Patient', 'id': '5cbc121b-cd71-4428-b8b7-31e53eba8184', 'text': { 'status': 'generated', 'div': '<div xmlns="http://www.w3.org/1999/xhtml">Generated by <a href="https://github.com/synthetichealth/synthea">Synthea</a>.Version identifier: v2.4.0-404-ge7ce2295\n . Person seed: 6457100290386878904 Population seed: 0</div>' }, 'extension': [ { 'url': 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-race', 'extension': [ { 'url': 'ombCategory', 'valueCoding': { 'system': 'urn:oid:2.16.840.1.113883.6.238', 'code': '2106-3', 'display': 'White' } }, {'url': 'text', 'valueString': 'White'} ] }, { 'url': 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity', 'extension': [ { 'url': 'ombCategory', 'valueCoding': { 'system': 'urn:oid:2.16.840.1.113883.6.238', 'code': '2186-5', 'display': 'Not Hispanic or Latino' } }, {'url': 'text', 'valueString': 'Not Hispanic or Latino'} ] }, { 'url': 'http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName', 'valueString': 'Deadra347 Borer986' }, {'url': 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex', 'valueCode': 'M'}, { 'url': 'http://hl7.org/fhir/StructureDefinition/patient-birthPlace', 'valueAddress': {'city': 'Billerica', 'state': 'Massachusetts', 'country': 'US'} }, { 'url': 'http://synthetichealth.github.io/synthea/disability-adjusted-life-years', 'valueDecimal': 14.062655945052095 }, { 'url': 'http://synthetichealth.github.io/synthea/quality-adjusted-life-years', 'valueDecimal': 58.93734405494791 } ], 'identifier': [ {'system': 'https://github.com/synthetichealth/synthea', 'value': '2fa15bc7-8866-461a-9000-f739e425860a'}, { 'type': { 'coding': [ { 'system': 'http://terminology.hl7.org/CodeSystem/v2-0203', 'code': 'MR', 'display': 'Medical Record Number' } ], 'text': 'Medical Record Number' }, 'system': 'http://hospital.smarthealthit.org', 'value': '2fa15bc7-8866-461a-9000-f739e425860a' }, { 'type': { 'coding': [ { 'system': 
'http://terminology.hl7.org/CodeSystem/v2-0203', 'code': 'SS', 'display': 'Social Security Number' } ], 'text': 'Social Security Number' }, 'system': 'http://hl7.org/fhir/sid/us-ssn', 'value': '999-93-7537' }, { 'type': { 'coding': [ { 'system': 'http://terminology.hl7.org/CodeSystem/v2-0203', 'code': 'DL', 'display': "Driver's License" } ], 'text': "Driver's License" }, 'system': 'urn:oid:2.16.840.1.113883.4.3.25', 'value': 'S99948707' }, { 'type': { 'coding': [ { 'system': 'http://terminology.hl7.org/CodeSystem/v2-0203', 'code': 'PPN', 'display': 'Passport Number' } ], 'text': 'Passport Number' }, 'system': 'http://standardhealthrecord.org/fhir/StructureDefinition/passportNumber', 'value': 'X14078167X' } ], 'name': [{'use': 'official', 'family': 'Brekke496', 'given': ['Aaron697'], 'prefix': ['Mr.']}], 'telecom': [{'system': 'phone', 'value': '555-677-3119', 'use': 'home'}], 'gender': 'male', 'birthDate': '1945-12-10', 'address': [ { 'extension': [ { 'url': 'http://hl7.org/fhir/StructureDefinition/geolocation', 'extension': [ {'url': 'latitude', 'valueDecimal': 41.93879298871088}, {'url': 'longitude', 'valueDecimal': -71.06682353144593} ] } ], 'line': ['894 Brakus Bypass'], 'city': 'Taunton', 'state': 'Massachusetts', 'postalCode': '02718', 'country': 'US' } ], 'maritalStatus': { 'coding': [ {'system': 'http://terminology.hl7.org/CodeSystem/v3-MaritalStatus', 'code': 'S', 'display': 'S'} ], 'text': 'S' }, 'multipleBirthBoolean': False, 'communication': [ { 'language': { 'coding': [{'system': 'urn:ietf:bcp:47', 'code': 'en-US', 'display': 'English'}], 'text': 'English' } } ] }
Try it yourself
Using FHIRPath, create the necessary dataframes to answer the following questions:
- How many patients in the dataset have ever received a flu vaccine?
- What are the five most common conditions that patients have been diagnosed with? (Use only the first diagnosis of a given condition for each patient.)
- What is the most common medication (in MedicationRequest), and what are the top 5 encounter types associated with these medications?
Remember that you can look at the FHIR resource documentation to see what data elements are in each resource. You can also use synthea_fetcher.get_example_resource('ResourceTypeHere') with https://hl7.github.io/fhirpath.js/ for testing out FHIRPaths if needed.
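As a hint for the second question, the phrase "only the first diagnosis of a given condition for each patient" amounts to de-duplicating on the (patient, condition) pair before counting. The sketch below shows that step on a handful of invented rows using only the standard library; the patient references, condition names, and dates are hypothetical, and in the exercise the rows would come from a Condition DataFrame you build with FHIRPaths. It also assumes the onset dates are ISO 8601 strings in a consistent format, so that string comparison matches chronological order.

```python
from collections import Counter

# Invented rows: (patient reference, condition display, onset date)
conditions = [
    ("Patient/a", "Viral sinusitis (disorder)", "2015-03-01"),
    ("Patient/a", "Viral sinusitis (disorder)", "2018-07-12"),
    ("Patient/a", "Hypertension", "2010-01-05"),
    ("Patient/b", "Viral sinusitis (disorder)", "2016-02-20"),
    ("Patient/b", "Acute bronchitis (disorder)", "2017-11-02"),
]

# Keep the earliest onset per (patient, condition) pair
first_onset = {}
for patient, condition, onset in conditions:
    key = (patient, condition)
    if key not in first_onset or onset < first_onset[key]:
        first_onset[key] = onset

# Each pair now counts once, so repeat diagnoses do not inflate the totals
counts = Counter(condition for (_patient, condition) in first_onset)
print(counts.most_common(5))
```

With pandas, the same idea can be expressed by sorting on the onset date and dropping duplicates on the patient and condition columns before calling `value_counts()`.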
Summary
Through this exercise we built a reusable tool to connect to a FHIR server with Bulk Data capabilities, export a set of resource types, and convert that data into DataFrames for analysis.