FHIR Bulk Data Workshop Setup
This workshop uses a JupyterHub server, which can be installed via these instructions for Google Cloud Platform (GCP).
Capacity planning
JupyterHub has a capacity planning page that may be helpful when deciding on the specifications for provisioning a server.
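As a rough worked example, the sketch below sizes a server by multiplying per-user usage by the number of users and adding a fixed overhead. It assumes the capacity planning page's rule of thumb has that shape. The function name estimateCapacity and the default overheads (128 MB of memory, 20% CPU headroom, 2 GB of disk) are assumptions to check against that page, and the per-user figures depend on what the workshop notebook actually does.

```javascript
// Hypothetical helper: rough server sizing as "per-user usage x users + overhead".
// The default overhead values are assumptions; confirm them on the capacity planning page.
function estimateCapacity({
  concurrentUsers,          // users expected to be active at the same time
  totalUsers,               // all accounts, since every home folder uses disk
  memPerUserMB,
  cpuPerUser,
  diskPerUserGB,
  memOverheadMB = 128,      // assumed memory for the hub and system processes
  cpuOverheadFraction = 0.2, // assumed 20% CPU headroom
  diskOverheadGB = 2,       // assumed space for the OS and the hub install
}) {
  return {
    memoryMB: concurrentUsers * memPerUserMB + memOverheadMB,
    cpus: concurrentUsers * cpuPerUser * (1 + cpuOverheadFraction),
    diskGB: totalUsers * diskPerUserGB + diskOverheadGB,
  };
}

// Example: 30 participants, all active at once, 1 GB RAM, 0.5 CPU and 1 GB disk each.
const est = estimateCapacity({
  concurrentUsers: 30,
  totalUsers: 30,
  memPerUserMB: 1024,
  cpuPerUser: 0.5,
  diskPerUserGB: 1,
});
console.log(est);
```

Memory is driven by concurrent users while disk is driven by total users, which is why the two counts are separate parameters.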
Manual setup on an existing server
Install JupyterHub
Following the relevant portion of these instructions, run the following after SSHing into the Ubuntu VM:
sudo apt install python3 python3-dev git curl
curl -L https://tljh.jupyter.org/bootstrap.py | sudo -E python3 - --admin jupyterhub --show-progress-page --plugin git+https://github.com/kafonek/tljh-shared-directory
The --show-progress-page option will show a status page at http://<public ip address here>/index.html while the installer is running.
The setup should be automatic after running this command. When it completes, navigating to http://<public ip address here> in your web browser should show JupyterHub. Log in with the username jupyterhub; whatever password you enter in the login box will automatically be set as the password for the jupyterhub account.
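Instead of refreshing the browser while the installer runs, the wait can be scripted. The sketch below polls the hub until it answers. It assumes Node 18 or later (for the built-in fetch) and that JupyterHub's REST API root, /hub/api, returns the hub version as JSON without authentication. waitForHub is a hypothetical helper name, and fetchFn is injectable so the loop can be tested without a server.

```javascript
// Hypothetical helper: poll a freshly installed hub until it responds.
// Assumes GET <baseUrl>/hub/api returns {"version": "..."} without authentication.
async function waitForHub(baseUrl, { attempts = 30, delayMs = 10000, fetchFn = globalThis.fetch } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/hub/api`);
      if (res.ok) {
        const body = await res.json();
        return body.version; // the hub is up
      }
    } catch {
      // Connection refused or reset: the installer is probably still running.
    }
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`JupyterHub at ${baseUrl} did not respond after ${attempts} attempts`);
}

// Usage (replace with your VM's public IP address):
// waitForHub("http://<public ip address here>").then((v) => console.log(`JupyterHub ${v} is up`));
```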
Creating users for the workshop
Users can be batch-created from the JupyterHub admin console; each user’s password will be set the first time they log in.
Don’t create extra admins: admin users have root access to the server.
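For batch creation, a list of predictable usernames can be generated and pasted into the admin console. This sketch assumes the admin console's "Add Users" form accepts one username per line; workshopUsernames and the participant prefix are hypothetical.

```javascript
// Hypothetical helper: build newline-separated workshop usernames
// (e.g. participant01 ... participant30) to paste into the "Add Users" form.
function workshopUsernames(prefix, count) {
  const width = String(count).length; // zero-pad so the names sort in order
  const names = [];
  for (let i = 1; i <= count; i++) {
    names.push(`${prefix}${String(i).padStart(width, "0")}`);
  }
  return names.join("\n");
}

console.log(workshopUsernames("participant", 30));
```

Zero-padding is a small design choice: it keeps participant02 ahead of participant10 in any alphabetically sorted user list.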
Resetting passwords
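Delete the user through the JupyterHub admin console (and add them again if needed). This will not delete their home folder on the server, where their files are stored, and the password they enter at their next login will be set as their new password.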
Deleting users
In addition to deleting a user through the JupyterHub admin console, you will also need to delete their Linux account:
sudo userdel jupyter-usernamegoeshere
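When several users need to be removed at once, the matching commands can be generated. The sketch assumes every Linux account follows the jupyter-<username> naming shown above; unusually long usernames or ones with special characters may be mapped differently, so check the actual account names on the server if unsure. userdelCommands is a hypothetical helper. Passing removeHome: true adds userdel's -r flag, which also deletes the home folder and the files in it, so use it only when those files are no longer needed.

```javascript
// Hypothetical helper: print the userdel commands for a batch of workshop users.
// Assumes the Linux account for JupyterHub user "alice" is "jupyter-alice".
function userdelCommands(usernames, { removeHome = false } = {}) {
  const flag = removeHome ? " -r" : ""; // -r also removes the home folder
  return usernames.map((name) => `sudo userdel${flag} jupyter-${name}`);
}

console.log(userdelCommands(["participant01", "participant02"]).join("\n"));
```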
Getting the Jupyter notebook and installing dependencies
Open a new Terminal window in JupyterHub while logged in as an admin, and run the following commands:
git clone https://github.com/mitre/fhir-for-research.git
sudo -E pip install -r /path/to/fhir-for-research/requirements.txt
The requirements.txt file you need is here if you can’t easily access your clone of the repo.
Getting the notebook in every user’s home folder
There are two options:
- Copy the notebook to /etc/skel, and it will be copied automatically into the home folder of every new user. This is easier for the users, but there is no good way to update the notebook contents once a user account has been created.
- Put it in the /srv/workshop folder, and have users open it via JupyterHub, use “save as” to save a copy in their own home folder, and then refresh the page. This is more cumbersome for the users, but ensures everyone will have an up-to-date notebook to work from.
Avoiding time-outs
See “Culling idle notebook servers” in The Littlest JupyterHub documentation. To shut down idle servers only after 5 hours of inactivity:
sudo tljh-config set services.cull.timeout 18000 # 5 hours
sudo tljh-config reload
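The timeout value is in seconds, so 5 hours is 5 × 60 × 60 = 18000. A small hypothetical helper, cullTimeoutCommand, that does this conversion and builds the same tljh-config command:

```javascript
// Hypothetical helper: turn an idle timeout in hours into the tljh-config
// command above (services.cull.timeout is expressed in seconds).
function cullTimeoutCommand(hours) {
  if (!Number.isFinite(hours) || hours <= 0) {
    throw new RangeError("hours must be a positive number");
  }
  const seconds = Math.round(hours * 60 * 60);
  return `sudo tljh-config set services.cull.timeout ${seconds}`;
}

console.log(cullTimeoutCommand(5)); // sudo tljh-config set services.cull.timeout 18000
```

As with the command above, run sudo tljh-config reload afterwards for the new value to take effect.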
Converting JWK to PEM
sudo apt install nodejs npm
npm install node-jose
Create convert.js with the following contents:
const jose = require("node-jose");
// Paste the full JWKS from https://bulk-data.smarthealthit.org here
const jwks = { keys: [] };
// Get the first key whose `key_ops` includes `sign` (this is the private key)
const key = jwks.keys.find((k) => Array.isArray(k.key_ops) && k.key_ops.includes("sign"));
if (!key) {
  throw new Error('No key with key_ops "sign" found; did you paste the full JWKS?');
}
// Convert the private key to PEM format and print it to the terminal
jose.JWK.asKey(key)
  .then((jwk) => console.log(jwk.toPEM(true)))
  .catch((err) => console.error(err));
Now you can run node convert.js to get the private key in PEM format.
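The key-selection step in convert.js can be factored out and checked without node-jose or a real key. In this sketch selectSigningKey is a hypothetical name, and the sample JWKS contains placeholder values rather than real key material. Checking key_ops with includes (rather than only its first entry) is an assumption that tolerates keys listing more than one operation.

```javascript
// Hypothetical helper: pick the private signing key out of a JWKS,
// mirroring the selection convert.js performs, so it can be tested alone.
function selectSigningKey(jwks) {
  const keys = Array.isArray(jwks && jwks.keys) ? jwks.keys : [];
  const key = keys.find((k) => Array.isArray(k.key_ops) && k.key_ops.includes("sign"));
  if (!key) {
    throw new Error('JWKS has no key whose key_ops includes "sign"');
  }
  return key;
}

// Placeholder JWKS (not real key material): one public and one private entry.
const sample = {
  keys: [
    { kty: "EC", kid: "demo", key_ops: ["verify"] },
    { kty: "EC", kid: "demo", key_ops: ["sign"], d: "placeholder" },
  ],
};
console.log(selectSigningKey(sample).key_ops); // [ 'sign' ]
```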
Creating synthea.ndjson
- Download the 1k FHIR R4 sample dataset from here
- Unzip it and delete files until only the number of patients you want remain (recommended: 50 or 100)
- Run jq -c '.entry[].resource' *.json > synthea.ndjson (this requires jq)
Place the .ndjson file(s) in /etc/skel on the server to add them automatically to each new user’s home folder.
Enable TOC in Jupyter
From a Terminal window, while logged in to JupyterHub as an admin, run:
sudo -E pip install jupyter_contrib_nbextensions
sudo -E jupyter contrib nbextension install --sys-prefix
sudo -E jupyter nbextension enable toc2/main --sys-prefix
These steps are based on this documentation.