Synthetic Data
- How can you build or test FHIR software without infringing on data privacy and ethics?
  Use synthetic data.
- How can you prepare synthetic data?
  Download a prepared database from the links provided, or generate your own database using Synthea.
This page describes how to access synthetic FHIR data and run it on a test server. The synthetic data is provided by the Synthea™ project.
Whenever possible, use synthetic data instead of real data when building and testing FHIR software. Because synthetic data does not describe real people, researchers can run their software without the human-subjects requirements that real data triggers, such as IRB approval and privacy review.
Synthetic data in FHIR format is typically available via two separate mechanisms:
- As downloaded files (usually in .json format), typically loaded into software manually for testing or experimentation.
- From a FHIR server, typically used for testing software that connects to FHIR API endpoints.
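As a minimal sketch of the first mechanism, the following Python snippet loads one downloaded .json file and counts the resources it contains by type. It assumes the file holds a FHIR Bundle, that is, a top-level `entry` list whose items each carry a `resource`. The function names and the file name in the usage note are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_resource_types(bundle: dict) -> Counter:
    """Count the resources in a FHIR Bundle by their resourceType."""
    counts = Counter()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        # Entries without a resourceType are grouped under "Unknown".
        counts[resource.get("resourceType", "Unknown")] += 1
    return counts


def summarize_bundle_file(path: str) -> Counter:
    """Load a FHIR Bundle from a .json file and count its resources."""
    bundle = json.loads(Path(path).read_text(encoding="utf-8"))
    return count_resource_types(bundle)
```

For example, `summarize_bundle_file("example_patient.json")` would return a `Counter` keyed by resource type, which is a quick way to check what a downloaded record contains before loading it anywhere else.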
1 General-Purpose Synthetic Data
The Synthea project provides static downloads of synthetic data at https://synthea.mitre.org/downloads.
Synthea data may be used on public FHIR test servers (it can often be identified by the numbers appended to the patient name elements). Note that the Synthea project’s FHIR test server has been retired, so if you want a FHIR test server with a specific Synthea dataset you will need to set this up yourself.
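The naming convention mentioned above can be checked programmatically. The sketch below is only a heuristic, not an official Synthea marker: it flags a Patient resource when any given or family name ends in digits. The function names and the sample names in the usage note are made up for illustration.

```python
import re

# Heuristic pattern: one or more non-digit characters followed by trailing digits.
_TRAILING_DIGITS = re.compile(r"^\D+\d+$")


def looks_like_synthea_name(name_part: str) -> bool:
    """Return True if a single name part ends in digits, e.g. 'Abbott774'."""
    return bool(_TRAILING_DIGITS.match(name_part))


def patient_looks_synthetic(patient: dict) -> bool:
    """Return True if any given or family name in a Patient resource ends in digits."""
    for name in patient.get("name", []):
        parts = list(name.get("given", []))
        if "family" in name:
            parts.append(name["family"])
        if any(looks_like_synthea_name(part) for part in parts):
            return True
    return False
```

A patient named `Aaron697 Abbott774` would be flagged, while `Jane Smith` would not. Because real names can occasionally contain digits and synthetic data can be renamed, treat the result as a hint rather than proof.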
2 Ophthalmology-Specific Synthetic Data With Imaging
A set of Synthea data customized to include synthetic ophthalmology-related data is available. This data may be useful for ophthalmology-specific use cases, or for general-purpose testing of FHIR data with embedded images: fundus photos and optical coherence tomography (OCT) foveal B-scans.
2.1 FHIR Format
The synthetic FHIR data are available in the following formats:
- 10 curated full-life records with both ophthalmology and non-ophthalmology conditions (with images | without images)
- 100 records with five years of history of both ophthalmology and non-ophthalmology conditions (with images | without images)
- 1000 records with five years of history and only ophthalmology conditions (with images | without images)
Note: the “with images” links include FHIR resources with embedded images, .jpg versions of the images alone, and DICOM versions of the images alone.
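To pull the embedded images back out of a FHIR file, one possible approach is sketched below. It assumes the images are stored the way the FHIR Attachment data type stores inline content: an object with a `contentType` such as `image/jpeg` and a base64-encoded `data` field. Which resource types carry the images in these datasets is not stated here, so the sketch walks the whole JSON structure instead of assuming one. The function name is hypothetical.

```python
import base64


def find_image_attachments(node):
    """Yield (content_type, image_bytes) for each embedded image attachment.

    Walks any parsed FHIR JSON value (dict or list) recursively and decodes
    every object that has an image contentType and a base64 `data` string.
    """
    if isinstance(node, dict):
        content_type = node.get("contentType")
        data = node.get("data")
        if (
            isinstance(content_type, str)
            and content_type.startswith("image/")
            and isinstance(data, str)
        ):
            yield content_type, base64.b64decode(data)
        for value in node.values():
            yield from find_image_attachments(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_image_attachments(item)
```

Each yielded byte string can be written to disk with `Path(...).write_bytes(...)` and opened in any image viewer to confirm that the embedded image survived the round trip.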
If you want to run a FHIR test server with these data pre-loaded, please see these instructions using Docker Compose.
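If you instead start from an empty FHIR server, one common way to load a record is to POST its Bundle to the server's base URL. The sketch below uses only the Python standard library. It assumes each file is a Bundle of type `transaction` or `batch`, which is what the FHIR specification defines for processing at the base endpoint; whether these particular downloads use that type is not stated here, so check a file's `type` field first. The function names and the base URL in the usage note are hypothetical.

```python
import json
import urllib.request


def build_bundle_request(base_url: str, bundle: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a Bundle to a FHIR server."""
    if bundle.get("resourceType") != "Bundle" or bundle.get("type") not in ("transaction", "batch"):
        raise ValueError("expected a Bundle of type 'transaction' or 'batch'")
    return urllib.request.Request(
        url=base_url.rstrip("/"),
        data=json.dumps(bundle).encode("utf-8"),
        headers={
            "Content-Type": "application/fhir+json",
            "Accept": "application/fhir+json",
        },
        method="POST",
    )


def post_bundle(base_url: str, bundle: dict) -> int:
    """Send the Bundle and return the HTTP status code reported by the server."""
    with urllib.request.urlopen(build_bundle_request(base_url, bundle)) as response:
        return response.status
```

With a server listening locally, `post_bundle("http://localhost:8080/fhir", bundle)` would submit one record. Keeping request construction separate from sending makes it possible to inspect the request without a running server.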
2.2 Images Only
Synthetic images alone (without accompanying FHIR data) are also available:
Optical coherence tomography (OCT) foveal B-scans
Two different image-only OCT datasets are available:
1. Synthetic OCT (download .zip of 50k images in .png format)
   - Note: The synthetic OCT images included in the FHIR data (described above) were drawn from these images.
2. Subset of (1) reviewed by ophthalmology experts (download .zip)
These are organized by disease state label:
- CNV = choroidal neovascularization
- DME = diabetic macular edema
- Drusen = drusen related to age-related macular degeneration
- Normal = no disease
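After unzipping an OCT dataset, a quick sanity check is to count the images under each label. The sketch below assumes the archive extracts to one folder per label, named exactly as the labels are written above; the actual layout and file extension may differ, so adjust `LABELS` and `extension` to match what you see on disk. The function name and the path in the usage note are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumed folder names, taken from the disease state labels listed above.
LABELS = ("CNV", "DME", "Drusen", "Normal")


def count_images_by_label(root: str, extension: str = ".png") -> Counter:
    """Count image files under each label folder inside an extracted dataset."""
    counts = Counter()
    for label in LABELS:
        folder = Path(root) / label
        if folder.is_dir():
            # Search recursively in case images sit in nested subfolders.
            counts[label] = sum(
                1 for path in folder.rglob(f"*{extension}") if path.is_file()
            )
    return counts
```

For example, `count_images_by_label("synthetic_oct")` would return a `Counter` with one entry per label folder found, which makes an unbalanced or incomplete download easy to spot.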
Fundus photos
Three different image-only fundus photo datasets are available:
1. Synthetic fundus photos used in the FHIR data (above) (download .zip of 100 images in .jpg format)
2. An additional 900 synthetic fundus photos from the same model used to generate (1) (download .zip of 900 images in .jpg format)
3. Synthetic fundus photos reviewed by ophthalmology experts (download .zip)
   - Note: The synthetic image generation model was improved after these images were generated. The synthetic fundus photos used for the FHIR data, items (1) and (2) above, are generally higher quality.
3 Using Synthea
If you want to generate Synthea data yourself, please see the following resources:
4 FHIR Test Servers
If you want to run a FHIR testing server with a specific synthetic dataset, please see the following resources: