Integrating with External Data Warehouse

 

Introduction

Many customers want to pull data out of OpenSpecimen into an external data warehouse. You can achieve this in two ways:

  1. Using APIs

  2. Using Query

This page explains the option to use the OpenSpecimen Query interface to achieve this option. This option is easy to achieve.

Importantly, this avoids outside developers learning OpenSpecimen APIs and writing custom programs. Writing programs need a lot of programming efforts, can be a maintenance overhead in the long run, and can be error-prone.

Overview

Exporting data

  1. Create a Query using OpenSpecimen Query module. Reference https://openspecimen.atlassian.net/wiki/spaces/CAT/pages/18972679

  2. Include all the fields needed in the data warehouse. Reference Query Results view.

  3. Save the query

  4. Create a Scheduled Job for the saved query using the Jobs module. Select frequency of job run as nightly, weekly, monthly, etc. Reference https://openspecimen.atlassian.net/wiki/spaces/CAT/pages/356384811

  5. OpenSpecimen creates a CSV file every time the job is run at the scheduled frequency.

Fetching CSV Files Via APIs

Refer to https://openspecimen.atlassian.net/l/c/xi6pnzz0 on how to run the scheduled jobs and download the data output via API.

How to query for records that have been added/updated since the last run?

You may not want all data in every run. To pull only the data that has changed since the last query run, you can use the fields “Update Time” field within Participant, Visit, and Specimen objects in the query UI.

  1. To retrieve specimens modified in the last 60 minutes, use the temporal filter given below:

    1. minutes_between(current_date(), Specimen.updateTime) < 60

  2. To retrieve visits modified in the last 24 hours:

    1. minutes_between(current_date(), SpecimenCollectionGroup.updateTime) < 1440

  3. To retrieve participants modified in the last one week (7 days):

    1. minutes_between(current_date(), Participant.updateTime) < 10080

  4. To retrieve participants modified in the last one month:

    1. months_between(current_date(), Participant.updateTime) < 1