Auto bulk import

Introduction

OpenSpecimen v3.2 introduced a feature that automatically monitors a folder and consumes bulk import files on a continuous basis. This feature is useful for integrating OpenSpecimen with other systems such as REDCap and OpenClinica.

With this feature, external systems do not need to learn the OS REST APIs to integrate with OS. They only need to generate OS BO-compatible CSV files and store them in a pre-defined folder. OS continuously monitors this folder and consumes each new BO file it finds. The CSV files are processed in ascending order of the timestamp specified in the file name, which preserves the order in which the files are imported.
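The ordering rule can be illustrated with a short Python sketch. It is not OpenSpecimen's own code: the function name `by_timestamp` and the regular expression are assumptions made for illustration. It relies only on the 17-digit "yyyyMMddHHmmssSSS" timestamp described under Steps. Because the timestamp has a fixed width, sorting it as text gives chronological order.

```python
import re

# Hypothetical helper: matches the 17-digit yyyyMMddHHmmssSSS timestamp
# when it is followed by another "_" segment or by the ".csv" extension.
TIMESTAMP = re.compile(r"_(\d{17})(?:_|\.csv$)")


def by_timestamp(file_names):
    """Return file names sorted in ascending order of their embedded timestamp."""
    def key(name):
        match = TIMESTAMP.search(name)
        if match is None:
            raise ValueError(f"no timestamp found in file name: {name}")
        # Fixed-width digit strings sort lexically in chronological order.
        return match.group(1)

    return sorted(file_names, key=key)
```

An external system that writes several files in quick succession could run its own list through such a helper to check that the timestamps reflect the intended import order, for example registrations before visits before specimens.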

Steps

  1. Create a "scheduled-bulk-import" directory in the app data (os-data) directory if it is not present. The location of the os-data directory can be found in the $Tomcat_Home/conf/openspecimen.properties file; check the app.data_dir property.
  2. Copy the bulk import CSV files into the 'scheduled-bulk-import' directory. For integrations, this can be done via scripts.
  3. The file name should follow the format given in the table below.

    Data | Format | Description
    Standard entities (e.g. participant, specimen, etc.) | <object_type>_<operation>_<timestamp>[_<csv_type>].csv
    1. object_type: Entity name specified in the bulk import schema file (see list below).
    2. operation: Operation to perform; valid values are "create" or "update".
    3. timestamp: In "yyyyMMddHHmmssSSS" format.
    4. csv_type (optional): Specify "m" in case of "Order" and "Shipment"
    5. Examples
      • cp_create_20160511162033124.csv
      • distributionOrder_create_20160511162033124_m.csv
    Custom fields | <entity>_<operation>_<timestamp>_cpId_<cpId>.csv
    1. entity: cpr, visit, or specimen.
    2. operation: Operation to be performed, "create" or "update".
    3. timestamp: In "yyyyMMddHHmmssSSS" format.
    4. cpId: Static word indicating that a collection protocol identifier follows.
    5. <cpId>: Identifier of the collection protocol. You can get this identifier from the DB or from the browser URL on the collection protocol overview page.
    6. Example: Specimen custom field level update file

      specimen_update_20200925115950000_cpId_4292.csv

    Custom forms | extensions_<attached_level>_<form_name>_<operation>_<timestamp>.csv
    1. extensions: Static word to identify a custom form.
    2. attached_level: Level at which form is attached. 
      1. Participant
      2. SpecimenCollectionGroup (i.e. Visit)
      3. Specimen
      4. SpecimenEvent
    3. form_name - System generated 'Form Name' of the custom form.
    4. operation - Operation to be performed, "create" or "update"
    5. timestamp - In "yyyyMMddHHmmssSSS" format
    6. Example: extensions_Participant_familyHistoryAnnotation_create_20160511162246252.csv
  4. Once the files are processed (uploaded into OpenSpecimen), they are moved to a folder named 'processed-bulk-import'.
  5. If there is any issue with the file name format, the file is not uploaded and is moved to a folder named 'unprocessed-bulk-import'.
  6. In that case, correct the file name and move the file back to the 'scheduled-bulk-import' folder, where it will be picked up for upload.
  7. To view the bulk import jobs, use the URL <https://IP address/openspecimen/#/bulk-import-jobs>.
  8. The following is the list of object types for OpenSpecimen entities:

Note

Updating CP-based custom field values of participants, visits, and specimens is supported from v6.3.

Entity | Object Type
Institutes | institute
Site | site
User | user
User Roles | userRoles
Container | storageContainer
Distribution Order | distributionOrder
Shipment | shipment
Participant registration | cpr
Participant registrations for multiple CP | cprMultiple
Participant consents | consent
Visits | visit
Specimen | specimen
Aliquots | specimenAliquot
Derivatives | specimenDerivative
Master Specimen | masterSpecimen
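For integrations that copy files via scripts, the naming rules from the Steps section could be sketched in Python roughly as follows. This is a minimal illustration, not part of OpenSpecimen: the function names (`timestamp`, `standard_name`, `custom_field_name`, `custom_form_name`, `drop_file`) are hypothetical, and `data_dir` is assumed to be the value of app.data_dir read from openspecimen.properties.

```python
import shutil
from datetime import datetime
from pathlib import Path

OPERATIONS = ("create", "update")


def timestamp(dt: datetime) -> str:
    """Format dt as yyyyMMddHHmmssSSS (17 digits, millisecond precision)."""
    return dt.strftime("%Y%m%d%H%M%S") + f"{dt.microsecond // 1000:03d}"


def _check(operation: str) -> None:
    if operation not in OPERATIONS:
        raise ValueError(f"operation must be one of {OPERATIONS}")


def standard_name(object_type: str, operation: str, dt: datetime, csv_type: str = "") -> str:
    """File name for standard entities; csv_type is optional (e.g. "m")."""
    _check(operation)
    suffix = f"_{csv_type}" if csv_type else ""
    return f"{object_type}_{operation}_{timestamp(dt)}{suffix}.csv"


def custom_field_name(entity: str, operation: str, dt: datetime, cp_id: int) -> str:
    """File name for CP-based custom fields of cpr, visit, or specimen."""
    _check(operation)
    return f"{entity}_{operation}_{timestamp(dt)}_cpId_{cp_id}.csv"


def custom_form_name(attached_level: str, form_name: str, operation: str, dt: datetime) -> str:
    """File name for custom forms; "extensions" is the static prefix."""
    _check(operation)
    return f"extensions_{attached_level}_{form_name}_{operation}_{timestamp(dt)}.csv"


def drop_file(csv_path, data_dir, file_name: str) -> Path:
    """Copy csv_path into <data_dir>/scheduled-bulk-import under file_name."""
    target_dir = Path(data_dir) / "scheduled-bulk-import"
    target_dir.mkdir(parents=True, exist_ok=True)  # step 1: create if not present
    target = target_dir / file_name
    shutil.copyfile(csv_path, target)  # step 2: copy the CSV into the folder
    return target
```

For example, `standard_name("distributionOrder", "create", datetime(2016, 5, 11, 16, 20, 33, 124000), "m")` yields `distributionOrder_create_20160511162033124_m.csv`, matching the example in the Steps section.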

Note: Dates in the files should be in mm-dd-yyyy format for the US and in dd-mm-yyyy format for others.
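A script that generates the CSV files could format date values along these lines. This is only a sketch: the `format_date` function and its `us_style` flag are hypothetical, and how a given OpenSpecimen instance decides between the two formats is not covered here.

```python
from datetime import date


def format_date(value: date, us_style: bool) -> str:
    """Return mm-dd-yyyy when us_style is True, otherwise dd-mm-yyyy."""
    return value.strftime("%m-%d-%Y" if us_style else "%d-%m-%Y")
```

With `date(2016, 5, 11)`, the US style gives `05-11-2016` and the other style gives `11-05-2016`.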