How to add additional data to OpenSpecimen?

You can add data from other sources, such as tumor registries, pathology databases, or other clinical databases, to OpenSpecimen. This data then becomes available in the OpenSpecimen query interface and catalogs.

There are two ways of doing this:

  1. Via custom forms and bulk upload
  2. Via direct database inserts

Via custom forms and bulk upload

  1. Create a custom form in the form builder containing the fields you want to include.
  2. Attach the form at the participant, specimen, or visit level, and to one or more collection protocols.
  3. Go to "Import" in the "Collection protocols" list.
  4. Download the bulk operations (BO) template for the new custom form.
  5. Fill in the BO CSV file and use it to import data into OpenSpecimen.
  6. Learn more about bulk import here: Bulk Operations (CSV Import)
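
The CSV used for the import has to follow the column layout of the downloaded template. As a rough sketch (not part of OpenSpecimen), the Python snippet below writes records in the header order taken from a template. The header names and the `fill_template` helper are hypothetical placeholders; substitute the header row from the template you actually downloaded.

```python
import csv
import io


def fill_template(template_header, records):
    """Return CSV text whose columns follow the template header exactly.

    Missing values are written as empty fields; a record containing a key
    that is not in the header raises ValueError instead of being dropped.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=template_header,
        restval="",
        extrasaction="raise",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Hypothetical header: copy the real one from the downloaded BO template.
header = ["Visit Name", "Diastolic BP", "Systolic BP"]
rows = [
    {"Visit Name": "V-001", "Diastolic BP": 80, "Systolic BP": 120},
    {"Visit Name": "V-002", "Systolic BP": 118},
]
print(fill_template(header, rows))
```

Raising on unknown keys is a deliberate choice here: a misspelled column name fails early rather than producing a file with silently missing data.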

Via direct database inserts

High-level steps:

  1. Create a table with the required fields and a foreign key to the Participant Registration, Visit, or Specimen table.
  2. Load the data into the table using any available database tool.
  3. Edit the Participant Registration, Visit, or Specimen query form to specify metadata for the table and the fields added in step 1.

Detailed Steps with Example:

Goal:

We want to capture vital statistics and radiology diagnoses, if any, for every participant visit. Also, we would like to make this captured data available to researchers via the OpenSpecimen query interface.

Path to Achieve Goal:

  1. Create a table with the required fields and a foreign key to OpenSpecimen's Visit table, as illustrated in the example below:

    CREATE TABLE CUSTOM_VISIT_STAT_DIAGNOSIS (
      IDENTIFIER BIGINT NOT NULL AUTO_INCREMENT,
      VISIT_ID BIGINT NOT NULL,
      DIASTOLIC_BP SMALLINT,
      SYSTOLIC_BP SMALLINT,
      RADIOLOGY_DIAGNOSIS VARCHAR(255),
      PRINCIPAL_DIAGNOSIS VARCHAR(255),
      PRIMARY KEY (IDENTIFIER),  -- MySQL requires an AUTO_INCREMENT column to be indexed
      FOREIGN KEY (VISIT_ID) REFERENCES CATISSUE_SPECIMEN_COLL_GROUP(IDENTIFIER)
    );
    
    
  2. Load the data into the custom table, with the relevant participant visit IDs and field values. Assuming all the data is in a CSV file named visit_diagnosis.csv, the following MySQL command imports it into the custom table. Because the command lists no columns, the CSV fields must appear in the same order as the table's columns.

    LOAD DATA INFILE '<my-sql-files>/visit_diagnosis.csv'
    INTO TABLE CUSTOM_VISIT_STAT_DIAGNOSIS
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES;
    
    
  3. Add the following query metadata in the file $TOMCAT_HOME/webapps/openspecimen/WEB-INF/classes/query-forms/scg.xml just below the </row> corresponding to the visit sub-form extensions.

    <row>
      <subForm>
        <name>customVisitDiagnosis</name> <!-- unique name within form -->
        <udn>customVisitDiagnosis</udn>
        <caption>Visit Stats and Diagnosis</caption>
        <table>CUSTOM_VISIT_STAT_DIAGNOSIS</table> <!-- custom table name -->
        <primaryKey>IDENTIFIER</primaryKey>        <!-- this is optional and need not be specified if PK column name is IDENTIFIER -->
        <foreignKey>VISIT_ID</foreignKey>          <!-- visit foreign key in custom table -->
        <parentKey>IDENTIFIER</parentKey>
        <row>
          <numberField>
            <name>id</name>
            <udn>id</udn>
            <caption>Diagnosis ID</caption>
            <column>IDENTIFIER</column>
          </numberField>
          <numberField>
            <name>diastolicBp</name>
            <udn>diastolicBp</udn>
            <caption>Diastolic BP</caption>
            <column>DIASTOLIC_BP</column>
          </numberField>
          <numberField>
            <name>systolicBp</name>
            <udn>systolicBp</udn>
            <caption>Systolic BP</caption>
            <column>SYSTOLIC_BP</column>
          </numberField>
          <dropDown>
            <name>radiologyDiagnosis</name>
            <udn>radiologyDiagnosis</udn>
            <caption>Radiology Diagnosis</caption>
            <column>RADIOLOGY_DIAGNOSIS</column>
            <options>
              <!-- dropdown values are picked from values available in custom table -->
              <sql>select distinct radiology_diagnosis from CUSTOM_VISIT_STAT_DIAGNOSIS where radiology_diagnosis is not null</sql>
            </options>
          </dropDown>
          <dropDown>
            <name>principalDiagnosis</name>
            <udn>principalDiagnosis</udn>
            <caption>Principal Diagnosis</caption>
            <column>PRINCIPAL_DIAGNOSIS</column>
            <options>
              <!-- dropdown values are picked from values available in custom table -->
              <sql>select distinct principal_diagnosis from CUSTOM_VISIT_STAT_DIAGNOSIS where principal_diagnosis is not null</sql>
            </options>
          </dropDown>
        </row>
      </subForm>
    </row>
    
    
  4. Save the file and restart OpenSpecimen.
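
Before running the load in step 2, it can help to sanity-check the CSV. The sketch below is one way to do that in Python, assuming the file has a header row and one field per table column, in the order of the CREATE TABLE statement. `check_csv` and the column lists are illustrative names, not part of OpenSpecimen or MySQL.

```python
import csv
import io
import re

# Assumed column order: the same as in the CREATE TABLE statement of step 1.
COLUMNS = [
    "IDENTIFIER", "VISIT_ID", "DIASTOLIC_BP", "SYSTOLIC_BP",
    "RADIOLOGY_DIAGNOSIS", "PRINCIPAL_DIAGNOSIS",
]
INTEGER_COLUMNS = {"IDENTIFIER", "VISIT_ID", "DIASTOLIC_BP", "SYSTOLIC_BP"}
REQUIRED_COLUMNS = {"VISIT_ID"}  # declared NOT NULL and used as the foreign key


def check_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row, mirroring IGNORE 1 LINES
    for row in reader:
        line_no = reader.line_num
        if len(row) != len(COLUMNS):
            problems.append(
                f"line {line_no}: expected {len(COLUMNS)} fields, got {len(row)}"
            )
            continue
        for name, value in zip(COLUMNS, row):
            value = value.strip()
            if not value:
                if name in REQUIRED_COLUMNS:
                    problems.append(f"line {line_no}: {name} is empty")
            elif name in INTEGER_COLUMNS and not re.fullmatch(r"-?[0-9]+", value):
                problems.append(
                    f"line {line_no}: {name} is not an integer: {value!r}"
                )
    return problems


sample = (
    "IDENTIFIER,VISIT_ID,DIASTOLIC_BP,SYSTOLIC_BP,"
    "RADIOLOGY_DIAGNOSIS,PRINCIPAL_DIAGNOSIS\n"
    "1,10,80,120,Normal,Hypertension\n"
    "2,,x,118,,\n"
    "3,11,75\n"
)
for problem in check_csv(sample):
    print(problem)
```

The check does not confirm that each VISIT_ID exists in the Visit table; the foreign key constraint enforces that at load time.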

After OpenSpecimen restarts successfully, the custom visit stats and diagnosis fields should appear in the Visit form of the query interface. You can use them like any other OpenSpecimen query fields.
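
If the fields do not appear, one thing worth checking is that the edited file is still well-formed XML and that every `<column>` in the custom sub-form names a real column of the custom table. The following Python sketch does both, under the assumption that the sub-form is identified by its `<table>` element as in the metadata above. `unknown_columns` is a hypothetical helper, and the misspelled column in the sample is invented for illustration.

```python
import xml.etree.ElementTree as ET


def unknown_columns(xml_text, table_name, table_columns):
    """List <column> values in the table's sub-forms that the table lacks.

    ET.fromstring raises xml.etree.ElementTree.ParseError when the text is
    not well-formed, so a broken edit is reported before any comparison.
    Names are compared case-insensitively, as MySQL column names are not
    case-sensitive.
    """
    root = ET.fromstring(xml_text)
    known = {column.upper() for column in table_columns}
    unknown = []
    for sub_form in root.iter("subForm"):
        table = (sub_form.findtext("table") or "").strip()
        if table.upper() != table_name.upper():
            continue
        for column in sub_form.iter("column"):
            name = (column.text or "").strip()
            if name.upper() not in known:
                unknown.append(name)
    return unknown


# Cut-down stand-in for the query form, with a deliberately misspelled column.
fragment = """
<row>
  <subForm>
    <name>customVisitDiagnosis</name>
    <table>CUSTOM_VISIT_STAT_DIAGNOSIS</table>
    <row>
      <numberField><column>IDENTIFIER</column></numberField>
      <numberField><column>SYSTOLIC_PB</column></numberField>
    </row>
  </subForm>
</row>
"""
table_columns = [
    "IDENTIFIER", "VISIT_ID", "DIASTOLIC_BP", "SYSTOLIC_BP",
    "RADIOLOGY_DIAGNOSIS", "PRINCIPAL_DIAGNOSIS",
]
print(unknown_columns(fragment, "CUSTOM_VISIT_STAT_DIAGNOSIS", table_columns))
```

To run it against the real form, pass the contents of scg.xml as `xml_text`.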

Points to Remember:

  1. The query forms are overwritten during an upgrade, so keep a copy of the customized form XML and merge it back in after the upgrade. Future versions will be enhanced to make this upgrade-proof.
  2. The query form XML files for Participant Registration and Specimen are cpr.xml and specimen.xml, respectively.
  3. The example metadata above should cover about 80% of use cases. For the syntax of other field types, refer to the existing query form XMLs in the $TOMCAT_HOME/webapps/openspecimen/WEB-INF/classes/query-forms directory.
  4. In future releases, we plan to let users define their custom tables and related query metadata in a plugin, which the query interface can pick up automatically and make available for querying.
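
Since the query forms are overwritten on upgrade, one possible way to keep the customization handy is to save just the custom `<row>` block rather than the whole file. The Python sketch below pulls that block out by sub-form name. `extract_subform_row` is a hypothetical helper, and the `<form>` wrapper in the sample is only a stand-in for the real file's root element.

```python
import xml.etree.ElementTree as ET


def extract_subform_row(xml_text, sub_form_name):
    """Return the <row> that directly contains the named sub-form, or None.

    The default ElementTree parser discards XML comments, so the returned
    text will not include any comments from the original block.
    """
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        for sub_form in row.findall("subForm"):  # direct children only
            if (sub_form.findtext("name") or "").strip() == sub_form_name:
                return ET.tostring(row, encoding="unicode").strip()
    return None


# Stand-in for a query form file; the real root element may differ.
form = """
<form>
  <row>
    <subForm>
      <name>customVisitDiagnosis</name>
      <udn>customVisitDiagnosis</udn>
      <table>CUSTOM_VISIT_STAT_DIAGNOSIS</table>
    </subForm>
  </row>
</form>
"""
saved = extract_subform_row(form, "customVisitDiagnosis")
print(saved)
```

The saved block can then be pasted back into the upgraded query form at the same position as before.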