De-identification of OpenSpecimen data

Follow the steps below to de-identify PHI (protected health information) and to reduce the data size by removing audit data:

  • Make a copy of the production database.
  • Run the SQL statements below on the copy.
  • Export a dump of the copy and import it into the test server.

Important: Do not run these SQL statements on the production database. They irreversibly overwrite and delete data.
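The three steps above can be sketched end to end as follows. This is a minimal illustration only. It uses SQLite in place of the production database and a hypothetical one-table schema (`catissue_participant` with just two name columns). The helper names `make_copy`, `deidentify` and `export_dump` are made up for the example. The sketch shows the key property of the procedure: the de-identification runs only on the copy, so the original rows stay intact. For a real OpenSpecimen database, use your database server's own dump and restore tools.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the production schema;
# the real catissue_participant table has many more columns.
SCHEMA = """
create table catissue_participant (
    IDENTIFIER integer primary key,
    LAST_NAME text,
    FIRST_NAME text
);
"""


def make_copy(prod: sqlite3.Connection) -> sqlite3.Connection:
    """Step 1: copy the production database into a separate database."""
    copy = sqlite3.connect(":memory:")
    prod.backup(copy)  # reads prod, writes only to copy
    return copy


def deidentify(db: sqlite3.Connection) -> None:
    """Step 2: run the de-identification SQL on the copy (never on prod)."""
    db.execute("update catissue_participant set LAST_NAME = null, FIRST_NAME = null")
    db.commit()


def export_dump(db: sqlite3.Connection) -> str:
    """Step 3: produce a SQL dump of the copy for loading on the test server."""
    return "\n".join(db.iterdump())


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.executescript(SCHEMA)
    prod.execute("insert into catissue_participant values (1, 'Doe', 'Jane')")
    prod.commit()

    copy = make_copy(prod)
    deidentify(copy)
    dump = export_dump(copy)

    # Production still has the name; the copy's dump does not.
    print(prod.execute("select LAST_NAME from catissue_participant").fetchone())
    print("Doe" in dump)
```

Because the copy is made before any update runs, a mistake in the SQL can be fixed by simply making a new copy.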

SQL statements:


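-- Remove participant names, birth and death dates, SSN and EMPI ID
update catissue_participant set LAST_NAME=null, FIRST_NAME=null, MIDDLE_NAME=null, BIRTH_DATE=null, SOCIAL_SECURITY_NUMBER=null, DEATH_DATE=null, EMPI_ID=null;

-- Remove medical record numbers
update catissue_part_medical_id set MEDICAL_RECORD_NUMBER=null;

-- Remove surgical pathology numbers
update catissue_specimen_coll_group set SURGICAL_PATHOLOGY_NUMBER=null;

-- Empty the audit tables (they can still contain the original values and take up space)
TRUNCATE TABLE catissue_part_medical_id_aud;

TRUNCATE TABLE catissue_participant_aud;

TRUNCATE TABLE cat_specimen_coll_group_aud;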
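Before exporting the dump, it can help to confirm that no PHI or audit rows remain on the copy. The sketch below is an assumption-based illustration and is not part of OpenSpecimen. `leftover_phi` and `build_toy_db` are hypothetical helpers, and the tables are cut down to the columns named in the statements above. SQLite has no TRUNCATE TABLE, so the demo empties the audit tables with `delete from` instead. `leftover_phi` only uses the DB-API 2.0 cursor interface and plain `select count(*)` queries, so it should also work through a Python driver connected to the real copy.

```python
import sqlite3

# Columns the statements above set to null, per table.
PHI_COLUMNS = {
    "catissue_participant": [
        "LAST_NAME", "FIRST_NAME", "MIDDLE_NAME", "BIRTH_DATE",
        "SOCIAL_SECURITY_NUMBER", "DEATH_DATE", "EMPI_ID",
    ],
    "catissue_part_medical_id": ["MEDICAL_RECORD_NUMBER"],
    "catissue_specimen_coll_group": ["SURGICAL_PATHOLOGY_NUMBER"],
}

# Audit tables the statements above empty.
AUDIT_TABLES = [
    "catissue_part_medical_id_aud",
    "catissue_participant_aud",
    "cat_specimen_coll_group_aud",
]

# The statements from this page; SQLite has no TRUNCATE TABLE,
# so this demo empties the audit tables with DELETE FROM instead.
DEID_SQL = """
update catissue_participant set LAST_NAME=null, FIRST_NAME=null, MIDDLE_NAME=null,
  BIRTH_DATE=null, SOCIAL_SECURITY_NUMBER=null, DEATH_DATE=null, EMPI_ID=null;
update catissue_part_medical_id set MEDICAL_RECORD_NUMBER=null;
update catissue_specimen_coll_group set SURGICAL_PATHOLOGY_NUMBER=null;
delete from catissue_part_medical_id_aud;
delete from catissue_participant_aud;
delete from cat_specimen_coll_group_aud;
"""


def leftover_phi(conn) -> dict:
    """Return {table: row_count} for every table that still holds data it should not."""
    cur = conn.cursor()
    leftovers = {}
    for table, cols in PHI_COLUMNS.items():
        # A row is a leftover if any PHI column is still non-null.
        cond = " or ".join(f"{c} is not null" for c in cols)
        cur.execute(f"select count(*) from {table} where {cond}")
        n = cur.fetchone()[0]
        if n:
            leftovers[table] = n
    for table in AUDIT_TABLES:
        # Any row at all in an audit table is a leftover.
        cur.execute(f"select count(*) from {table}")
        n = cur.fetchone()[0]
        if n:
            leftovers[table] = n
    return leftovers


def build_toy_db() -> sqlite3.Connection:
    """Hypothetical minimal schema with one row of fake data per table."""
    db = sqlite3.connect(":memory:")
    for table, cols in PHI_COLUMNS.items():
        db.execute(f"create table {table} (IDENTIFIER integer primary key, "
                   + ", ".join(f"{c} text" for c in cols) + ")")
        db.execute(f"insert into {table} values (1, "
                   + ", ".join("'x'" for _ in cols) + ")")
    for table in AUDIT_TABLES:
        db.execute(f"create table {table} (REV integer)")
        db.execute(f"insert into {table} values (1)")
    db.commit()
    return db


if __name__ == "__main__":
    db = build_toy_db()
    print(leftover_phi(db))  # every table is reported before the script runs
    db.executescript(DEID_SQL)
    print(leftover_phi(db))  # empty once the copy is clean
```

An empty result means every listed column is null and every listed audit table is empty; any table named in the result needs another look before the dump is exported.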