Got feedback or spotted a mistake?

Leave a comment at the end of this page or email contact@krishagni.com

De-identification of OpenSpecimen data

Follow the steps below to de-identify PHI data and to reduce the database size by removing audit data:

  • Make a copy of the production DB.
  • Run the SQLs below on the copy.
  • Export a data dump from the copy and import it into the test server.

Important: Do not run these SQLs on the production database.

SQLs: 


update catissue_participant set LAST_NAME=null, FIRST_NAME=null, MIDDLE_NAME=null, BIRTH_DATE=null, SOCIAL_SECURITY_NUMBER=null, DEATH_DATE=null, EMPI_ID=null;

Note: the email and phone number columns still need to be added to the statement above.

update catissue_specimen_coll_group set SURGICAL_PATHOLOGY_NUMBER=null;

TRUNCATE TABLE catissue_part_medical_id;

TRUNCATE TABLE catissue_part_medical_id_aud;

TRUNCATE TABLE catissue_participant_aud;

TRUNCATE TABLE cat_specimen_coll_group_aud;

<truncate consent tables>
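
After the statements above have run on the copy, a quick check along the following lines can confirm that the listed columns and tables are empty before the dump is exported. This is only a sketch: it uses the table and column names that appear in the SQLs above, the result aliases are made up for readability, and it does not cover the consent tables or the email and phone number columns, which are not named on this page.

```sql
-- Run on the de-identified COPY only, never on production.
-- Every count below is expected to be 0.

-- Participants that still carry any of the nulled PHI columns
SELECT COUNT(*) AS participants_with_phi
FROM catissue_participant
WHERE LAST_NAME IS NOT NULL
   OR FIRST_NAME IS NOT NULL
   OR MIDDLE_NAME IS NOT NULL
   OR BIRTH_DATE IS NOT NULL
   OR SOCIAL_SECURITY_NUMBER IS NOT NULL
   OR DEATH_DATE IS NOT NULL
   OR EMPI_ID IS NOT NULL;

-- Collection groups that still carry a surgical pathology number
SELECT COUNT(*) AS groups_with_path_number
FROM catissue_specimen_coll_group
WHERE SURGICAL_PATHOLOGY_NUMBER IS NOT NULL;

-- Tables that were truncated should contain no rows
SELECT COUNT(*) AS medical_id_rows FROM catissue_part_medical_id;
SELECT COUNT(*) AS medical_id_audit_rows FROM catissue_part_medical_id_aud;
SELECT COUNT(*) AS participant_audit_rows FROM catissue_participant_aud;
SELECT COUNT(*) AS coll_group_audit_rows FROM cat_specimen_coll_group_aud;
```

If any count is non-zero, re-run the corresponding statement on the copy before exporting the dump.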
