Introduction (CSV Import)
Got feedback or spotted a mistake?

Leave a comment at the end of this page or email contact@krishagni.com

Introduction

OpenSpecimen supports importing CSV files to bulk upload data. Using this feature, you can add, update or delete data. This can be used for legacy data migration, integration with other instruments or databases, and adding/editing data in bulk.

From V9.1, dates are parsed strictly. To import dates that have a single-digit day or month, use the date format M/d/yyyy. To create new date formats, refer to the wiki page.

 

Bulk Import Validations

The system validates the CSV for errors such as duplicate values in unique fields, incorrect date formats, and incorrect dropdown values. Users can choose to validate the file before uploading any record.
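Some of these errors can be caught locally before the file is uploaded. The following is a minimal sketch, not OpenSpecimen's own validation logic: the column names "Specimen Label" and "Collection Date" are hypothetical, and the Python format %m/%d/%Y is only an illustration that does not replicate the date format configured on your server.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def find_duplicates(rows, column):
    """Return values that appear more than once in `column` (blank values ignored)."""
    counts = Counter(row[column].strip() for row in rows if row[column].strip())
    return sorted(value for value, n in counts.items() if n > 1)


def find_bad_dates(rows, column, fmt="%m/%d/%Y"):
    """Return (row_number, value) pairs whose date does not parse with `fmt`."""
    bad = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        value = row[column].strip()
        if not value:
            continue
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad.append((number, value))
    return bad


# Hypothetical sample data; real column names come from your import template.
sample = io.StringIO(
    "Specimen Label,Collection Date\n"
    "SP-001,01/15/2023\n"
    "SP-002,2023-01-16\n"
    "SP-001,02/30/2023\n"
)
rows = list(csv.DictReader(sample))
duplicates = find_duplicates(rows, "Specimen Label")
bad_dates = find_bad_dates(rows, "Collection Date")
print(duplicates)  # ['SP-001']
print(bad_dates)   # [(3, '2023-01-16'), (4, '02/30/2023')]
```

A check like this only narrows down obvious problems. The server-side validation remains the authority on what is accepted.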

What is 'Validate and Import'?

In a regular bulk upload, if 100 records are uploaded and 60 of them fail while only 40 are processed successfully, the user has to filter out the failed records, correct them, and upload them again for reprocessing.

The 'Validate and Import' feature validates the complete file before upload.

  • If any record in the CSV file fails validation, the whole job fails and nothing is saved in the database. Records are saved only when every record in the file is valid.

  • If there is an error, the system returns a status log file with an error message for each incorrect record, so that the user can correct those records and upload the file again.

  • The time required to validate the records is the same as that required to upload the records.

  • The maximum number of records that can be validated in one job is set to 10000 by default. It can be changed from Settings → Common → Pre-validate Records Limit.

  • If the file has more than 10,000 records, the system shows the message 'The number of records to import exceeds 10000, do you want to proceed without validating the input file?'.

  • If you proceed without validation, the records are processed individually, so valid records are saved even if other records fail.
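The difference between the two modes can be modelled in a few lines. This is an illustrative sketch of the behaviour described above, not OpenSpecimen code: run_import, validate, and save are hypothetical names, and the sketch assumes the user chooses to proceed when the record count exceeds the limit.

```python
def run_import(records, validate, save, prevalidate_limit=10000):
    """Model of 'Validate and Import' versus record-by-record processing.

    validate(record) returns an error message or None; save(record) stores
    one record. Returns (saved_count, errors), where errors maps the record
    index to its error message.
    """
    errors = {}
    if prevalidate_limit > 0 and len(records) <= prevalidate_limit:
        # Pre-validation: check every record first, save only if all pass.
        for index, record in enumerate(records):
            message = validate(record)
            if message:
                errors[index] = message
        if errors:
            return 0, errors  # all-or-nothing: nothing is saved
        for record in records:
            save(record)
        return len(records), errors

    # No pre-validation (limit exceeded or set to 0): each record
    # succeeds or fails on its own.
    saved = 0
    for index, record in enumerate(records):
        message = validate(record)
        if message:
            errors[index] = message
        else:
            save(record)
            saved += 1
    return saved, errors


def require_label(record):
    """Hypothetical rule: every record needs a non-empty label."""
    return None if record.get("label") else "label is required"


stored = []
result = run_import([{"label": "A"}, {"label": ""}], require_label, stored.append)
print(result, stored)  # (0, {1: 'label is required'}) []
```

With prevalidate_limit=0 the same input saves the first record and reports an error only for the second one.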

Settings of validation before import

All the bulk import jobs are validated first by the system. If there are errors in any of the records, none of the records are inserted or updated. You can download the report, correct the errors, and upload the same report file. 

Validation does not happen for large files, that is, files with more records than the configured limit. In such cases, valid records are processed even if other records have errors. By default, the limit is 10,000 records, but it can be changed in the admin settings (Settings → Common → Pre-validate Records Limit).

How to disable pre-validation?

Pre-validation means that OpenSpecimen validates the whole file before attempting to import. This is useful if you don't want any record to be imported when even one record has an error.

This setting also affects the time taken to complete the import, especially for large imports (say, more than 10K records). It is best to keep this setting to 100 for optimal performance.

To disable pre-validation:

  1. Go to the home page, and click the ‘Settings’ card.

  2. Search for ‘Pre-validate Records Limit’.

  3. Enter ‘0’ in the ‘New Value’ field and click ‘Update’.

Best practices

  1. When doing a large import, first test with a small subset of 10–100 rows. Often the same mistake is repeated in every row, and testing a subset first avoids a long wait only to get the same error back for every row.

  2. For huge uploads, such as 500K to over 1M records, refer to "Tips and tricks to import many records" below.
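A test subset can be cut from the full file with a short script. This is a minimal sketch: write_sample and the file paths passed to it are hypothetical, and it assumes a UTF-8 CSV whose first row is the header.

```python
import csv
from itertools import islice


def write_sample(source_path, sample_path, max_rows=100):
    """Copy the header and the first `max_rows` data rows to a new CSV file.

    Returns the number of data rows written.
    """
    with open(source_path, newline="", encoding="utf-8") as src, \
            open(sample_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty source file: nothing to copy
            return 0
        writer.writerow(header)
        written = 0
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
    return written
```

For example, write_sample("specimens.csv", "specimens_test.csv", max_rows=50) would produce a 50-row file to import first.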

Tips and tricks to import many records

If you have a large number of records to import (hundreds of thousands or millions), you can follow the steps below to improve the import speed:

  1. Do imports via folder import and not via the UI. Refer to Auto bulk import for this.

  2. Break the large file into smaller files, say 100K specimens each. With one large file, the system takes a very long time just to read the file, before it even starts processing the first row.

  3. If importing via the UI, import the file as a Super Admin user. This tells the system not to spend time on privilege checks. This happens automatically if you use the auto bulk import by dropping the file in the server folder.

  4. Schedule the import during off-peak hours, e.g. daily from 5 PM to 8 AM the next day, or on weekends. You can do this by putting a fixed number of files in the folder: once you know that one file of 100K records takes 1 hour, you can put, say, 14 files in the folder for the 15-hour overnight window.
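Steps 2 and 4 can be scripted. The sketch below splits a CSV into files of at most 100K data rows, repeating the header in each, and estimates how many files fit in an off-peak window. The function names, the output file naming, and the one-hour spare margin are assumptions made for illustration.

```python
import csv
from datetime import datetime, timedelta


def split_csv(source_path, output_prefix, rows_per_file=100_000):
    """Split a CSV into numbered files of at most `rows_per_file` data rows.

    Every output file repeats the header row. Returns the list of paths written.
    """
    paths = []
    with open(source_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty source file
            return paths
        out = None
        writer = None
        count = 0
        try:
            for row in reader:
                if out is None or count == rows_per_file:
                    if out is not None:
                        out.close()
                    path = f"{output_prefix}_{len(paths) + 1:03d}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    paths.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if out is not None:
                out.close()
    return paths


def files_per_window(start, end, hours_per_file, spare_hours=1.0):
    """Estimate how many files fit between two datetimes, keeping spare time."""
    usable_hours = (end - start) / timedelta(hours=1) - spare_hours
    return max(0, int(usable_hours // hours_per_file))


# 5 PM to 8 AM the next day is 15 hours; with a 1-hour margin and
# 1 hour per file, 14 files fit.
overnight = files_per_window(datetime(2024, 1, 1, 17), datetime(2024, 1, 2, 8), 1.0)
print(overnight)  # 14
```

Measure the time taken by one file on your own server before relying on the estimate, since throughput depends on the data and the hardware.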

Bulk Import – FAQs

Frequently Asked Questions
