Leave a comment at the end of this page or email contact@krishagni.com
How does database transaction management work?
Some DB background
To understand this, you should know a bit about database (DB) transaction management.
Any operation performed via the UI or API results in multiple SQL statements (SQLs) being executed in the DB.
For example, an add participant operation might need 7-8 SQLs. All the SQLs for one operation should be executed within a single transaction; this is called “atomicity” in DB terms. Atomicity ensures that either all of the changes are “committed” or none. Otherwise, the DB would be in a corrupted state.
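To make atomicity concrete, here is a minimal sketch using SQLite. The table and column names are illustrative only, not OpenSpecimen's actual schema: all the SQLs of one "add participant" operation run in a single transaction, and a failure in any one of them rolls back the whole operation.

```python
import sqlite3

# In-memory DB with two tables an "add participant" might touch.
# Schema is hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE consents (participant_id INTEGER, form TEXT NOT NULL)")
conn.commit()

def add_participant(name, consent_form):
    """Run all SQLs of one logical operation in a single transaction."""
    try:
        with conn:  # commits on success, rolls back on any exception
            cur = conn.execute("INSERT INTO participants (name) VALUES (?)", (name,))
            conn.execute("INSERT INTO consents (participant_id, form) VALUES (?, ?)",
                         (cur.lastrowid, consent_form))
        return True
    except sqlite3.Error:
        return False

assert add_participant("P1", "general") is True
assert add_participant("P2", None) is False  # 2nd SQL violates NOT NULL

# Atomicity: the failed operation left no partial data behind.
# P2's participants row was rolled back along with the consent insert.
rows = conn.execute("SELECT name FROM participants").fetchall()
assert rows == [("P1",)]
```

Without the transaction, the failed operation would have left an orphan "P2" row in `participants`, i.e. a corrupted state.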
Now let's understand how OpenSpecimen handles transactions when a big operation is performed.
What is a big transaction?
Importing 1000s of records via CSV
Performing operations on 1000s of records via UI (like moving specimens, editing specimens in bulk, etc).
What are the problems with big transactions?
A big transaction can lead to multiple problems:
Blocking other users from performing operations on the same objects. Because the database is doing some operations, it might decide to “lock the impacted tables” and not allow other users to access them. This will result in “lock” errors.
Slowing down overall database performance.
How does OpenSpecimen handle big transactions?
UI based operations
Other than CSV import, under most circumstances, one should not face problems in the normal course of usage. However, if you are performing large operations and encounter issues like “lock errors”, it is always better to do it via CSV import instead of the UI.
In general, UI is not meant for doing large bulk operations.
CSV-based import
CSV import is a different beast: its transaction handling depends on how you configure the “Pre-validate Records Limit” setting.
Short story (TLDR): If you regularly upload large files, set this to 100 or less.
Long story:
The value of this setting determines how many records OpenSpecimen processes in one DB transaction.
For example:
If set to 10K and you import a file with 5K records, all records are processed in one transaction. The advantage here is that if even one record fails, then none of the records will be committed to the database. Sometimes it is essential that either all records succeed or none.
If set to 100, importing a file with 5K records commits the records to the database every 100 records. However, this also means that if a few records fail, then the rest will be committed to the database.
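The trade-off described above can be sketched as a batched-commit loop. This is a simplified illustration of the idea, not OpenSpecimen's actual importer code: records are committed every `txn_size` rows, so a failure discards only the batch it occurs in.

```python
import sqlite3

def import_records(conn, records, txn_size):
    """Commit every `txn_size` records; a failure rolls back only its own batch.

    Illustrative sketch of batched commits, not OpenSpecimen source code.
    """
    imported = 0
    for start in range(0, len(records), txn_size):
        batch = records[start:start + txn_size]
        try:
            with conn:  # one DB transaction per batch
                for label in batch:
                    conn.execute("INSERT INTO specimens (label) VALUES (?)", (label,))
            imported += len(batch)
        except sqlite3.Error:
            pass  # batch rolled back; real code would report the failed rows
    return imported

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (label TEXT NOT NULL)")

records = ["S1", "S2", "S3", None, "S5"]  # one bad record (NULL label)

# All records in one transaction: the single failure discards everything.
assert import_records(conn, records, txn_size=len(records)) == 0

# Commit every 2 records: only the batch containing the bad record is lost.
assert import_records(conn, records, txn_size=2) == 3
```

Small batches keep transactions (and locks) short, at the cost of partial imports when some records fail.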
Here is a table which explains the validation limit setting and transaction size:
| Validation Size Setting | Number of records in CSV | Impact |
|---|---|---|
| 0 or 1 | N/A | 1 record per transaction |
| 10000 | < 10000 (i.e. less than the setting) | All records in one transaction |
| 10000 | > 10000 (i.e. greater than the setting) | 1 record per transaction |
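The table can be summarised as a simple decision. The function below is an illustrative sketch of that logic only (the name and signature are hypothetical, not from the OpenSpecimen source):

```python
def records_per_transaction(prevalidate_limit, csv_record_count):
    """Transaction size implied by the table above (illustrative sketch)."""
    if prevalidate_limit <= 1:
        return 1                   # setting of 0 or 1: record-by-record
    if csv_record_count <= prevalidate_limit:
        return csv_record_count    # whole file fits: all in one transaction
    return 1                       # file exceeds the limit: 1 per transaction

assert records_per_transaction(0, 5000) == 1
assert records_per_transaction(10000, 5000) == 5000
assert records_per_transaction(10000, 20000) == 1
```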