SQLCipher backend threads may time out when writing to the database
Soledad may fail with @pysqlcipher.dbapi2.OperationalError: database is locked@ because sqlcipher may time out when concurrently writing to the database under heavy load. The reasoning is as follows:
- "SQLite blocks concurrent writes until a certain timeout is reached":https://pysqlite.readthedocs.io/en/latest/sqlite3.html#sqlite3.connect. The pysqlite default is 5 seconds; the current Soledad default is 10 seconds.
- At some point Soledad gained an asynchronous sqlcipher backend that uses Twisted's adbapi and a pool of threads to access the underlying sqlcipher database.
- Then we started seeing the timeout happening under heavy load (#6585 (closed), #6625 (closed)).
- We decided to increase the timeout and to retry a number of times before giving up.
- The values for the timeout and the number of retries were chosen so that we would no longer see those errors in our development environments.
The problem had stopped showing up, until now:
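The timeout-plus-retry approach described above can be sketched roughly as follows. This is a minimal illustration that uses the standard library @sqlite3@ module as a stand-in for pysqlcipher; the helper name @execute_with_retry@ and its @retries@ and @delay@ parameters are hypothetical and are not Soledad's actual code or values.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), retries=3, delay=0.05):
    """Run a statement, retrying when SQLite reports a locked database.

    Hypothetical helper for illustration only; not Soledad's implementation.
    """
    for attempt in range(retries + 1):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # Retry only on lock contention; re-raise any other error,
            # and give up once the last attempt has failed.
            if "locked" not in str(exc) or attempt == retries:
                raise
            time.sleep(delay)
```

Each attempt already waits up to the connection's own @timeout@ before raising, so the worst-case wait is roughly the timeout multiplied by the number of attempts, plus the sleeps in between.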
- We have recently set up automated tests running on GitLab CI and Docker.
- The "database is locked" error "is now haunting us again":https://0xacab.org/leap/soledad/builds/222, and sometimes prevents us from having successful builds. Note that the load should not be heavy here, as that test only does 2 syncs of 20 empty documents each.
Some further investigation is needed:
- How much time is this test actually taking? Run a profile of that test.
- Why does the timeout happen if the test is not actually a heavy load?
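One possible way to profile the test is with the standard library @cProfile@ module. The sketch below is only an assumption about how this could be done; @profile_call@ is a hypothetical helper, and the function passed to it would be whatever entry point runs the failing test.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    Hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    # Show the ten entries with the largest cumulative time.
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()
```

By default @cProfile@ only observes the thread it was started in, so time spent inside the adbapi thread pool would probably not appear in this report and might need to be measured separately.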
Some options:
- Increase timeout and number of retries. If we choose this, soledad will continue crashing instead of hanging on deadlocks or when waiting for long operations.
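- Do not time out ("disable busy handler":https://www.sqlite.org/c3ref/busy_timeout.html by "passing a timeout equal to 0":https://pysqlite.readthedocs.io/en/latest/sqlite3.html#sqlite3.connect). Does that behaviour make sense? Note that, according to the SQLite documentation, with no busy handler a locked database returns @SQLITE_BUSY@ immediately instead of waiting, so this would make writes fail sooner rather than wait indefinitely.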
- Fix whatever other problem might be happening there.
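To check what a timeout of 0 actually does, a small experiment along the following lines may help. It is a sketch that uses the standard library @sqlite3@ module rather than pysqlcipher, and @timed_write@ is a hypothetical helper: one connection holds a write lock while a second one, opened with the timeout under test, attempts a write.

```python
import sqlite3
import time


def timed_write(conn, sql):
    """Attempt a write; return (succeeded, seconds_elapsed).

    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        conn.execute(sql)
        return True, time.monotonic() - start
    except sqlite3.OperationalError:
        # Raised, for example, as "database is locked" when another
        # connection holds the write lock past the busy timeout.
        return False, time.monotonic() - start
```

For example, after a first connection runs @BEGIN IMMEDIATE@ on a shared database file, calling @timed_write@ on a second connection opened with @timeout=0@ should show whether the write fails right away or blocks.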
(from redmine: created on 2016-06-22, closed on 2016-07-26)