We already know how pytest-benchmark decides the number of rounds for each test. The decision we need to make now is: should we fix the number of rounds for every test ourselves, or should we let pytest-benchmark decide automatically?
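For reference, here's a minimal sketch of the two modes we're choosing between, using pytest-benchmark's fixture API; the test names and the workload are placeholders, not actual tests from our suite:

```python
import time


def test_calibrated(benchmark):
    # Default mode: pytest-benchmark calibrates the number of rounds itself,
    # bounded by options such as --benchmark-min-rounds and --benchmark-max-time.
    benchmark(time.sleep, 0.001)


def test_fixed_rounds(benchmark):
    # Pedantic mode: we fix the rounds (and iterations) ourselves, so the
    # total time spent on this test is predictable regardless of calibration.
    benchmark.pedantic(time.sleep, args=(0.01,), rounds=5, iterations=1)
```

The calibrated variant can also be bounded globally from the command line, e.g. `pytest --benchmark-min-rounds=5 --benchmark-max-time=1.0`.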
I'd propose separating the e2e-like benchmark tests into a different suite and running them with a fixed number of repetitions.
Ideally, the number of repetitions would be a tradeoff between keeping the total run time of the suite short and keeping the dispersion of results across runs low.
@kali, when you say "e2e-like" tests, in the current case, are you talking specifically about the ones in the test_sync module?
We currently have roughly two sets of tests: those that take a few seconds, and those that take tens of seconds, the latter coming from the test_sync and test_sqlcipher modules. The ones in test_sqlcipher are not end-to-end, but still take several seconds. Would you fix the number of rounds for those as well? Would you keep them separate from the others?
Please check if this proposal is enough to consider this issue solved:
move test_sync to a new benchmark-e2e suite.
fix the number of rounds for test_sync and test_sqlcipher, because they are longer (see the sketch below this list).
document this choice in the doc.
document the expected number of rounds for automatically calibrated tests.
review the documentation to check that everything is clear.
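To make the second point concrete, a hedged sketch of how a long-running test could fix its rounds; the test name, the workload, and the round count are illustrative assumptions, not the actual contents of test_sync or test_sqlcipher:

```python
import pytest


@pytest.mark.benchmark(group="sync")  # group name is an assumption
def test_long_running_sync(benchmark):
    def sync_workload():
        pass  # placeholder for the real sync / sqlcipher work

    # Fix the rounds explicitly instead of letting calibration decide, so a
    # test taking tens of seconds per round keeps the suite time bounded.
    benchmark.pedantic(sync_workload, rounds=4, iterations=1)
```

The chosen round count is the tradeoff mentioned above: enough repetitions to keep dispersion low across runs, few enough to keep the total suite time short.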
As discussed with @kali on IRC and in today's standup, I don't think it is useful or needed right now to separate e2e tests in the benchmark context. I agree that e2e separation is needed, but it should be done as part of a wider test refactor, taking into account the mocking of couch and a careful assessment of which tests should be kept as e2e in a refactored suite. That is not really related to benchmarks; the question here was more about documenting and limiting the length of tests, so I think the proposed MR is enough for now.