
Tune memory sampling

Closed drebs requested to merge drebs/soledad:tune-memory-sampling into master

This MR adds documentation and tunes memory sampling according to info gathered on this page: https://0xacab.org/leap/soledad/wikis/benchmarks

Closes: #8885 (closed), #8860 (closed)

Activity

  • drebs assigned to @kali

  • drebs (Author, Contributor)

    @kali, can you comment on what kind of documentation you expected to see, so I can adapt?

    Also, I have another branch in which I can tune the memory sampling period on a per-test basis. On the other hand, this didn't seem necessary for now, as a sampling period of 100 ms seems to account for the needs of all tests (check this page). Do you have some thoughts about this?

    Edited by drebs
  • @drebs I think I would expect, basically, a page explaining how the sampling interval is chosen. The wiki page seems quite informative. It seems the threshold has been 2 sec, above which sampling switches to every second instead of every 0.1 sec? (That would be the cutoff to get the right Nyquist rate, if I'm correct.) Do you know how the rounds are chosen? I'm guessing that's a choice made by pytest-benchmark.

    Side thoughts

    Does CPU sampling follow a similar logic?

    On the other hand, looking at that table, I wonder if it's meaningful to have memory resources plotted for all of the tests. It's also quite curious that the CPU-intensive test shows more memory consumption than the memory-intensive one! (Is that right?)

    It would also help to have a separate graph for memory, since sharing the y-axis with CPU measurements makes the current graphs quite confusing.

    Feel free to move the discussion of the latter points to any other relevant issue.

    Units

    Looking at that table, I see one data point that is on the verge of the Nyquist rate (sampling at 0.1 s for an event that takes 0.2 s), and several instances that take > 30 sec. I wonder if those times can be cut by lowering the number of documents. I also think we would benefit from greater clarity when comparing normalized measurements, i.e., also returning a rate of time or memory per document (or per KB); storing the total value might be good too.

    Edited by Kali Kaneko
  • drebs (Author, Contributor)

    @kali, I'll answer point by point.

    Memory sampling interval: the memory sampling interval must be chosen according to the duration of each test. If a test is too quick (e.g. takes 1 second) and the memory sampling interval is too big (e.g. also 1 second), the sampling will not capture the evolution of memory usage during the test. What I am proposing is to standardize the sampling at 0.1 second instead of tuning it on a per-test basis, as 0.1 second seems to cover all scenarios. The "maximum sampling interval" column in the wiki page is something I used during analysis to assess, for each test, what sampling interval would be enough to see what's going on with memory during test execution. After I came up with those, I thought the simplest approach would be to standardize everything to 0.1 second instead of parametrizing it individually, for now. That is the current proposal, and I am open to modifying it if we think it is better to parametrize each test individually. I currently don't see the need for that if we have one parameter that covers all.
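    For reference, this is roughly what the sampling approach looks like (a minimal sketch using psutil in a background thread; an illustration of the idea, not Soledad's actual sampler code):

    ```python
    import threading
    import time

    import psutil


    def sample_memory(interval, samples, stop):
        """Append one memory reading every `interval` seconds until stopped."""
        proc = psutil.Process()
        while not stop.is_set():
            samples.append(proc.memory_percent())
            time.sleep(interval)


    samples, stop = [], threading.Event()
    thread = threading.Thread(target=sample_memory, args=(0.1, samples, stop))
    thread.start()
    # ... run the benchmarked operation here ...
    stop.set()
    thread.join()
    print("peak memory: %.2f%%" % max(samples))
    ```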

    Choice of number of rounds: I tried to convey in the proposed docs (which I might have to review if it is not clear enough) that the rounds can be chosen either automatically or manually. We currently have a mix: the number of rounds is chosen automatically for all tests except the ones in the test_sync module. The automatic choice takes a lot of parameters into account (as described in the proposed doc), and the explicit choice is currently made for the longer sync tests, which last several seconds. We can either stick with the current approach, fix the rounds for every test, or let pytest-benchmark choose for all tests.
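    To illustrate the manual option: pytest-benchmark's pedantic mode lets us fix the rounds explicitly instead of relying on its calibration. (Hedged sketch; `sync_both` is a hypothetical stand-in for a real sync operation, not a name from our code.)

    ```python
    def test_sync(benchmark):
        # run a fixed number of rounds instead of letting
        # pytest-benchmark calibrate the count automatically
        benchmark.pedantic(sync_both, rounds=6, iterations=1)
    ```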

    CPU sampling: there's no explicit CPU sampling, because cpu_percent() works differently from memory_percent(). This is also noted in the proposed doc; I might have to make it more explicit if it is not clear. CPU percentage is measured over an interval of time (i.e. one call of the method returns the CPU used since the last call), while memory percentage returns the current memory usage. This is why we can rely on cpu_percent() to give a good account of the percentage of CPU time used by the process, while we need to explicitly sample during execution to know the stats for memory, or even the max memory usage during execution.
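    A quick illustration of that psutil difference (hedged sketch; the sleep just stands in for the measured work):

    ```python
    import time

    import psutil

    proc = psutil.Process()
    proc.cpu_percent()           # first call establishes the baseline
    time.sleep(1)                # stand-in for the benchmarked operation
    cpu = proc.cpu_percent()     # CPU% accumulated since the previous call
    mem = proc.memory_percent()  # memory% at this instant only
    print(cpu, mem)
    ```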

    Relevance of plotting memory: I think we should plot for all tests for now and decide later based on our observations. Where are you seeing the plot for the CPU/memory-intensive tests? It is possible that the sampling interval was not small enough to account for the real differences.

    Separate graphs for memory and CPU: I don't see a problem with separating them, nor with keeping them together, as long as the Y axis always shows the 0-100 interval (percentage).
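    If we do separate them, something like this could work (hedged matplotlib sketch; the sample data is made up):

    ```python
    import matplotlib.pyplot as plt

    t = [i * 0.1 for i in range(50)]             # sampling times (s)
    cpu = [min(100, 40 + i) for i in range(50)]  # fake CPU% samples
    mem = [10 + i * 0.5 for i in range(50)]      # fake memory% samples

    # one subplot per resource, both Y axes pinned to the 0-100% interval
    fig, (ax_cpu, ax_mem) = plt.subplots(2, 1, sharex=True)
    ax_cpu.plot(t, cpu)
    ax_cpu.set_ylabel("CPU %")
    ax_cpu.set_ylim(0, 100)
    ax_mem.plot(t, mem)
    ax_mem.set_ylabel("memory %")
    ax_mem.set_ylim(0, 100)
    ax_mem.set_xlabel("time (s)")
    plt.show()
    ```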

    Units: I will double-check the length of the fastest test (raw encryption of a 10k doc) and verify whether the sampling interval is enough for that one. Regarding the total time of the longer tests (i.e. the ones that take > 30 s), we have to decide whether it is more useful to have faster results (and so shorten the time of the test) or whether it makes sense to keep a longer test to get a more realistic account of what is going on. I agree with comparing normalized measurements, but I think that is a further step, after we have this settled.
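    For when we get to that step, the normalization could be as simple as this (hypothetical sketch; the function and field names are placeholders, not anything from our code):

    ```python
    def normalized(total_seconds, total_bytes, n_docs):
        # report per-document rates alongside the total
        return {
            "total_seconds": total_seconds,
            "seconds_per_doc": total_seconds / n_docs,
            "kb_per_doc": total_bytes / 1024.0 / n_docs,
        }
    ```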

  • drebs mentioned in issue #8860 (closed)

  • > If a test is too quick (e.g. takes 1 second) and the memory sampling interval is too big (e.g. also 1 second), the sampling will not capture the evolution of memory usage

    Right, at a minimum we have to sample at the Nyquist rate, which is 2*f.
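    A quick worked check of the borderline case from the table (assuming the 0.2 s event is the shortest signal of interest): f = 1/0.2 s = 5 Hz, so the Nyquist rate is 2f = 10 Hz, i.e. one sample every 0.1 s. That is exactly the proposed interval, with no margin.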

    > I thought the simplest approach would be to standardize everything to 0.1 second instead of parametrizing it individually, for now. That is the current proposal, and I am open to modifying it if we think it is better to parametrize each test individually. I currently don't see the need for that if we have one parameter that covers all.

    Noise and overhead: we'd have to see if/how much it influences the metrics.

    > The automatic choice takes a lot of parameters into account (as described in the proposed doc),

    I saw that in the docs, but that does not answer my question :) I guess I can live with whatever magic/heuristics pytest-benchmark is doing.

    > Where are you seeing the plot for the CPU/memory-intensive tests?

    You're right, there's no plot. Mental fart of mine: the table shows time, not memory. Forget what I said! :)

    > we have to decide whether it is more useful to have faster results (and so shorten the time of the test) or whether it makes sense to keep a longer test to get a more realistic account of what is going on.

    Basically we have to run an experiment to see how linear things are in there, and if linearity holds we can interpolate (see the sketch below).
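    A hedged sketch of that experiment (the document counts and timings below are invented placeholders): time the same test at a few document counts, fit a line, and check the fit quality before trusting any interpolation.

    ```python
    import numpy as np

    doc_counts = np.array([10, 100, 1000])  # hypothetical document counts
    runtimes = np.array([0.4, 3.9, 41.2])   # hypothetical timings (s)

    # least-squares line fit: runtime ~ slope * n_docs + intercept
    slope, intercept = np.polyfit(doc_counts, runtimes, 1)
    predicted = slope * doc_counts + intercept
    ss_res = ((runtimes - predicted) ** 2).sum()
    ss_tot = ((runtimes - runtimes.mean()) ** 2).sum()
    print("seconds per doc: %.4f, R^2: %.3f" % (slope, 1 - ss_res / ss_tot))
    ```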

  • I'm merging this, but leaving the MR open because we're having the discussion here. Feel free to close it once you've read or replied to the points above.

  • added Blocked- label

  • drebs (Author, Contributor)

    I am closing this one as it has been merged and whatever is left is being tracked in #8860 (closed).

  • closed
