Increase Prometheus data retention defaults
Because lilypad was initially created with float's create-env invocation, which assumes you are making a test environment, the default Prometheus data retention is set to only 2 days. This is too short for any production deployment. It's difficult to determine in advance what a good value should be, but we would only be setting a default here, which anyone could change.
What I do know is that a/i's production instances store 6 months of data in 12GB, but this is with a multi-tier system, so the LTS version of the data is not high-cardinality, high-frequency data. With one month of high-cardinality, high-frequency data, they were seeing 150GB.
For their high-frequency data, they keep 2 days' worth, which takes <10GB of space.
So for 6 months + 2 days, it costs roughly 22GB (12GB + 10GB) of storage for logs.
My personal opinion is that one year of overview data is useful for capturing trends over time, but being able to "zoom in" to see what happened at a granular level doesn't need to cover the entire year. When you want to zoom in, it's usually to figure out what happened in the last few days when something strange occurred. In an ideal world, I'd like to be able to do zoomed-in introspection over the last week. I wouldn't want to go lower than 2 days.
If we were to make a wild estimate based on the storage that a/i sees for their 6 months + 2 days, and extrapolate that to 1 year + 7 days, we would be looking at 24GB + (3.5 × 10GB) = 59GB of logs (if all things are equal, and we round up).
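
To make that arithmetic easy to redo with other retention values, here is a minimal Python sketch of the same extrapolation. The function and constant names are made up for illustration and are not part of float or Prometheus. It assumes storage grows linearly with retention time and that our data volume resembles a/i's, neither of which is guaranteed.

```python
# Figures quoted above for a/i's setup (approximate).
LTS_GB_PER_6_MONTHS = 12    # long-term, low-cardinality tier
HIFREQ_GB_PER_2_DAYS = 10   # high-frequency tier; a/i reports <10GB, rounded up


def estimate_storage_gb(lts_months: float, hifreq_days: float) -> float:
    """Scale both tiers linearly from the a/i figures (an assumption)."""
    lts_gb = LTS_GB_PER_6_MONTHS * (lts_months / 6)
    hifreq_gb = HIFREQ_GB_PER_2_DAYS * (hifreq_days / 2)
    return lts_gb + hifreq_gb


if __name__ == "__main__":
    # Compare the a/i baseline with the proposed 1 year + 7 days.
    for months, days in [(6, 2), (12, 7)]:
        total = estimate_storage_gb(months, days)
        print(f"{months} months + {days} days: ~{total:.0f}GB")
```

Plugging in other combinations (for example 12 months + 2 days) shows where the total lands between the 22GB and 59GB endpoints.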
So the question is: is 60GB for logs too much of a requirement? If we are OK with the minimum of 6 months + 2 days (22GB), but not OK with 60GB, then where is the line?