Run Prometheus on the same box as prod to scrape basic health metrics. Watch out for disk-space consumption by Prometheus and memory consumption by Docker. Add a Prometheus counter for messages sent on each channel.
Then build dashboards for those metrics in Grafana.
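Grafana usually graphs a counter as its per-second rate of increase rather than as the raw, ever-growing total. In PromQL that is typically a query like `rate(messages_sent_total[5m])`, where `messages_sent_total` is a hypothetical metric name. The sketch below shows only the underlying arithmetic for two successive scrapes, including the usual treatment of a counter reset (e.g. after an app restart). It is a simplification: the real `rate()` uses every sample in the window and extrapolates to the window's edges.

```javascript
// Simplified per-second rate between two scrapes of a Prometheus-style counter.
// A sample is { timestampSeconds, value }. Counters only go up, so a drop
// in value is treated as a reset (the app restarted and began again at 0).
function counterRate(previous, current) {
  const elapsed = current.timestampSeconds - previous.timestampSeconds;
  if (elapsed <= 0) {
    throw new Error("samples must be in increasing time order");
  }
  const increase =
    current.value >= previous.value
      ? current.value - previous.value // normal case: counter grew
      : current.value; // reset: count everything since the restart
  return increase / elapsed;
}

// Example: scrapes 15 seconds apart (15s is an assumed scrape interval).
const steady = counterRate(
  { timestampSeconds: 0, value: 120 },
  { timestampSeconds: 15, value: 150 },
);
const afterRestart = counterRate(
  { timestampSeconds: 15, value: 150 },
  { timestampSeconds: 30, value: 6 },
);
console.log(steady, afterRestart); // prints "2 0.4"
```

Without the reset rule, the restart in the second example would show up as a large negative rate on the graph.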
(notes from chat w/ @aguestuser):
i found a helpful way in by reading the section on monitoring in Nat's book: https://www.bookdepository.com/Real-World-SRE-Nat-Welch/9781788628884
his example is in Go, but Prometheus has client libraries in basically every language (we'd want JS, since you write the instrumentation code inline with your application code). it took roughly 20-30 lines of instrumentation code to get basic monitoring of memory/disk usage, request rates, error rates, etc.
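The Prometheus client library for Node is `prom-client`, which does this bookkeeping for you. To show what the instrumentation boils down to without depending on it, here is a dependency-free sketch, assuming a plain Node `http` server. It counts requests and 5xx responses and serves them, along with the process's memory usage, at `/metrics` in Prometheus's text exposition format. Disk usage is left out. The metric names and `appHandler` are hypothetical placeholders.

```javascript
const http = require("http");

// In-memory counters; Prometheus reads their current values on every scrape.
const counters = { requests: 0, errors: 0 };

// Render current values in the Prometheus text exposition format.
function renderMetrics() {
  const { rss, heapUsed } = process.memoryUsage();
  return [
    "# HELP app_http_requests_total Total HTTP requests handled.",
    "# TYPE app_http_requests_total counter",
    `app_http_requests_total ${counters.requests}`,
    "# HELP app_http_errors_total HTTP responses with status >= 500.",
    "# TYPE app_http_errors_total counter",
    `app_http_errors_total ${counters.errors}`,
    "# HELP app_process_resident_memory_bytes Resident set size in bytes.",
    "# TYPE app_process_resident_memory_bytes gauge",
    `app_process_resident_memory_bytes ${rss}`,
    "# HELP app_process_heap_used_bytes V8 heap in use, in bytes.",
    "# TYPE app_process_heap_used_bytes gauge",
    `app_process_heap_used_bytes ${heapUsed}`,
    "",
  ].join("\n");
}

// Wrap an application handler so every request, and every 5xx response, is counted.
function instrument(handler) {
  return (req, res) => {
    counters.requests += 1;
    res.on("finish", () => {
      if (res.statusCode >= 500) counters.errors += 1;
    });
    handler(req, res);
  };
}

// Hypothetical app server: /metrics is for Prometheus, everything else goes to appHandler.
function createServer(appHandler) {
  const instrumented = instrument(appHandler);
  return http.createServer((req, res) => {
    if (req.url === "/metrics") {
      res.writeHead(200, { "Content-Type": "text/plain; version=0.0.4" });
      res.end(renderMetrics());
      return;
    }
    instrumented(req, res);
  });
}
```

For example, `createServer((req, res) => res.end("ok")).listen(3000)` (the port is arbitrary), then add `localhost:3000` as a target under `scrape_configs` in the Prometheus config.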
then you can make custom "counter" instrumentations (which are kept in memory and scraped by Prometheus every SCRAPE_INTERVAL). the only custom one i think we'd want is a counter incremented whenever a message is sent on a channel, so we can produce a graph of messages sent on each channel over time (and split the busiest channels out onto their own box if needed down the line).
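A per-channel message count fits naturally as one counter metric with a `channel` label, so each channel becomes its own time series. In `prom-client` that would be a `Counter` constructed with `labelNames: ["channel"]`. The dependency-free sketch below, assuming a hypothetical metric name `messages_sent_total` and a hypothetical `onMessageSent` hook, shows the in-memory bookkeeping and the lines Prometheus would read on each scrape.

```javascript
// Minimal labeled counter: one running total per label value, held in memory
// until Prometheus scrapes it.
class LabeledCounter {
  constructor(name, help, labelName) {
    this.name = name;
    this.help = help;
    this.labelName = labelName;
    this.values = new Map(); // label value -> count
  }

  inc(labelValue, amount = 1) {
    if (amount < 0) throw new Error("counters can only increase");
    this.values.set(labelValue, (this.values.get(labelValue) ?? 0) + amount);
  }

  // Escape backslash, double quote, and newline, as the text format requires.
  static escape(value) {
    return String(value)
      .replace(/\\/g, "\\\\")
      .replace(/"/g, '\\"')
      .replace(/\n/g, "\\n");
  }

  render() {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const [label, count] of this.values) {
      lines.push(`${this.name}{${this.labelName}="${LabeledCounter.escape(label)}"} ${count}`);
    }
    return lines.join("\n") + "\n";
  }
}

// Hypothetical metric and hook: call onMessageSent wherever the app sends a message.
const messagesSent = new LabeledCounter(
  "messages_sent_total",
  "Messages sent, by channel.",
  "channel",
);

function onMessageSent(channelId) {
  messagesSent.inc(channelId);
}

onMessageSent("channel-a");
onMessageSent("channel-a");
onMessageSent("channel-b");
console.log(messagesSent.render());
```

Once scraped, a PromQL query such as `topk(5, rate(messages_sent_total[5m]))` would surface the busiest channels in a Grafana panel.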
Look out for a blog post on all this from Nat Welch soon! :)