Automate bigcouch rebalancing
The bottom line is: as long as all bigcouch nodes exist from the beginning, everything is fine. If we add or remove nodes later, we have to manually fix the location of the database shards in every (!) database, either to get data synced to a new node or to remove shard locations that still point to old nodes.
This is also why ant (added later) is so slow: for existing databases no data gets written locally, so it pulls the data from across the ocean. See also https://github.com/cloudant/bigcouch/issues/40
Manually fixing/moving the shards seems like a big pita, and I would avoid trying to automate this. It seems like a lot of work and is asking for trouble, see http://stackoverflow.com/questions/6676972/moving-a-shard-from-one-bigcouch-server-to-another-for-balancing/6955900#6955900
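To make concrete what the manual fix involves for a single database, here is a minimal Python sketch. It assumes the shard map is the document stored per database in the node-local `dbs` database (port 5986 by default), with `by_node` and `by_range` fields. The function names, and the policy of giving a new node every shard range, are assumptions for illustration, not something we have tested against our cluster.

```python
import copy


def add_node_to_shard_map(shard_map, node):
    """Return a copy of shard_map in which `node` holds every shard range.

    Assumption: the new node should carry a full copy of the database.
    """
    updated = copy.deepcopy(shard_map)
    ranges = sorted(updated["by_range"])
    for rng in ranges:
        nodes = updated["by_range"][rng]
        if node not in nodes:
            nodes.append(node)
    updated["by_node"][node] = ranges
    return updated


def remove_node_from_shard_map(shard_map, node):
    """Return a copy of shard_map with every reference to `node` dropped."""
    updated = copy.deepcopy(shard_map)
    updated["by_node"].pop(node, None)
    for rng, nodes in updated["by_range"].items():
        updated["by_range"][rng] = [n for n in nodes if n != node]
    return updated
```

The changed document would then have to be written back to the `dbs` database, once per database, which is exactly the part that makes this tedious and risky.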
The sledgehammer method, after adding/removing a node, could be
- stop db access from webapp
- dump all dbs
- completely purge bigcouch on all hosts (apt-get --purge remove bigcouch)
- reinstall bigcouch on all hosts
- join all nodes to a cluster
- restore dbs
- reenable webapp access
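For the "join all nodes to a cluster" step, a small sketch of the requests involved. It assumes the usual bigcouch way of joining, i.e. a PUT of an empty document named after each additional node into the `nodes` database on the node-local port of the first node. The helper name `cluster_join_requests` and the example hostnames are made up. It only builds the requests and sends nothing.

```python
import json


def cluster_join_requests(node_names, base_url="http://localhost:5986"):
    """Build the PUT requests to run on the first node to register the others.

    node_names: Erlang node names such as "bigcouch@n1.example.org".
    Returns a list of (method, url, body) tuples; nothing is sent.
    """
    _first, *others = node_names
    return [
        ("PUT", f"{base_url}/nodes/{name}", json.dumps({}))
        for name in others
    ]


if __name__ == "__main__":
    nodes = [
        "bigcouch@n1.example.org",
        "bigcouch@n2.example.org",
        "bigcouch@n3.example.org",
    ]
    for method, url, body in cluster_join_requests(nodes):
        print(method, url, body)
```

Since all nodes exist before the dbs are restored, the restored databases get their shards spread over the whole cluster from the start.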
So the short term solution for leap_platform would be:
- put a warning in our docs that describes this problem, and advise setting up all planned nodes well before going into production.
- manually try once to fix the shard location so we know how it would be done, or use the sledgehammer method mentioned above
- document all our findings on how to manually rebalance
- keep fingers crossed that the bigcouch merge happens soon, and that some automation of rebalancing will show up soon
(from redmine: created on 2013-08-16)