EPIC: automated backup/restore of db and keystores
Value
- be able to recover the service if it is completely borked or a server is lost/damaged
- maintain confidentiality of user lists by not keeping incremental backups: if an admin deletes a channel, it drops out of the backups within a day
Behavior
(aka: "definitions of done"):
automated backup
- every night a cron job runs that:
  - makes a copy of the `signal_data` volume and `pg_dump`s the db
  - writes encrypted versions of both to the filesystem (encrypted to 2 maintainer keys)
  - `scp`s those files to a backup server
  - destroys old backups on the backup server
- implementation note: all files necessary to do this (crontab file, scripts, ssh keys) should live in the repo (so a user logged into prod can set things up by pulling the repo and cd'ing into the correct dir)
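
a minimal sketch of what the nightly job could run, written as a Python planner that only builds the command lines (it executes nothing itself). everything not named in this issue is an assumption: the volume parent dir, the db name, the work/remote dirs, the file naming, the 1-day retention window, and reusing the `sb_user` ssh key for the upload.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical values, not taken from the issue: adjust to the real deployment.
VOLUME_PARENT = "/var/lib/docker/volumes"  # assumed parent dir of the signal_data volume
DB_NAME = "signalboost"                    # assumed database name
SSH_KEY = "/home/sb_user/.ssh/id_sb_user"  # assumed: same key the restore step uses


def backup_commands(stamp: str, recipients: List[str], backup_host: str,
                    workdir: str = "/tmp/sb_backup",
                    remote_dir: str = "backups") -> List[List[str]]:
    """Return the argv lists for one backup run, in execution order."""
    volume_tar = f"{workdir}/signal_data-{stamp}.tar.gz"
    db_dump = f"{workdir}/db-{stamp}.dump"
    # One --recipient flag per maintainer key, so either maintainer can decrypt.
    recipient_args = [arg for key in recipients for arg in ("--recipient", key)]

    def encrypt(path: str) -> List[str]:
        return ["gpg", "--batch", "--yes", *recipient_args,
                "--output", f"{path}.gpg", "--encrypt", path]

    return [
        # 1. copy the keystore volume
        ["tar", "-czf", volume_tar, "-C", VOLUME_PARENT, "signal_data"],
        # 2. dump the db (custom format, so pg_restore can read it)
        ["pg_dump", "--format=custom", f"--file={db_dump}", DB_NAME],
        # 3. encrypt both artifacts to the maintainer keys
        encrypt(volume_tar),
        encrypt(db_dump),
        # 4. ship the encrypted files to the backup server
        ["scp", "-i", SSH_KEY, f"{volume_tar}.gpg", f"{db_dump}.gpg",
         f"{backup_host}:{remote_dir}/"],
    ]


def expired_backups(mtimes: Dict[str, datetime], now: datetime,
                    max_age: timedelta = timedelta(days=1)) -> List[str]:
    """Names of backups older than max_age (assumed retention: one day)."""
    return sorted(name for name, mtime in mtimes.items() if now - mtime > max_age)
```

a real script would run each entry with `subprocess.run(cmd, check=True)` so the job stops at the first failure, and would feed `expired_backups` the modification times listed on the backup server before deleting anything.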
automated restore
- there is an ansible script that can be run with `make ansible.restore` that:
  - reads an `sb_backup` host from the inventory
  - runs an scp script that pulls backups from `sb_backup` to the (new) `signalboost` host using `-i /home/sb_user/.ssh/id_sb_user` (which assumes that the pubkey & secret key are on `signalboost` and that the pubkey is in `authorized_keys` on `sb_backup`)
  - runs a restore job on `signalboost` (which must run after `provision` and `deploy`) that restores the keystore volume and runs `pg_restore` on the db backup
  - optionally: deletes signalboost files from any borked machine (?)
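
a rough sketch of the commands the restore job might issue on the new `signalboost` host, again as a Python planner that only builds argv lists. the file names, dirs, db name and volume location are hypothetical. it also assumes the db backup is a custom-format dump (`pg_dump --format=custom`) and that a maintainer secret key able to decrypt the files is present in the keyring on that host.

```python
from typing import List

# Hypothetical values, not taken from the issue: adjust to the real deployment.
VOLUME_PARENT = "/var/lib/docker/volumes"  # assumed parent dir of the keystore volume
DB_NAME = "signalboost"                    # assumed database name
SSH_KEY = "/home/sb_user/.ssh/id_sb_user"  # key path named in this issue


def restore_commands(backup_host: str, stamp: str,
                     workdir: str = "/tmp/sb_restore",
                     remote_dir: str = "backups") -> List[List[str]]:
    """Return the argv lists for one restore run, in execution order."""
    volume_tar = f"{workdir}/signal_data-{stamp}.tar.gz"
    db_dump = f"{workdir}/db-{stamp}.dump"

    def remote(path: str) -> str:
        # Same file name as the local copy, but on the backup host, still encrypted.
        name = path.rsplit("/", 1)[-1]
        return f"{backup_host}:{remote_dir}/{name}.gpg"

    def decrypt(path: str) -> List[str]:
        return ["gpg", "--batch", "--yes", "--output", path,
                "--decrypt", f"{path}.gpg"]

    return [
        # 1. pull both encrypted backups from the backup host
        ["scp", "-i", SSH_KEY, remote(volume_tar), remote(db_dump), workdir],
        # 2. decrypt them (needs a maintainer secret key in the keyring)
        decrypt(volume_tar),
        decrypt(db_dump),
        # 3. unpack the keystore volume into place
        ["tar", "-xzf", volume_tar, "-C", VOLUME_PARENT],
        # 4. load the db dump, dropping existing objects first
        ["pg_restore", "--clean", "--if-exists", f"--dbname={DB_NAME}", db_dump],
    ]
```

in the ansible version each entry would likely become its own task, which keeps the "must run after `provision` and `deploy`" ordering explicit in the playbook.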
ansible requirements
- some way of putting inventory (hence what the backup and prod hosts are, and what users and keys they use) under version control, without blasting away the `ansible_user` etc. values that any given dev might be using (maybe put `inventory.tmpl.gpg` under version control?)
- pub key for `sb_user` must go on `sb_backup` host
- `sb_user` and its pub/priv ssh keys must go on all prod instances
- gpg keys should be imported into keyring as part of `provision`
- cron job for running `backup` script must be put into all `prod` instances as part of `provision.yml`
Edited by aguestuser