THORChain Validators — Re-creating From Backups
So, things went completely wrong on an active Validator, but at least we have backups… Now what???
If, for whatever reason, our active validator completely fails and we need to restore it from backups to save the day, we probably want a straightforward step-by-step to follow. This is a quick and easy way to do it, and chances are our node won’t even accumulate enough slash points to be churned out.
This can be used as-is for Maya by replacing thornode with mayanode, but note that the bifrost backup and restore steps marked optional below are still required for Maya.
Note: this could probably also be used to migrate a standby node, or even a normal active node, if we shut down the source node prior to re-creating.

Backups
I’ll start by reviewing what information we should have handy to easily re-create a Validator from its ashes.
mkdir ~/myBackupNode1
// Export the mnemonic (This is the mnemonic that was generated when we originally created the validator.)
NAME=node1 TYPE=validator NET=mainnet make mnemonic > ~/myBackupNode1/mnemonic.txt
// Export the password (This is the password we entered when we originally created the validator; it is required to restore our backups.)
NAME=node1 TYPE=validator NET=mainnet make password > ~/myBackupNode1/password.txt
// Generate a Thornode Backup (This is a backup of thornode we took after setting it up; it doesn't make a difference whether it's recent or old.)
NAME=node1 TYPE=validator NET=mainnet SERVICE=thornode make backup
cp ./backups/node1/thornode/2023-XX-XX/thornode-16XXXXXXXX.tar.gz ~/myBackupNode1/
// (Optional) Generate a Bifrost Backup (This is a backup of bifrost, ideally taken after the most recent churn.)
// Note: a new feature being rolled out automatically fetches the latest bifrost key file for us, so we don't need to back it up or recover it anymore.
NAME=node1 TYPE=validator NET=mainnet SERVICE=bifrost make backup
cp ./backups/node1/bifrost/2023-XX-XX/bifrost-16XXXXXXXX.tar.gz ~/myBackupNode1/
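// (Optional) Verify the archives are readable end-to-end before trusting them
// (a quick sanity check; any read error means the archive is damaged)
tar -tzf ~/myBackupNode1/thornode-16XXXXXXXX.tar.gz > /dev/null && echo "thornode backup OK"
tar -tzf ~/myBackupNode1/bifrost-16XXXXXXXX.tar.gz > /dev/null && echo "bifrost backup OK"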
// Save ~/myBackupNode1 somewhere offline.
// Not on the same SSD
// Not on the same computer
// NOT ON THE SAME SSD
// NOT ON THE SAME COMPUTER
The content of ~/myBackupNode1/ is all we need to restore our validator from scratch quickly and easily.
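How we get the folder off the machine is up to us; here is one minimal sketch, assuming a separate backup host reachable over SSH (backup-host is a hypothetical name):
// Copy the backup folder to a different machine
scp -r ~/myBackupNode1 user@backup-host:~/
// Ideally also copy it to an encrypted USB drive kept offline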
Re-creating from backups
Here are the steps to quickly re-create a THORChain Validator Node from a backup.
// Create the folder and clone git
mkdir node1
cd node1
git clone https://gitlab.com/thorchain/devops/node-launcher.git
cd node-launcher
// (Optional) Edit the mainnet file (if we don't want to recreate all daemons, e.g. to point to external ones)
nano thornode-stack/mainnet.yaml
// (Optional) Edit the gateway service (if we need to customise Metallb IP affinity)
nano gateway/templates/service.yaml
// (Optional) Edit thornode and bifrost (if we need to specify a custom EXTERNAL_IP)
nano bifrost/templates/deployment.yaml
nano thornode/templates/deployment.yaml
// Create the namespace (this allows us to create the secrets before running make install)
kubectl create namespace node1
// Create password secret (this allows us to restore from our backups files)
kubectl -n node1 create secret generic thornode-password --from-literal=password='X'
// Note: double quotes would not work with bash if the password contains a ! (history expansion); use single quotes as above
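// Alternatively, if the password contains other special characters, we can
// sidestep quoting entirely by reading it back from the file we exported
// earlier (a sketch, assuming ~/myBackupNode1/password.txt from the backup
// step; command substitution strips the trailing newline):
kubectl -n node1 create secret generic thornode-password --from-literal=password="$(cat ~/myBackupNode1/password.txt)"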
// Create mnemonic secret (this allows our thornode to re-create the same validator address)
kubectl -n node1 create secret generic thornode-mnemonic --from-literal=mnemonic="X"
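// (Optional) Sanity check: confirm both secrets exist before running make install
kubectl -n node1 get secret thornode-password thornode-mnemonic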
// !!IMPORTANT!! Ensure the old node is completely stopped; another bifrost or thornode running elsewhere may cause double-spends and bond slashing!!!!
// (Optional) Scale down the old node by executing the following on the previous machine
kubectl -n node1 scale deployments --replicas=0 --all
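// Then confirm nothing is left running on the old cluster
// (run against the old cluster's kubeconfig; all deployments should show 0/0)
kubectl -n node1 get deployments
kubectl -n node1 get pods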
// Create the Validator (this will not prompt us for a password or generate a new mnemonic because we specified it earlier in the secrets)
NAME=node1 TYPE=validator NET=mainnet make install
// Verify the status of everything (Validator address should be good, THORChain should be near 99% and other chains should have started to sync)
NAME=node1 TYPE=validator NET=mainnet TC_BACKUP=0 make status
// Recreate the backup folder structure (this allows us to copy the files into the right folder)
mkdir -p ~/node1/node-launcher/backups/node1/thornode/recovery/
// (Optional) Bifrost backup
mkdir -p ~/node1/node-launcher/backups/node1/bifrost/recovery/
// Copy backup files
cp ~/myBackupNode1/thornode-16XXXXXXXX.tar.gz ./backups/node1/thornode/recovery/
// (Optional) Bifrost backup
cp ~/myBackupNode1/bifrost-16XXXXXXXX.tar.gz ./backups/node1/bifrost/recovery/
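// (Optional) Compare checksums with the originals if the files travelled
// through several machines (the two hashes should match)
sha256sum ~/myBackupNode1/thornode-16XXXXXXXX.tar.gz ./backups/node1/thornode/recovery/thornode-16XXXXXXXX.tar.gz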
// Restore thornode
NAME=node1 TYPE=validator NET=mainnet SERVICE=thornode make restore-backup
// (Optional) Restore bifrost (Required for Maya)
NAME=node1 TYPE=validator NET=mainnet SERVICE=bifrost make restore-backup
// Verify the status of everything
NAME=node1 TYPE=validator NET=mainnet TC_BACKUP=0 make status
// Recover THORChain from 9R snapshot
NAME=node1 TYPE=validator NET=mainnet make restore-external-snapshot
// Node is now recovered and just waiting for each chain to sync.
// Verify the status of everything
NAME=node1 TYPE=validator NET=mainnet TC_BACKUP=0 make status
// (Optional) Check Logs (While we wait, we can check logs for anything unusual)
NAME=node1 TYPE=validator NET=mainnet SERVICE=thornode make shell
NAME=node1 TYPE=validator NET=mainnet SERVICE=bifrost make shell
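// Or tail the logs directly without opening a shell, for example:
kubectl -n node1 logs deploy/thornode --tail=100 -f
kubectl -n node1 logs deploy/bifrost --tail=100 -f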
// (Optional) Set new IP
kubectl exec -it -n node1 deploy/thornode -- /kube-scripts/set-ip-address.sh "1.2.3.4"
Advanced Verifications
Here is a more advanced part, to verify our restored backups contain what we need.
// (Optional) Confirm Bifrost contains the latest signature
// Visit https://thornode.ninerealms.com/thorchain/node/thor1XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
// And look for the last address in the list of signer_membership
NAME=node1 TYPE=validator NET=mainnet SERVICE=bifrost make shell
ls -l /root/.thornode/
// Look for a file containing the address; it should be present and not size 0.
// Example: localstate-thorpub1addwnpepq2ujrkurxa93vlssp2gdnxfn4egzdmjxfwe3qpnua5yyvwwj4wk8wadas6e.json
// Exit from Bifrost Shell
exit
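// The same check can also be run from outside the pod in one line;
// any output here means a localstate file is empty and needs restoring
kubectl -n node1 exec deploy/bifrost -c bifrost -- find /root/.thornode -name 'localstate-*.json' -size 0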
// (Optional) If the file is missing but we happen to be able to recover it from the broken node's filesystem:
// Note the exact name of the bifrost pod for next step
kubectl get pods -n node1
// Copy file to bifrost pod
kubectl cp ~/myBackupNode1/localstate-thorpub1addwnpepqd2928a5nf3d6jan45xf6ap6dmsj56jq9n6yhh98rpa6wt4shpcmvpm5hs3.json node1/bifrost-5d46x555bc-2jx45:/root/.thornode/ -c bifrost
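// After copying the file in, it may be safest to restart bifrost so it
// re-reads its local state (an assumption; watch the logs to confirm)
kubectl -n node1 rollout restart deployment/bifrost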
// (Optional) Confirm thornode contains the signature (this will return a distinct error if the node account is not found)
// Example: Error: rpc error: code = NotFound desc = rpc error: code = NotFound desc = account thor14frkjruksjsus3jugx7hne5dftfngczw7lpvvy not found: key not found
NAME=node1 TYPE=validator NET=mainnet make set-version
// If this fails, restore thornode from the backup again
NAME=node1 TYPE=validator NET=mainnet SERVICE=thornode make restore-backup
Conclusion
I really hope no one will ever need this, but I hope it helps if you do!
Please let me know if you have suggestions for subjects you would like to see covered in this article series.
Updated July 23, 2024:
- Re-order
- Set IP
- chaosnet -> mainnet
Updated May 22, 2024:
- bifrost backup optional
- make restore-external-snapshot