THORChain Validator Migration — The Exhaustive Guide
A few guides exist about migrating THORChain Nodes. They say it's risky, should not be attempted on Active Validators, and should only be attempted in emergency situations; they are also incomplete and can lead to bad surprises.
They also assume that we are building a brand new validator and have two weeks to wait for it to fully sync. But what if we already have a test node fully synced and would like to migrate to it?
This guide is meant to be a complete and exhaustive step-by-step guide to migrating an existing THORChain Validator. I was able to write it after putting multiple sources of information together, plus plenty of trial and error. I have used this procedure multiple times on Active Validators, with success. Following this guide should allow you to migrate an active node with limited slashing and no bond slash. Hopefully, it will save the day for a Node Operator in an uncomfortable situation, or simply facilitate a migration to a bare-metal server.
In this guide I will use the term “source node” for the node to be migrated, and “target node” for the node the source node will be migrated to. I will also use the namespace “node1” for the source node and “node2” for the target node, instead of the default “thornode” namespace.
Creating the Target Node
Note: If creating a brand new node installation, or a non-active node, it can be much easier to create the node from the mnemonic from the start; follow the THORChain Validators — Re-creating from Backups Guide to create the new node. Whatever you do, never have two instances of an active node running at the same time. Having both nodes offline for an hour or two won’t cause that much slashing anyway.
To start, we need a target node. The target node can be prepared as we normally would, waiting until all chains are fully synced, but we will not send any initial bond to it. If we are building this validator specifically for this purpose, using the exact same password as on the source node will make our life easier; this is an important detail that is not mentioned anywhere.
If we already have a node ready and don’t want to (or can’t) create a new one and wait for it to sync, no worries: we can fix that with the following!
Change password of a target node (optional)
If we have an existing node to be used as the target node, but it was not configured with the same password, we can change it with the following steps:
Verify whether the passwords are the same or different:
// (On the Source Node)
NAME=node1 TYPE=validator NET=mainnet make password
// (On the Target Node)
NAME=node2 TYPE=validator NET=mainnet make password
Change the password on the Target Node:
// (On the Target Node)
// Delete Password Secret
kubectl -n node2 delete secret thornode-password
// Recreate Password Secret
kubectl -n node2 create secret generic thornode-password --from-literal=password="<password of the source node>"
Note: even if we think we could set the mnemonic now, don’t. Setting the mnemonic at this point would cause the target node to have the same address as the source node while it syncs, and we don’t want that until we scale down the pods on the source node, especially if the node is active. This would cause double signing and a severe bond slash.
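Before moving on, it is worth confirming that the two nodes still report different node addresses at this stage. A minimal check, assuming make status prints the node address and pipes cleanly (adjust if your node-launcher version behaves differently):
// (On the Source Node)
NAME=node1 TYPE=validator NET=mainnet make status | grep -i address
// (On the Target Node)
// This should print a DIFFERENT address than the Source Node for now
NAME=node2 TYPE=validator NET=mainnet make status | grep -i address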
Destroy Bifrost and Thornode Pods and re-create them:
// (On the Target Node)
// Stop thornode and bifrost
kubectl scale -n node2 --replicas=0 deploy/bifrost deploy/thornode --timeout=5m
// Delete pods and storages
kubectl -n node2 delete deploy/bifrost
kubectl -n node2 delete pvc bifrost
kubectl -n node2 delete deploy/thornode
kubectl -n node2 delete pvc thornode
// Recreate thornode and bifrost pod.
NAME=node2 TYPE=validator NET=mainnet make install
// Speed up the sync of the THOR chain by recovering from latest prune
NAME=node2 TYPE=validator NET=mainnet make recover-ninerealms
// Wait until THOR chain is fully synced, it should take a few hours
The Target Node now has the same password as the Source Node, and will be able to read the backup files created by the Source Node.
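While waiting for the THOR chain to sync, progress can be checked by querying the CometBFT RPC inside the thornode pod. This is a sketch that assumes curl is available in the container and that the RPC listens on port 27147; the node is synced once catching_up is false:
// (On the Target Node)
kubectl exec -it -n node2 deploy/thornode -- curl -s http://localhost:27147/status
// Look for "catching_up": false under "sync_info" in the output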
Backup of Source Node
Run the following to take a backup of everything important on the Source Node:
// (On the Source Node)
// Create a Folder to store Migration Files
mkdir ~/migrationNodeABCD
// Copy the Password and Mnemonic in clear text
NAME=node1 TYPE=validator NET=mainnet make password > ~/migrationNodeABCD/password.txt
NAME=node1 TYPE=validator NET=mainnet make mnemonic > ~/migrationNodeABCD/mnemonic.txt
// Make backup of Thornode
NAME=node1 TYPE=validator NET=mainnet SERVICE=thornode make backup
// The backup function will display the path of the backup folder
// List the content of the folder to get the name of the latest file
// Note: we need a fresh backup because it contains config/genesis.json, and restoring an old backup would overwrite it with a stale one.
ls -l ./backups/node1/thornode/2023-03-03
// Copy backup file (ensure to take the latest one)
cp ./backups/node1/thornode/2023-03-03/thornode-1678206061.tar.gz ~/migrationNodeABCD/
// Make backup of Bifrost
NAME=node1 TYPE=validator NET=mainnet SERVICE=bifrost make backup
// The backup function will display the path of the backup folder
// List the content of the folder to get the name of the latest file
ls -l ./backups/node1/bifrost/2023-03-03
// Copy backup file (ensure to take the latest one)
cp ./backups/node1/bifrost/2023-03-03/bifrost-1678206153.tar.gz ~/migrationNodeABCD/
Take note of the node1 subfolder in the backups folder: that folder is named after the namespace of the node, so if the default namespace is used it will be ./backups/thornode/ instead.
The Unix epoch time is appended to each backup filename.
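As a sanity check before copying anything, we can list the contents of the two archives; tar -tzf only reads the archive, so this is harmless. The exact layout inside may vary between node-launcher versions:
// (On the Source Node)
tar -tzf ~/migrationNodeABCD/thornode-1678206061.tar.gz | head -20
tar -tzf ~/migrationNodeABCD/bifrost-1678206153.tar.gz | head -20
// Confirm that config/genesis.json and the keyring files appear in the thornode archive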
Copy Files Over
I work off a dedicated laptop, which connects over SSH to each administrative VM. This example copies the backup files from the Source Node down to the laptop, and then up to the Target Node, using SCP. You can move the files over any way you prefer, as long as the two backup files end up in the backup folder of the Target Node.
If the node is active, ensure the backups are recent and that there has been no churn since they were taken.
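One way to double-check where we stand in the churn cycle is to compare the current block height with the next churn height. The endpoints and field names below are assumptions based on the public Nine Realms services, so verify them before relying on this:
// (On the Laptop)
// Current THORChain block height
curl -s https://thornode.ninerealms.com/thorchain/lastblock | jq '.[0].thorchain'
// Next churn height according to Midgard
curl -s https://midgard.ninerealms.com/v2/network | jq '.nextChurnHeight'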
// (On the Target Node)
// Create folder to copy the backup files
mkdir -p ~/node-launcher/backups/node2/bifrost/migration/
mkdir -p ~/node-launcher/backups/node2/thornode/migration/
// (On the Laptop)
// Create a folder to store important files locally
mkdir ~/migration
// Copy files from the Source Node
scp -r <user>@<host of source node>:~/migrationNodeABCD ~/migration/
// Copy files to the Target Node
scp -r ~/migration/migrationNodeABCD/thornode-1678206061.tar.gz <user>@<host of target node>:~/node-launcher/backups/node2/thornode/migration/
scp -r ~/migration/migrationNodeABCD/bifrost-1678206153.tar.gz <user>@<host of target node>:~/node-launcher/backups/node2/bifrost/migration/
Take note of the node2 subfolder in the backups folder: that folder is named after the namespace of the node (if the default is used, it will be ./backups/thornode/), and it needs to match the namespace of the target node or the backup won’t show up in the list of sources to restore.
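To make sure nothing was corrupted in transit, compare checksums on both ends; any mismatch means the copy should be redone:
// (On the Source Node)
sha256sum ~/migrationNodeABCD/*.tar.gz
// (On the Target Node)
sha256sum ~/node-launcher/backups/node2/thornode/migration/*.tar.gz
sha256sum ~/node-launcher/backups/node2/bifrost/migration/*.tar.gz
// The hashes must match their counterparts from the Source Node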
Switch over
We want to perform all of the following steps within a short period of time.
Set the mnemonic on the Target Node
// (On the Target Node)
// Delete Mnemonic Secret
kubectl -n node2 delete secret thornode-mnemonic
// Recreate Mnemonic Secret
kubectl -n node2 create secret generic thornode-mnemonic --from-literal=mnemonic="<mnemonic of the source node>"
At this point I suggest checking the slash points and the bond of the Source Node, so we have a reference to see if something goes wrong.
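A quick way to capture that reference is to query the node over the public THORNode API. A sketch, assuming the Nine Realms endpoint and current field names (slash_points and total_bond may differ across THORNode versions):
// (On the Laptop)
curl -s https://thornode.ninerealms.com/thorchain/node/<node address> | jq '{status: .status, slash_points: .slash_points, bond: .total_bond}'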
Scale down the Source Node:
// (On the Source Node)
// Scale down Thornode and Bifrost Pods
kubectl scale -n node1 --replicas=0 deploy/thornode deploy/bifrost
// Verify until both pods no longer appear in list.
kubectl get pods -n node1
Scale down the Target Node:
// (On the Target Node)
// Scale down Thornode and Bifrost Pods
kubectl scale -n node2 --replicas=0 deploy/thornode deploy/bifrost
// Verify until both pods no longer appear in list.
kubectl get pods -n node2
Delete the Bifrost PVC to recreate it using the new password and mnemonic:
// (On the Target Node)
// Delete Bifrost Deployment
kubectl -n node2 delete deploy/bifrost
// Delete Bifrost PVC
kubectl -n node2 delete pvc bifrost
// ReCreate Bifrost Pod
NAME=node2 TYPE=validator NET=mainnet make install
Scale down the Bifrost Pod that was just created:
// (On the Target Node)
// Check for Bifrost Pod until it's initiated
kubectl get pods -n node2
// Scale down Thornode and Bifrost Pods
kubectl scale -n node2 --replicas=0 deploy/thornode deploy/bifrost
// Verify until both pods no longer appear in list.
kubectl get pods -n node2
Restore the backups and select the files that were copied previously; they will appear under a “migration” folder:
// (On the Target Node)
// Make restore-backup of Thornode
NAME=node2 TYPE=validator NET=mainnet SERVICE=thornode make restore-backup
// Make restore-backup of Bifrost
NAME=node2 TYPE=validator NET=mainnet SERVICE=bifrost make restore-backup
// Verify until both pods are ready
kubectl get pods -n node2
Clean thornode
Note: Yes, even in Dec 2024, this is still required if the target node was initialised with an initial bond.
// (On the Target Node)
// Open a Debug Console on the Thornode Pod
NAME=node2 TYPE=validator NET=mainnet make debug
// Delete Key files
rm -rf /root/.thornode/THORChain-ED25519 /root/.thornode/keyring-file /root/.thornode/config/genesis.json
// Exit Debug Console
exit
Scale up the Target Node:
// (On the Target Node)
kubectl scale -n node2 --replicas=1 deploy/thornode deploy/bifrost
Monitoring: I like to open each of the following in a different console, to see them all simultaneously.
// (On the Target Node)
// Verify until both pods are ready
kubectl get pods -n node2
// Bifrost Logs
NAME=node2 TYPE=validator NET=mainnet SERVICE=bifrost make logs
// Thornode Logs
NAME=node2 TYPE=validator NET=mainnet SERVICE=thornode make logs
// Make Status
NAME=node2 TYPE=validator NET=mainnet make status
Update the IP Address:
// (On the Target Node)
// Set IP Address (Automated)
NAME=node2 TYPE=validator NET=mainnet SERVICE=thornode make set-ip-address
// Set IP Address (Manual) If behind Proxy (Optional)
kubectl exec -it -n node2 deploy/thornode -- /kube-scripts/set-ip-address.sh "1.2.3.4"
Not only is this useful to set the new IP, it also confirms that thornode is still able to sign and send transactions.
We want to monitor the logs while running this command, in case we get an error that the keyring is not available.
It is frequent to get the following error: Error: rpc error: code = InvalidArgument desc = account sequence mismatch, expected 2502053, got 2502041: incorrect account sequence: invalid request
Simply wait a few minutes and retry.
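The mismatch typically happens because the chain has already moved past the sequence number our node signed with; once the node catches up, the numbers converge on their own. If curious, the current on-chain sequence can be checked via the standard Cosmos auth endpoint (assuming the public REST endpoint exposes it):
// (On the Laptop)
curl -s https://thornode.ninerealms.com/cosmos/auth/v1beta1/accounts/<node address> | jq '.account.sequence'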
If something didn’t connect properly, sometimes restarting Bifrost and Thornode can help:
// (On the Target Node)
// Restart Bifrost and Thornode Pods
NAME=node2 TYPE=validator NET=mainnet SERVICE=bifrost make restart
NAME=node2 TYPE=validator NET=mainnet SERVICE=thornode make restart
Troubleshooting
As of March 2023, it is frequent for an active node to get slashed for re-signing network fees; the error “signer already signed MsgNetworkFee” will show in the thornode logs. If we look closely, we will see that the block heights are in the past, and this will stop as soon as the node reaches the tip. This issue may be fixed in an upcoming update.
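To watch how far behind those messages are, filter the logs for that error; grep works on whatever make logs streams:
// (On the Target Node)
NAME=node2 TYPE=validator NET=mainnet SERVICE=thornode make logs | grep "signer already signed MsgNetworkFee"
// The block heights in these lines should climb toward the tip and then stop appearing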
Rollback (if everything goes wrong)
If everything goes wrong, we can simply switch back to the Source Node with the following steps. Whatever we do, if we are working on an Active node, it is critical that the thornode and bifrost pods are not running on both nodes at the same time, or we will get bond-slashed pretty badly.
// (On the Target Node)
// Scale down Thornode and Bifrost Pods
kubectl scale -n node2 --replicas=0 deploy/thornode deploy/bifrost
// Verify until both pods no longer appear in list.
kubectl get pods -n node2
It is very important to ensure that the pods on the Target Node are completely stopped before enabling them on the Source Node. If they both run at the same time, the node may get bond slashed.
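Rather than re-running get pods by hand, one option is to block until the pods are actually gone. The label selector below is an assumption about how node-launcher labels its pods; verify it with kubectl get pods --show-labels first:
// (On the Target Node)
kubectl wait --for=delete pod -l app.kubernetes.io/name=thornode -n node2 --timeout=5m
kubectl wait --for=delete pod -l app.kubernetes.io/name=bifrost -n node2 --timeout=5m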
// (On the Source Node)
// Scale up Thornode and Bifrost Pods
kubectl scale -n node1 --replicas=1 deploy/thornode deploy/bifrost
Conclusion
I hope this guide will help Node Operators migrate to bare metal. Should you have any questions, please don’t hesitate to reach out to me on Discord.