THORChain Bare-Metal Validator  —  WireGuard Monitoring


With the intent of increasing the reliability of VPN tunnels for THORChain validators, here is a solution to monitor hanging WireGuard connections.

Sometimes a WireGuard connection just hangs and stays unresponsive until it is restarted. When a VPN connection hangs, the validator becomes unreachable via its public IP, and dashboards then show its RPC and BRF columns as BAD.

I haven't been able to find the root cause yet. Maybe WireGuard doesn't like having multiple interfaces up at once.

I have created a simple script that checks each WireGuard tunnel every 10 minutes and restarts the interfaces that don't respond.

Create Script

sudo nano /root/wgcycle.sh

Copy the following, replacing the list of IPs and interfaces with your own:

#!/bin/bash
# This script monitors WireGuard network connections and
# restarts the interface when a connection is hanging. By D5Sammy.

declare -A Tunnels
declare -a Restarts=()

# Define network connections here:
# the IP at the other end of each WireGuard tunnel,
# mapped to the name of its WireGuard interface.

Tunnels[10.10.1.1]=wg1
Tunnels[10.10.2.1]=wg2
Tunnels[10.10.3.1]=wg3
Tunnels[10.10.4.1]=wg4

echo "Checking list of ${#Tunnels[@]} tunnels."

for ip in "${!Tunnels[@]}"
do
  inf=${Tunnels[$ip]}
  echo -n "-> Ping $ip on $inf... "

  # Send one echo request and wait at most 5 seconds for the reply.
  if ping -c 1 -W 5 "$ip" > /dev/null; then
    echo "Available!"
  else
    echo -n "Not available... "

    # Several IPs may share one interface: restart each interface only once per run.
    if [[ " ${Restarts[*]} " == *" ${inf} "* ]]; then
      echo "Interface $inf was already restarted!"
    else
      echo -n "Restarting $inf... "
      Restarts+=("$inf")
      systemctl restart "wg-quick@$inf"
      echo "Done!"
    fi
  fi
done

echo "Run Completed, ${#Restarts[@]} Restarts Required (${Restarts[*]})."
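Ping only tells us that the far end did not answer. A complementary signal, sketched below as an assumption and not part of the script above, is the age of the last WireGuard handshake. `wg show <interface> latest-handshakes` prints one line per peer with the peer's public key and a Unix timestamp (0 if no handshake has happened yet). On a tunnel with PersistentKeepalive set, WireGuard renews the handshake roughly every two minutes, so a much older timestamp hints that the tunnel is stuck. The function name `handshake_stale` and the 300-second default threshold are hypothetical choices.

```shell
# Hypothetical helper: reads `wg show <if> latest-handshakes` output on stdin.
# Succeeds (exit 0) if any peer's last handshake is older than max_age seconds
# or has never happened (timestamp 0); fails (exit 1) otherwise.
handshake_stale() {
  local max_age=${1:-300}          # assumed threshold in seconds
  local now=${2:-$(date +%s)}      # current time, injectable for testing
  local key ts
  while read -r key ts; do
    [ -z "$key" ] && continue      # skip blank lines
    if [ "$ts" -eq 0 ] || [ $((now - ts)) -gt "$max_age" ]; then
      return 0
    fi
  done
  return 1
}

# Possible use inside the monitoring loop (assumes the wg tool is installed):
#   if wg show "$inf" latest-handshakes | handshake_stale 300; then
#     systemctl restart "wg-quick@$inf"
#   fi
```

Passing the current time as a second argument is only there so the logic can be checked without waiting on a real clock.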

Make Script Executable

sudo chmod +x /root/wgcycle.sh

Try Script

sudo /root/wgcycle.sh

The output should look like this:

Checking list of 4 tunnels.
-> Ping 10.10.1.1 on wg1... Available!
-> Ping 10.10.2.1 on wg2... Available!
-> Ping 10.10.3.1 on wg3... Not available... Restarting wg3... Done!
-> Ping 10.10.4.1 on wg4... Available!
Run Completed, 1 Restarts Required (wg3).

Create Service

sudo nano /etc/systemd/system/wgcycle.service

Copy the following:

[Unit]
Description=WireGuard Connection Restarting Service
After=network.target

[Service]
Type=oneshot
User=root
ExecStart=/root/wgcycle.sh

[Install]
WantedBy=multi-user.target

Create Timer

sudo nano /etc/systemd/system/wgcycle.timer

Copy the following:

[Unit]
Description=WireGuard Connection Restarting Timer

[Timer]
OnCalendar=*:0/10:0

[Install]
WantedBy=timers.target
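`OnCalendar=*:0/10:0` means every hour, at minute 0 and every 10 minutes after it (:00, :10, ... :50), at second 0. On a real host, `systemd-analyze calendar '*:0/10:0'` prints the next elapse time. As a small worked illustration only, this hypothetical helper (`next_run_in` is not part of the setup) computes how many seconds remain until the next run from the current minute and second:

```shell
# Hypothetical helper: seconds until the next *:0/10:0 boundary, strictly
# after the given local minute and second. The 10# prefix forces base 10 so
# values like "08" from `date +%M` are not rejected as invalid octal.
next_run_in() {
  local min=$((10#$1)) sec=$((10#$2))
  echo $(( (10 - min % 10) * 60 - sec ))
}

# Usage: next_run_in "$(date +%M)" "$(date +%S)"
```

For example, at 14:07:30 the timer still has 2 minutes 30 seconds (150 seconds) to go before it fires at 14:10:00.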

Operate Service

Starting the Timer

sudo systemctl enable wgcycle.timer
sudo systemctl start wgcycle.timer

Monitoring the Service status

sudo systemctl status wgcycle.timer wgcycle.service

Display Service Logs

journalctl -u wgcycle.service -n 100 --no-pager
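When the logs get long, a quick tally of which interfaces actually needed a restart can help spot a flaky tunnel. This is a hypothetical helper (`count_restarts` is not part of the original setup) that relies only on the `Restarting <interface>...` text the script prints. The journal's timestamp and host prefixes don't matter because `grep -o` extracts just that fragment.

```shell
# Hypothetical helper: reads wgcycle log lines on stdin and prints
# "<interface> <restart count>" per interface, sorted by interface name.
count_restarts() {
  grep -o 'Restarting [^ .]*' \
    | awk '{ print $2 }' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Usage: journalctl -u wgcycle.service --since today --no-pager | count_restarts
```

The summary line "Run Completed, N Restarts Required" is not counted, because it contains "Restarts" rather than "Restarting".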

Impact of Network Interface Changes on MicroK8s

By default, MicroK8s watches for network interface changes and, whenever it detects an IP change, restarts itself to refresh its certificates, which puts all Kubernetes pods into an Unknown state. This makes our validator unresponsive for a few minutes and can lead to corruption of a chain daemon. A WireGuard restart triggers this, because it momentarily removes and re-adds an IP address on the host.

Configure MicroK8s not to refresh its certificates, so that it doesn't restart every time a WireGuard interface is restarted:

sudo touch /var/snap/microk8s/current/var/lock/no-cert-reissue
kubectl scale deployments --replicas=0 --all -n c0
sudo microk8s stop
sudo microk8s start
kubectl scale deployments --replicas=1 --all -n c0
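After scaling the deployments back up, the pods take a while to come back. Here is a minimal sketch for waiting until every pod in the namespace is Running with all containers ready, assuming the default `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, RESTARTS, AGE). The function name `all_pods_running` and the 10-second polling interval are hypothetical, and Completed pods are ignored.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and succeeds only if at least one pod is listed and every non-Completed pod
# is Running with READY "n/n" (all containers ready).
all_pods_running() {
  awk '
    NF == 0 { next }                 # skip blank lines
    $3 == "Completed" { next }       # finished job pods are fine
    {
      n++
      split($2, r, "/")              # READY column, e.g. "1/1"
      if ($3 != "Running" || r[1] != r[2]) bad++
    }
    END { if (n > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Usage (namespace c0 as above):
#   until kubectl get pods -n c0 --no-headers | all_pods_running; do sleep 10; done
```

Pods stuck in the Unknown state described above fail the check, so the loop keeps waiting until MicroK8s has fully recovered.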

Hope this helps improve node connectivity.