THORChain Bare-Metal Validator — WireGuard Monitoring
To improve the reliability of VPN tunnels for THORChain Validators, here is a solution for monitoring hanging WireGuard connections.
Sometimes a WireGuard connection hangs, becomes unresponsive, and needs a restart to come back to life. When a VPN connection hangs, the Validator becomes unreachable via its public IP, which results in the RPC and BRF columns being displayed as BAD in dashboards.
I have not found the root cause yet; WireGuard may not cope well with several interfaces being active at once.
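While the root cause is unknown, the age of the last handshake can help confirm that a tunnel is really hanging rather than idle. The sketch below assumes the tab-separated `<public-key> <unix-timestamp>` lines printed by `wg show <interface> latest-handshakes`; the function name `stale_peers` and the 180-second threshold in the usage line are illustrative choices, not part of the original setup. An old handshake is only a hint: a tunnel with no traffic and no persistent keepalive will not handshake either.

```shell
#!/bin/bash
# Hypothetical helper: print the public key of every peer whose latest
# handshake is older than max_age seconds.
# stdin: "<public-key> <unix-timestamp>" per line, as printed by
#        `wg show <interface> latest-handshakes`.
# $1:    max_age in seconds.
# $2:    current time as a Unix timestamp (defaults to `date +%s`).
# A timestamp of 0 means no handshake has happened yet and counts as stale.
stale_peers() {
    local max_age="$1" now="${2:-$(date +%s)}"
    local key ts
    while read -r key ts; do
        [[ -z "$key" ]] && continue
        if (( ts == 0 || now - ts > max_age )); then
            echo "$key"
        fi
    done
}

# Example usage (needs root and an existing interface; wg1 and 180 are assumptions):
# wg show wg1 latest-handshakes | stale_peers 180
```

Any key printed by the usage line belongs to a peer that has not completed a handshake within the chosen window.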
I created a simple script that checks each network connection every 10 minutes and restarts it when required.
Create Script
sudo nano /root/wgcycle.sh
Copy the following, replacing the list of IPs and interfaces accordingly:
#!/bin/bash
# This script monitors WireGuard network connections and
# restarts the service when a connection is hanging. By D5Sammy.
declare -A Tunnels
declare -a Restarts=()
# Define network connections here,
# with the IP at the other end of the WireGuard tunnel
# and the name of the WireGuard interface:
Tunnels[10.10.1.1]=wg1
Tunnels[10.10.2.1]=wg2
Tunnels[10.10.3.1]=wg3
Tunnels[10.10.4.1]=wg4
echo "Checking list of ${#Tunnels[@]} tunnels."
for ip in "${!Tunnels[@]}"
do
    inf=${Tunnels[$ip]}
    echo -n "-> Ping $ip on $inf... "
    # A single ping; any non-zero exit status counts as a hanging tunnel.
    if ping -c 1 "$ip" > /dev/null; then
        echo "Available!"
    else
        echo -n "Not available... "
        # Skip interfaces that were already restarted during this run.
        if [[ " ${Restarts[*]} " =~ " ${inf} " ]]; then
            echo "Interface $inf was already restarted!"
        else
            echo -n "Restarting $inf... "
            Restarts+=("$inf")
            systemctl restart "wg-quick@$inf"
            echo "Done!"
        fi
    fi
done
echo "Run Completed, ${#Restarts[@]} Restarts Required (${Restarts[*]})."
Make Script Executable
sudo chmod +x /root/wgcycle.sh
Try Script
sudo /root/wgcycle.sh
The output should look like this:
Checking list of 4 tunnels.
-> Ping 10.10.1.1 on wg1... Available!
-> Ping 10.10.2.1 on wg2... Available!
-> Ping 10.10.3.1 on wg3... Not available... Restarting wg3... Done!
-> Ping 10.10.4.1 on wg4... Available!
Run Completed, 1 Restarts Required (wg3).
Create Service
sudo nano /etc/systemd/system/wgcycle.service
Copy the following:
[Unit]
Description=WireGuard Connection Restarting Service
After=network.target
[Service]
Type=simple
User=root
ExecStart=/root/wgcycle.sh
[Install]
WantedBy=multi-user.target
Create Timer
sudo nano /etc/systemd/system/wgcycle.timer
Copy the following:
[Unit]
Description=WireGuard Connection Restarting Timer
[Timer]
OnCalendar=*:0/10:0
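# Note (assumption about intent): the expression above should fire every
# 10 minutes, at hh:00:00, hh:10:00, hh:20:00 and so on. It can be
# checked with: systemd-analyze calendar "*:0/10:0"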
[Install]
WantedBy=timers.target
Operate Service
Starting the Timer
sudo systemctl enable wgcycle.timer
sudo systemctl start wgcycle.timer
Monitoring the Service status
sudo systemctl status wgcycle.timer wgcycle.service
Display Service Logs
journalctl -u wgcycle.service -n 100 --no-pager
Impact of Network Interface change on MicroK8s
By default, MicroK8s monitors the network interfaces for changes and restarts itself whenever it detects an IP change, in order to refresh its certificates. This puts all Kubernetes pods into the Unknown state, makes the Validator unresponsive for a few minutes, and can lead to corruption of a chain daemon. A WireGuard restart triggers this because it momentarily removes and re-adds an IP on the host.
Configure MicroK8s not to refresh its certificates, so that it does not restart every time a WireGuard interface is restarted:
sudo touch /var/snap/microk8s/current/var/lock/no-cert-reissue
kubectl scale deployments --replicas=0 --all -n c0
sudo microk8s stop
sudo microk8s start
kubectl scale deployments --replicas=1 --all -n c0
Hope this helps improve node connectivity.
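As a follow-up check, it may be worth confirming that the lock file is still in place, for example after a MicroK8s upgrade. The sketch below is a minimal helper under stated assumptions: the function name `cert_reissue_disabled` and its optional path argument are hypothetical, and the default path is simply the one used in the `touch` command above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the MicroK8s no-cert-reissue lock file
# exists. The optional argument only exists so the check can be pointed at
# another location (for instance in a test).
cert_reissue_disabled() {
    local lock_file="${1:-/var/snap/microk8s/current/var/lock/no-cert-reissue}"
    [[ -e "$lock_file" ]]
}

if cert_reissue_disabled; then
    echo "Lock file present: MicroK8s should not reissue certificates on IP changes."
else
    echo "Lock file missing: a WireGuard restart may trigger a MicroK8s restart."
fi
```

The same function could be called at the top of /root/wgcycle.sh to print a warning before any interface is restarted.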