WANFailover Dual WAN Failover Script

Yes, everything is ok now.

Well, not quite OK. It is just as it was: fine on two routers, but on one router, for about 5 minutes after a reboot, it keeps switching between WAN0 and WAN1 continuously before eventually settling into normal mode.
I have the same problem, except that it starts immediately after the reboot and can last 10-30 minutes. In my case, I think the cause is the port on the provider's switch, which is blocked from pings for a while. As I understand it, the IP has not been received yet, and a "DDoS" is already coming in))) I don't know how to make the script start with a delay of at least 30 seconds.

P.S. This does not happen after every reboot, but in about eight cases out of ten.
 
Code:
Jun 25 13:50:56 wan-failover.sh: Service Restart - Restarted leds service
Jun 25 13:50:56 wan-failover.sh: Service Restart - Restarting dnsmasq service
Jun 25 13:50:57 wan-failover.sh: Service Restart - Restarted dnsmasq service
Jun 25 13:50:57 wan-failover.sh: Service Restart - Restarting firewall service
Jun 25 13:50:58 wan-failover.sh: Service Restart - Restarted firewall service
Jun 25 13:50:58 wan-failover.sh: Email Notification - Email Notifications are not configured
Jun 25 13:50:58 wan-failover.sh: WAN Status - wan0 enabled
Jun 25 13:50:58 wan-failover.sh: WAN Status - Route already exists: 1.1.1.1 via 10.128.131.254 dev eth0 metric 1
Jun 25 13:51:04 wan-failover.sh: WAN Status - wan0 has 100% packet loss ***Verify 1.1.1.1 is a valid server for ICMP Echo Requests***
Jun 25 13:51:04 wan-failover.sh: WAN Status - wan1 enabled
Jun 25 13:51:04 wan-failover.sh: WAN Status - Route already exists: 1.0.0.1 via 10.1.0.1 dev eth5 metric 1
Jun 25 13:51:08 wan-failover.sh: WAN Status - wan1 has 0% packet loss
Jun 25 13:51:08 wan-failover.sh: WAN1 Active - Verifying WAN1
Jun 25 13:51:08 wan-failover.sh: WAN0 Failback Monitor - Monitoring WAN0 via 1.1.1.1 for Failback
Jun 25 13:51:24 wan-failover.sh: WAN0 Failback Monitor - Connection Detected - WAN0 Packet Loss: 0%
Jun 25 13:51:24 wan-failover.sh: WAN Switch - Switching wan0 to Primary WAN
Jun 25 13:51:24 wan-failover.sh: WAN Switch - WAN IP Address: 10.128.131.80
Jun 25 13:51:24 wan-failover.sh: WAN Switch - WAN Gateway: 10.128.131.254
Jun 25 13:51:24 wan-failover.sh: WAN Switch - WAN Interface: eth0
Jun 25 13:51:24 wan-failover.sh: WAN Switch - WAN Interface: eth0
Jun 25 13:51:24 wan-failover.sh: WAN Switch - /tmp/resolv.conf already updated for wan0 DNS1 Server
Jun 25 13:51:25 wan-failover.sh: WAN Switch - /tmp/resolv.conf already updated for wan0 DNS2 Server
Jun 25 13:51:25 wan-failover.sh: WAN Switch - Deleting default route via 10.1.0.1 dev eth5
Jun 25 13:51:25 wan-failover.sh: WAN Switch - Adding default route via 10.128.131.254 dev eth0
Jun 25 13:51:25 wan-failover.sh: WAN Switch - QoS is Disabled
Jun 25 13:51:26 wan-failover.sh: WAN Switch - Switched wan0 to Primary WAN
Jun 25 13:51:26 wan-failover.sh: Service Restart - Restarting qos service
Jun 25 13:51:26 wan-failover.sh: Service Restart - Restarted qos service
Jun 25 13:51:26 wan-failover.sh: Service Restart - Restarting leds service
Jun 25 13:51:27 wan-failover.sh: Service Restart - Restarted leds service
Jun 25 13:51:27 wan-failover.sh: Service Restart - Restarting dnsmasq service
Jun 25 13:51:28 wan-failover.sh: Service Restart - Restarted dnsmasq service
Jun 25 13:51:28 wan-failover.sh: Service Restart - Restarting firewall service
Jun 25 13:51:29 wan-failover.sh: Service Restart - Restarted firewall service
Jun 25 13:51:29 wan-failover.sh: Email Notification - Email Notifications are not configured
Jun 25 13:51:29 wan-failover.sh: WAN Status - wan0 enabled
Jun 25 13:51:29 wan-failover.sh: WAN Status - Route already exists: 1.1.1.1 via 10.128.131.254 dev eth0 metric 1
Jun 25 13:51:33 wan-failover.sh: WAN Status - wan0 has 0% packet loss
Jun 25 13:51:33 wan-failover.sh: WAN Status - wan1 enabled
Jun 25 13:51:33 wan-failover.sh: WAN Status - Route already exists: 1.0.0.1 via 10.1.0.1 dev eth5 metric 1
Jun 25 13:51:37 wan-failover.sh: WAN Status - wan1 has 0% packet loss
Jun 25 13:51:37 wan-failover.sh: WAN0 Active - Verifying WAN0
Jun 25 13:51:37 wan-failover.sh: WAN0 Failover Monitor - Monitoring WAN0 via 1.1.1.1 for Failure

For whatever reason, your router is having trouble pinging 1.1.1.1 as a target. Your packet loss goes from 100% to 20% to 0%, which means the script is working properly; however, the router cannot reliably ping the target for some reason (firewall, ISP, etc.).
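
For anyone who wants to reproduce those packet loss numbers by hand, here is a rough sketch of how a percentage can be pulled out of ping's summary line. This is not taken from wan-failover.sh; the function names `parse_packet_loss` and `check_target` are made up for illustration, and it assumes a ping that accepts `-c` and `-W` (BusyBox and iputils both do).

```shell
#!/bin/sh
# Hypothetical helpers, not part of wan-failover.sh.

# Read ping output on stdin and print the packet-loss percentage as an integer.
parse_packet_loss() {
    # The summary line looks like:
    #   5 packets transmitted, 4 packets received, 20% packet loss
    awk '/packet loss/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%/, "", $i); print int($i); exit }
    }'
}

# Ping a target and print the loss percentage.
# $1 = target IP, $2 = ping count (default 3), $3 = per-ping timeout in seconds (default 1)
check_target() {
    loss="$(ping -c "${2:-3}" -W "${3:-1}" "$1" 2>/dev/null | parse_packet_loss)"
    # If ping printed no summary at all, treat the target as unreachable.
    echo "${loss:-100}"
}
```

Running `check_target 1.1.1.1 5 1` a few times right after a reboot should show whether the loss really swings the way the log suggests.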
 
Are you able to modify firewall rules on their device to allow ICMP? Try testing with a different target IP.

EDIT: Check WAN0 Connection State whenever you are having issues.
Code:
nvram get wan0_state_t
 
No, I don't have access to the provider's equipment. ICMP is allowed by the provider: if authorization succeeds immediately after a reboot, any IP can be pinged. If I disable the script or disable Dual WAN, there are no problems; the IP is obtained (authorization) immediately, and the provider's port does not block the router. I thought some Entware scripts were interfering, so I turned them off and left only WAN Failover, but the situation repeats. That's why I'm thinking about how to launch WAN Failover with a delay.

P.S. As for the rest, I have no complaints about WAN Failover; everything works as it should on version 1.5.1.
 
How long after boot up do you feel a delay is necessary? I have an idea.
 
Add me to the list of happy 'customers':

Highlights:
RT-AX86U AsusWrt-Merlin 386.7
Script Version V1.4.6
Main Provider: Quantum Fiber (Lumen/Centurylink) symmetrical 1Gbps
Standby Provider: Comcast 100/15 Mbps


My main struggle in getting everything up and running was not related to this script; it was due to Quantum requiring a VLAN ID (201) and Comcast only working without a VLAN ID set.
There is only one place to set VLAN 201 (LAN->IPTV), and the router applies it to both WAN ports (1G & 2.5G), which breaks the ability to get an IP from Comcast.
On a hunch I tried using a LAN port (LAN4) for Comcast, and it looks like the VLAN ID is not applied to the LAN port, even when the LAN port is used as a WAN port, so Comcast is working as well now.

With the native Asus implementation, the system would fail over but never fail back.
I installed the script and set basic parameters (not using QoS) and voila - everything is working.

I tested failover and fallback a couple of times and reboot/poweroff and all is well.

Thanks for your great work!
 
I have been testing v1.5.1 and it is working great for me, but then, I have a simple setup on the WAN connections. I am even double NATed behind the two ISP router/modem(s). I am testing by disconnecting the ISP equipment on their WAN side thereby maintaining the ethernet connection between those two units and my RT-AX86U. So, unlike the Asus method that requires the loss of WAN ethernet connectivity to even attempt to work, this script works great with no ethernet loss.

Simply awesome work. Asus needs to license this from @Ranger802004 :)
 
v1.5.2-beta Release:
Manually upgrade to this beta by running the following command: ***Allow for cronjob to relaunch the script***
Code:
/usr/sbin/curl -s "https://raw.githubusercontent.com/Ranger802004/asusmerlin/main/wan-failover_v1.5.2-beta.sh" -o "/jffs/scripts/wan-failover.sh" && chmod 755 /jffs/scripts/wan-failover.sh && sh /jffs/scripts/wan-failover.sh kill

To revert back to Production Release:
Code:
/jffs/scripts/wan-failover.sh update

***To configure the new boot delay timer, run the config option and specify the Boot Delay Timer in seconds. If a Boot Delay Timer is not configured, the script will ignore this function and continue.***

Release Notes:
v1.5.2-beta
- Added delay in WAN Status for when NVRAM is inaccessible.
- Added support for Load Balance Mode
- Changed from using NVRAM Variables: wan0_ifname & wan1_ifname to using NVRAM Variables: wan0_gw_ifname & wan1_gw_ifname.
- Improved DNS Settings detection during Switch WAN function.
- Improved Switch WAN Logic to verify NVRAM Variables: wan_gateway, wan_gw_ifname, and wan_ipaddr are properly updated.
- Added warning message when attempting to execute Run or Manual Mode if the script is already running.
- IP Routes are now given a value of metric 1 when created during WAN Status checks.
- Improved detection of existing routes for WAN Target IP Addresses to identify potential misconfiguration.
Example: WAN0 Target IP Address is a DNS Server for WAN1 and the route already exists for WAN1 Interface.
- Support for ASUS Merlin Firmware 386.7
- Added Boot Delay Timer
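
To illustrate the Boot Delay Timer idea, here is a minimal sketch of one way such a delay could work: compare the configured value against the router's uptime and sleep for the difference. This is only an assumption about the approach, not the code from the beta, and the function names are hypothetical. `BOOTDELAYTIMER` is the config variable; an empty or missing value is treated as "not configured".

```shell
#!/bin/sh
# Hypothetical sketch of a boot delay; the real script may do this differently.

# Print how many seconds are left before the boot delay has elapsed.
# $1 = configured delay in seconds (empty means not configured)
# $2 = current uptime in whole seconds
boot_delay_remaining() {
    delay="${1:-0}"
    uptime_s="$2"
    if [ "$delay" -gt "$uptime_s" ]; then
        echo $((delay - uptime_s))
    else
        echo 0
    fi
}

# Sleep until the router has been up for at least BOOTDELAYTIMER seconds.
wait_for_boot_delay() {
    # /proc/uptime holds "seconds.fraction idle"; keep only the whole seconds.
    uptime_s="$(cut -d. -f1 /proc/uptime)"
    remaining="$(boot_delay_remaining "${BOOTDELAYTIMER:-0}" "$uptime_s")"
    [ "$remaining" -gt 0 ] && sleep "$remaining"
    return 0
}
```

Measuring against uptime rather than sleeping a fixed amount means a manual restart of the script long after boot is not delayed at all.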
 
BTW, this is my config:


WAN0TARGET=209.244.0.4
WAN1TARGET=209.244.0.3
PINGCOUNT=3
PINGTIMEOUT=1
WANDISABLEDSLEEPTIMER=10
WAN0_QOS_IBW=102400
WAN1_QOS_IBW=102400
WAN0_QOS_OBW=102400
WAN1_QOS_OBW=10240
WAN0_QOS_OVERHEAD=0
WAN1_QOS_OVERHEAD=0
WAN0_QOS_ATM=0
WAN1_QOS_ATM=0
PACKETLOSSLOGGING=1

Also, I am NOT using the most popular DNS servers (Google, Quad9, Cloudflare, etc.) on my WAN connections. This ensures that these popular servers are always accessible from either WAN interface to devices on my LAN and are not blocked by a static route to these servers created by this script or any other script that may be running.
 
Good news: after 2 reboots I had no WAN Failover notification, and Skynet is ready too. (Skynet had some issues while on v1.5.1.)
My setup:
Code:
WAN0TARGET=9.9.9.9
WAN1TARGET=4.2.2.2
PINGCOUNT=5
PINGTIMEOUT=1
WANDISABLEDSLEEPTIMER=15
BOOTDELAYTIMER=150

So, BOOTDELAYTIMER is working fine on my setup. I also tested failover and failback successfully.
Many thanks! You rock! :)
 
Great job! Thank you for implementing what I needed. Thank you for being there!
 
Glad that resolved your issues and I made it customizable for anyone to set based on their needs for their setup.
 
Also, I am NOT using the most popular DNS servers (Google, Quad9, Cloudflare, etc.) on my WAN connections. This ensures that these popular servers are always accessible from either WAN interface to devices on my LAN and are not blocked by a static route to these servers created by this script or any other script that may be running.
I am working on this for down the road. Not having much luck yet...lol, but stay tuned.

EDIT: I made some progress on this. I can limit it to the ICMP protocol from the router itself for the Target IP, so only that traffic would be affected. I will incorporate this in the next beta.

EDIT2: OK, I took the challenge. Try v1.5.3-beta, but be careful: it is a good bit of an overhaul from v1.5.2-beta, so don't be mad if it doesn't work right. I'm trying to optimize this tool as best as possible.
 
v1.5.3-beta Release:
Manually upgrade to this beta by running the following command: ***Allow for cronjob to relaunch the script***
Code:
/usr/sbin/curl -s "https://raw.githubusercontent.com/Ranger802004/asusmerlin/main/wan-failover_v1.5.3-beta.sh" -o "/jffs/scripts/wan-failover.sh" && chmod 755 /jffs/scripts/wan-failover.sh && sh /jffs/scripts/wan-failover.sh kill

To revert back to Production Release:
Code:
/jffs/scripts/wan-failover.sh update

***To configure the new boot delay timer, run the config option and specify the Boot Delay Timer in seconds. If a Boot Delay Timer is not configured, the script will ignore this function and continue.***

***This version changes how IP Rules / Routes are created. You can delete the old static routes created by the script under Routing Table Main, or reboot the router to remove them.***


Release Notes:
v1.5.3-beta
- Added delay in WAN Status for when NVRAM is inaccessible.
- Added support for Load Balance Mode
- Changed from using NVRAM Variables: wan0_ifname & wan1_ifname to using NVRAM Variables: wan0_gw_ifname & wan1_gw_ifname.
- Improved DNS Settings detection during Switch WAN function.
- Improved Switch WAN Logic to verify NVRAM Variables: wan_gateway, wan_gw_ifname, and wan_ipaddr are properly updated.
- Added warning message when attempting to execute Run or Manual Mode if the script is already running.
- Support for ASUS Merlin Firmware 386.7
- Added Boot Delay Timer
- Target IP Routes are now created using IP Rules from Local Router to Routing Table 100 (WAN0) and Routing Table 200 (WAN1) so client devices on the network do not use the created routes.
- Moved Email Variables from Global Variables so Email Configuration is checked every time a switch occurs instead of when script restarts.
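
As a rough illustration of the new Target IP routing, the sketch below prints the kind of `ip` commands that would keep a target route in a dedicated table and match only traffic generated by the router itself. It is an assumption about the approach, not the commands used in v1.5.3-beta: the function name and the choice of rule priority are made up, and the commands are only echoed so nothing changes on the router.

```shell
#!/bin/sh
# Hypothetical dry-run helper; it prints commands instead of running them.
# $1 = target IP, $2 = gateway, $3 = interface, $4 = routing table number
target_route_commands() {
    target="$1"; gateway="$2"; ifname="$3"; table="$4"
    # The route to the target exists only in the dedicated table,
    # so the main routing table stays untouched.
    echo "ip route replace $target via $gateway dev $ifname table $table"
    # 'iif lo' matches only packets originating from the router itself,
    # so LAN clients never hit this rule.
    echo "ip rule add from all iif lo to $target lookup $table priority $table"
}
```

For example, `target_route_commands 1.1.1.1 10.128.131.254 eth0 100` prints the two commands for WAN0 using the gateway and interface from the log earlier in the thread. Review the output before running anything by hand.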
 
Good morning. Today I decided to test how the script works, so I pulled out the cable.
Here is the log:
https://drive.google.com/file/d/1UH4QShIdKfl51-rZqS0WsgK_JNlIHm_2/view?usp=sharing
It seems to me that the script did some unnecessary work: the connection was switched between the WANs 3 times, although it should have been 2. Please take a look.
@VIper_Rus all I can see from your logs is that your Target IP is dropping packets randomly, sometimes to the point of 100% packet loss, which triggers a WAN Failover. I'm not sure what in your setup is causing those issues; perhaps it is your geographical location and your ISP blocking the traffic? I highly suggest trying a new Target IP instead of 1.1.1.1 for WAN0. I believe you are in Russia, correct? Is there a Target IP you can use from inside Russia? I would recommend that for your configuration.
 
I have already changed the IP. The last time we tested, a different IP was in use. In fact, my Internet is very stable; the problems arise precisely at the moment of switching. During normal operation the script does sometimes report 20% loss, but that only means a single lost ping, and it rarely happens. For example, the router that is a little buggy after a reboot, as you probably noticed, shows 100% loss, switches WAN1 to primary, and then almost instantly WAN0 shows 0% loss. I still have a feeling that a delay needs to be added before checking WAN0 when switching to WAN0.

In that log, I pulled out the cable at 10:49:52
and inserted it again at 10:52:48.

At 10:53:02, the script reported 40% loss. Isn't that a little early? Very little time had passed. I have a delay of 60 seconds in my settings; I would give the script time for a full WAN switch and only then start monitoring the main channel.

My Settings

WAN0TARGET=1.1.1.1
WAN1TARGET=1.0.0.1
PINGCOUNT=5
PINGTIMEOUT=2
WANDISABLEDSLEEPTIMER=60
WAN0_QOS_IBW=0
WAN1_QOS_IBW=0
WAN0_QOS_OBW=0
WAN1_QOS_OBW=0
WAN0_QOS_OVERHEAD=0
WAN1_QOS_OVERHEAD=0
WAN0_QOS_ATM=0
WAN1_QOS_ATM=0
PACKETLOSSLOGGING=1
BOOTDELAYTIMER=60


WANDISABLEDSLEEPTIMER=60

Isn't this the setting that controls when channel monitoring starts after a switch?
 
Have you updated to v1.5.3-beta and run the config option again to use the boot delay timer? As for a delay between switching, let's try this first: whenever you switch and it is in this weird "Packet Loss" state, please run this command for me. If the interface is not in a ready state, I can build logic around that to delay monitoring instead of using a static sleep timer.
Code:
nvram get wan0_state_t
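
To show what "delay monitoring until the interface is ready" might look like, here is a small sketch that polls the state variable instead of sleeping a fixed time. It is not from the script. The function name is hypothetical, and it assumes that a state value of 2 means "connected" on Asuswrt-Merlin, so check that against what your own router reports.

```shell
#!/bin/sh
# Hypothetical sketch, not part of wan-failover.sh.
# Poll a WAN state until it reports "connected" or a timeout expires.
# $1 = WAN prefix (wan0 or wan1), $2 = timeout in seconds (default 60)
# ASSUMPTION: a state value of 2 means "connected".
wait_for_wan_ready() {
    wan="$1"
    timeout="${2:-60}"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        state="$(nvram get "${wan}_state_t" 2>/dev/null)"
        # Ready: the caller can start monitoring right away.
        [ "$state" = "2" ] && return 0
        sleep 1
        waited=$((waited + 1))
    done
    # Timed out without ever seeing the connected state.
    return 1
}
```

Something like `wait_for_wan_ready wan0 60 && echo "wan0 ready"` would return as soon as the interface comes up, rather than always waiting the full 60 seconds.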
 
I just get the impression that with my 3 VPN clients on the router, the channel simply does not have time to switch, and the script starts monitoring the main channel without waiting for it to come up fully ;) I'll check right now with nvram get wan0_state_t.
 
