WAN restarts without routes - how to retry automatically?

Alexander K

New Around Here
Hello everyone,

I have an esoteric problem with my WAN. Due to circumstances outside my control, the network equipment on the other end of my WAN connection is power-cycled 3 times a day (as a clumsy "maybe it will go away" solution to another problem). Unfortunately, when it happens, and the rack is powered up afterwards, my RT-AC68U running asus-merlin 3.0.0.4.378.51_0 fails to get its WAN interface up properly. Same thing happened with stock ASUS firmware. I suspect some race condition with how the switches get back up..

The symptom is that WAN is reported as being back up but the routes table is empty save for LAN entry.

If I wait for about a minute and go to WAN settings in the UI and click [Apply], it goes back to normal. If I don't, the WAN stays in this abnormal state for over an hour before correcting itself.

Log output, where N.N.N.N is the censored-out IP of my OpenVPN server (RT-AC68U is set to connect to it at all times), and repeated dnsmasq entries shortened:

Apr 16 04:59:59 WAN Connection: Ethernet link down.
Apr 16 05:00:59 WAN Connection: Ethernet link up.
Apr 16 05:00:59 rc_service: wanduck 30514:notify_rc restart_wan_if 0
Apr 16 05:00:59 dnsmasq[30517]: read /etc/hosts - 5 addresses
Apr 16 05:00:59 dnsmasq[30517]: using nameserver 8.8.4.4#53 for domain local
Apr 16 05:00:59 dnsmasq[30517]: using nameserver 8.8.8.8#53 for domain local
Apr 16 05:00:59 dnsmasq[30517]: using nameserver 8.8.8.8#53
Apr 16 05:00:59 dnsmasq[30517]: using nameserver 8.8.4.4#53
Apr 16 05:01:00 kernel: Attempt to kill tasklet from interrupt
Apr 16 05:01:00 kernel: br0: port 1(vlan1) entering forwarding state
Apr 16 05:01:00 kernel: device eth0 left promiscuous mode
Apr 16 05:01:00 kernel: device eth0 entered promiscuous mode
Apr 16 05:01:00 kernel: br0: topology change detected, propagating
Apr 16 05:01:00 kernel: br0: port 1(vlan1) entering forwarding state
Apr 16 05:01:00 kernel: br0: port 1(vlan1) entering forwarding state
Apr 16 05:01:00 [same dnsmasq output]
Apr 16 05:01:00 [same dnsmasq output]
Apr 16 05:01:02 rc_service: udhcpc 14948:notify_rc start_firewall
Apr 16 05:01:02 [same dnsmasq output]
Apr 16 05:01:03 start_nat_rules: apply the nat_rules(/tmp/nat_rules_eth0_eth0)!
Apr 16 05:01:03 wan: finish adding multi routes
Apr 16 05:01:03 rc_service: udhcpc 14948:notify_rc stop_upnp
Apr 16 05:01:03 rc_service: waitting "start_firewall" via udhcpc ...
Apr 16 05:01:05 admin: sh /opt/S50asuslighttpd.1 firewall-start
Apr 16 05:01:05 admin: sh /opt/S50downloadmaster.1 firewall-start
Apr 16 05:01:06 rc_service: udhcpc 14948:notify_rc start_upnp
Apr 16 05:01:06 rc_service: waitting "stop_upnp" via udhcpc ...
Apr 16 05:01:06 WAN Connection: Ethernet link up.
Apr 16 05:01:06 rc_service: wanduck 30514:notify_rc restart_wan_if 0
Apr 16 05:01:07 [same dnsmasq output]
Apr 16 05:01:07 kernel: Attempt to kill tasklet from interrupt
Apr 16 05:01:07 kernel: br0: port 1(vlan1) entering forwarding state
Apr 16 05:01:07 kernel: device eth0 left promiscuous mode
Apr 16 05:01:07 kernel: device eth0 entered promiscuous mode
Apr 16 05:01:07 kernel: br0: topology change detected, propagating
Apr 16 05:01:07 kernel: br0: port 1(vlan1) entering forwarding state
Apr 16 05:01:07 kernel: br0: port 1(vlan1) entering forwarding state
Apr 16 05:01:09 rc_service: udhcpc 14948:notify_rc stop_vpnclient1
Apr 16 05:01:09 rc_service: udhcpc 14948:notify_rc start_vpnclient1
Apr 16 05:01:09 rc_service: waitting "stop_vpnclient1" via udhcpc ...
Apr 16 05:01:11 [same dnsmasq output]
Apr 16 05:01:12 rc_service: udhcpc 15209:notify_rc start_firewall
Apr 16 05:01:12 rc_service: waitting "stop_vpnclient1" via udhcpc ...
Apr 16 05:01:12 openvpn[6172]: /usr/sbin/ip route del 10.8.0.1/32
Apr 16 05:01:12 openvpn[6172]: /usr/sbin/ip route del N.N.N.N/32
Apr 16 05:01:12 openvpn[6172]: ERROR: Linux route delete command failed: external program exited with error status: 2
Apr 16 05:01:12 openvpn[6172]: /usr/sbin/ip route del 0.0.0.0/1
Apr 16 05:01:12 openvpn[6172]: /usr/sbin/ip route del 128.0.0.0/1
Apr 16 05:01:12 openvpn[6172]: Closing TUN/TAP interface
Apr 16 05:01:12 openvpn[6172]: /usr/sbin/ip addr del dev tun11 local 10.8.0.6 peer 10.8.0.5
Apr 16 05:01:12 openvpn[6172]: SIGTERM[hard,] received, process exiting
Apr 16 05:01:13 dhcp client: bound 192.168.2.103 via 192.168.2.1 during 6000 seconds.
Apr 16 05:01:14 WAN Connection: ISP's DHCP did not function properly.
Apr 16 05:01:14 kernel: tun: Universal TUN/TAP device driver, 1.6
Apr 16 05:01:14 kernel: tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
Apr 16 05:01:15 [same dnsmasq output]
Apr 16 05:01:15 openvpn[15265]: OpenVPN 2.3.6 arm-unknown-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [MH] [IPv6] built on Mar 6 2015
Apr 16 05:01:15 openvpn[15265]: library versions: OpenSSL 1.0.0q 15 Jan 2015, LZO 2.08
Apr 16 05:01:15 openvpn[15265]: Socket Buffers: R=[122880->131072] S=[122880->131072]
Apr 16 05:01:15 openvpn[15266]: UDPv4 link local: [undef]
Apr 16 05:01:15 openvpn[15266]: UDPv4 link remote: [AF_INET]N.N.N.N:1194
Apr 16 05:01:15 openvpn[15266]: write UDPv4: Network is unreachable (code=101)

After this, the routing table is left in this state:

Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.20.0 * 255.255.255.0 U 0 0 0 LAN


Eventually, it recovers. No idea what triggers it, log:

Apr 15 06:26:31 dnsmasq[30517]: read /etc/hosts - 5 addresses
Apr 15 06:26:31 dnsmasq[30517]: using nameserver 8.8.4.4#53 for domain local
Apr 15 06:26:31 dnsmasq[30517]: using nameserver 8.8.8.8#53 for domain local
Apr 15 06:26:31 dnsmasq[30517]: using nameserver 8.8.8.8#53
Apr 15 06:26:31 dnsmasq[30517]: using nameserver 8.8.4.4#53
Apr 15 06:26:31 rc_service: udhcpc 20915:notify_rc start_firewall
Apr 15 06:26:31 dnsmasq[30517]: read /etc/hosts - 5 addresses
Apr 15 06:26:31 [same dnsmasq output]
Apr 15 06:26:32 start_nat_rules: apply the nat_rules(/tmp/nat_rules_eth0_eth0)!
Apr 15 06:26:32 wan: finish adding multi routes
Apr 15 06:26:32 rc_service: udhcpc 20915:notify_rc stop_upnp
Apr 15 06:26:32 rc_service: waitting "start_firewall" via udhcpc ...
Apr 15 06:26:32 openvpn[19423]: TLS: Initial packet from [AF_INET]N.N.N.N:1194, sid=d5a087ec 7528d23b
Apr 15 06:26:33 WAN Connection: WAN was restored.

If I manually do an [Apply] of WAN configuration, it looks like this:

Apr 15 21:01:11 start_nat_rules: apply the nat_rules(/tmp/nat_rules_eth0_eth0)!
Apr 15 21:01:11 wan: finish adding multi routes
Apr 15 21:01:11 rc_service: udhcpc 5183:notify_rc stop_upnp
Apr 15 21:01:11 rc_service: waitting "start_firewall" via udhcpc ...
Apr 15 21:01:12 WAN Connection: WAN was restored.
Normal routing table looks like this:

Destination Gateway Genmask Flags Metric Ref Use Iface
10.8.0.5 * 255.255.255.255 UH 0 0 0 tun11
10.8.0.1 10.8.0.5 255.255.255.255 UGH 0 0 0 tun11
192.168.2.1 * 255.255.255.255 UH 0 0 0 WAN
N.N.N.N 192.168.2.1 255.255.255.255 UGH 0 0 0 WAN
192.168.20.0 * 255.255.255.0 U 0 0 0 LAN
192.168.2.0 * 255.255.255.0 U 0 0 0 WAN
default 10.8.0.5 128.0.0.0 UG 0 0 0 tun11
128.0.0.0 10.8.0.5 128.0.0.0 UG 0 0 0 tun11
default 192.168.2.1 0.0.0.0 UG 0 0 0 WAN

I would like to have a workaround. At the simplest, since power cycles occur at predefined times, I could schedule a script to run to restart WAN at those times +1 minute.

However, my problem is: I don't know what to put in the script to trigger the same effect as applying the WAN setting, basically how to trigger a "proper" WAN restart.

Of course, a more intelligent solution is welcome.

Please note that since I do not control the WAN equipment, I cannot live-test it more than 1-3 times per day. Also, sorry in advance for noobishness.
 
Last edited:

RMerlin

Asuswrt-Merlin dev
Two things I would try first:

1) Make sure that DHCP mode is set to "Normal" rather than "Aggressive" on the WAN page
2) Try putting a small network switch between the router's WAN port and the upstream equipment, to see if preventing the multiple Ethernet link flaps (I see two "Ethernet WAN is up" in that log piece you posted) might help the router recovering from the upstream reboot

Otherwise, a cron job to restart the WAN would probably look like this (untested, I'm not at home at the moment):

Code:
#!/bin/sh

service restart_wan_if 0
 

Alexander K

New Around Here
1) It's already on Normal, and I tried Aggressive too. Does not help either way.
2) I might try that, but adding another piece of hardware would be a last resort.

Your command, sadly, fails.
Code:
# service restart_wan_if 0
Usage: service <action_service>

I tried it without 0, the command is acknowledged but does nothing.
 

vaszago

Regular Contributor
I have a script who is working moderately for me

#!/bin/sh
if ping -w 10 -c 1 212.247.8.38 > /dev/null; then
echo "SERVERNAME=on"
else
echo "SERVERNAME=off"
killall -SIGUSR2 udhcpc
fi

In addition, you still need a services-start script.
He also performs at a restart the script

#!/bin/sh
/usr/sbin/cru a usb-start "*/5 * * * * /jffs/scripts/usb-start"

therefore he checks every 5 minutes the goggle ip. If down then starts the wan interface
 

Alexander K

New Around Here
If I try to use SIGUSR2 on udhcpc, it fails:

Code:
Apr 20 22:30:25 dnsmasq[825]: read /etc/hosts - 5 addresses
Apr 20 22:30:25 dnsmasq[825]: using nameserver 8.8.4.4#53 for domain local
Apr 20 22:30:25 dnsmasq[825]: using nameserver 8.8.8.8#53 for domain local
Apr 20 22:30:25 dnsmasq[825]: using nameserver 8.8.8.8#53
Apr 20 22:30:25 dnsmasq[825]: using nameserver 8.8.4.4#53
Apr 20 22:30:29 WAN Connection: ISP's DHCP did not function properly.

And then it sits there without trying to re-establish.

A possibly significant detail: there is a MAC filter on the upstream and I have a MAC manually set in WAN options.
 

RMerlin

Asuswrt-Merlin dev
1) It's already on Normal, and I tried Aggressive too. Does not help either way.
2) I might try that, but adding another piece of hardware would be a last resort.

Your command, sadly, fails.
Code:
# service restart_wan_if 0
Usage: service <action_service>

I tried it without 0, the command is acknowledged but does nothing.

Try this instead (I just tested it remotely from work and it works for restarting the WAN interface):

Code:
service restart_wan
 

Alexander K

New Around Here
Thanks, looks like this will work.

It does complain "WAN Connection: ISP's DHCP did not function properly." but proceeds with re-establishing and eventually gets a lease in a few seconds.

I'll see if it will do the job if scheduled roughly a minute after the upstream switch comes online.

For the record, where can one see all the commands service can take?
 
Last edited:

RMerlin

Asuswrt-Merlin dev
Thanks, looks like this will work.

It does complain "WAN Connection: ISP's DHCP did not function properly." but proceeds with re-establishing and eventually gets a lease in a few seconds.

I'll see if it will do the job if scheduled roughly a minute after the upstream switch comes online.

For the record, where can one see all the commands service can take?

They're not documented, because they are meant for internal use by the firmware, and can change at any time. You have to dig through the source code to see the (currently) available functions.

https://github.com/RMerl/asuswrt-merlin/blob/master/release/src/router/rc/services.c#L4142
 

mstombs

Very Senior Member
Try this instead (I just tested it remotely from work and it works for restarting the WAN interface):

Code:
service restart_wan

Interesting. I use (and have just tested it still works)

Code:
service wan_restart

I don't see either string in the linked code - I guess you tokenize it and and in both cases notice "wan" and "restart"?
 

RMerlin

Asuswrt-Merlin dev
Interesting. I use (and have just tested it still works)

Code:
service wan_restart

The action has to be at the start of the string. The router only checks for the first "x" characters when looking for start, restart or stop actions. wan_restart shouldn't work.[/QUOTE]
 

mstombs

Very Senior Member
Ah OK, it doesn't error - it acknowledges the request

Code:
ASUSWRT-Merlin RT-N66U_3.0.0.4 Sun Apr  5 18:58:40 UTC 2015
[email protected]:/tmp/home/root# service wan_restart

Done.
Apr 21 20:54:20 rc_service: service 17034:notify_rc wan_restart

... but doesn't actually do anything!
 

RMerlin

Asuswrt-Merlin dev
Ah OK, it doesn't error - it acknowledges the request

Code:
ASUSWRT-Merlin RT-N66U_3.0.0.4 Sun Apr  5 18:58:40 UTC 2015
[email protected]:/tmp/home/root# service wan_restart

Done.
Apr 21 20:54:20 rc_service: service 17034:notify_rc wan_restart

... but doesn't actually do anything!

The service command merely queues the request, it has no way of knowing if the request is a valid one or not. The rc daemon will know once it processes the queue.
 

Similar threads

Latest threads

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!
Top