What's new

Strange WAN (half?)disconnections on RT-N66U resulting in DNS problems

  • SNBForums Code of Conduct

    SNBForums is a community for everyone, no matter what their level of experience.

    Please be tolerant and patient of others, especially newcomers. We are all here to share and learn!

    The rules are simple: Be patient, be nice, be helpful or be gone!

emanuil_tolev

New Around Here
RT-N66U running latest Merlin (374.43).

Code:
... several hours of the fibre modem timing out, my ISP is/was having issues ...
Jul 12 06:24:10 pppd[2684]: Timeout waiting for PADO packets
Jul 12 06:25:25 pppd[2684]: Timeout waiting for PADO packets
Jul 12 06:26:00 pppd[2684]: PPP session is 122 (0x7a)
Jul 12 06:26:00 pppd[2684]: Connected to 7c:20:64:e5:65:6c via interface eth0
Jul 12 06:26:00 pppd[2684]: Using interface ppp0
Jul 12 06:26:00 pppd[2684]: Connect: ppp0 <--> eth0
Jul 12 06:26:06 pppd[2684]: CHAP authentication succeeded
Jul 12 06:26:06 pppd[2684]: CHAP authentication succeeded
Jul 12 06:26:06 pppd[2684]: peer from calling number 7C:20:64:E5:65:6C authorized
Jul 12 06:26:07 miniupnpd[16332]: Failed to get IP for interface ppp0
Jul 12 06:26:07 miniupnpd[16332]: SendNATPMPPublicAddressChangeNotification: cannot get public IP address, stopping
Jul 12 06:26:07 miniupnpd[16332]: Failed to get IP for interface ppp0
Jul 12 06:26:07 miniupnpd[16332]: SendNATPMPPublicAddressChangeNotification: cannot get public IP address, stopping
Jul 12 06:26:07 miniupnpd[16332]: Failed to get IP for interface ppp0
Jul 12 06:26:07 miniupnpd[16332]: SendNATPMPPublicAddressChangeNotification: cannot get public IP address, stopping
Jul 12 06:26:07 pppd[2684]: local  IP address 84.92.99.210
Jul 12 06:26:07 pppd[2684]: remote IP address 195.166.130.138
Jul 12 06:26:07 pppd[2684]: primary   DNS address 212.159.6.10
Jul 12 06:26:07 pppd[2684]: secondary DNS address 212.159.6.9
Jul 12 06:26:07 dnsmasq[19149]: read /etc/hosts - 5 addresses
Jul 12 06:26:07 dnsmasq[19149]: using nameserver 212.159.6.10#53
Jul 12 06:26:07 dnsmasq[19149]: using nameserver 212.159.6.9#53
Jul 12 06:26:07 start_nat_rules: apply the nat_rules(/tmp/nat_rules_ppp0_eth0)!
Jul 12 06:26:07 dnsmasq[19149]: exiting on receipt of SIGTERM
Jul 12 06:26:07 dnsmasq[19261]: started, version 2.69 cachesize 1500
Jul 12 06:26:07 dnsmasq[19261]: asynchronous logging enabled, queue limit is 5 messages
Jul 12 06:26:07 dnsmasq-dhcp[19261]: DHCP, IP range 192.168.2.2 -- 192.168.2.254, lease time 1d
Jul 12 06:26:07 dnsmasq-dhcp[19261]: DHCP, sockets bound exclusively to interface br0
Jul 12 06:26:07 dnsmasq[19261]: read /etc/hosts - 5 addresses
Jul 12 06:26:07 dnsmasq[19261]: using nameserver 212.159.6.10#53
Jul 12 06:26:07 dnsmasq[19261]: using nameserver 212.159.6.9#53
Jul 12 06:26:07 rc_service: ip-up 19243:notify_rc stop_upnp
Jul 12 06:26:07 miniupnpd[16332]: shutting down MiniUPnPd
Jul 12 06:26:07 rc_service: ip-up 19243:notify_rc start_upnp
Jul 12 06:26:07 miniupnpd[19277]: HTTP listening on port 41936
Jul 12 06:26:07 miniupnpd[19277]: Listening for NAT-PMP/PCP traffic on port 5351
Jul 12 06:26:07 ddns: IP address, server and hostname have not changed since the last update.
Jul 12 06:26:09 WAN(0) Connection: WAN was restored.

So far, so good, the connection came back up and the router reconnected. There are usually no issues after this.

However, *the following* seems to happen once or twice per day (this is literally continuing the log excerpt, nothing omitted - 30 min after the connection started working as you can see):

Code:
Jul 12 06:58:58 WAN(0) Connection: Fail to connect with some issues.
Jul 12 06:58:58 stop_nat_rules: apply the redirect_rules!
Jul 12 06:59:04 dnsmasq[19261]: exiting on receipt of SIGTERM
Jul 12 06:59:04 dnsmasq[21282]: started, version 2.69 cachesize 1500
Jul 12 06:59:04 dnsmasq[21282]: asynchronous logging enabled, queue limit is 5 messages
Jul 12 06:59:04 dnsmasq-dhcp[21282]: DHCP, IP range 192.168.2.2 -- 192.168.2.254, lease time 1d
Jul 12 06:59:04 dnsmasq-dhcp[21282]: DHCP, sockets bound exclusively to interface br0
Jul 12 06:59:04 dnsmasq[21282]: read /etc/hosts - 5 addresses
Jul 12 06:59:04 dnsmasq[21282]: using nameserver 212.159.6.10#53
Jul 12 06:59:04 dnsmasq[21282]: using nameserver 212.159.6.9#53

There's no DNS after this. You can connect to the outside world via IP (e.g. skype works, you can ping servers by ip), but all DNS queries return 10.0.0.1 . The fibre modem is working as far as I can tell (no PADO timeouts as you can see). It seems to be a completely local problem. Anyway, this just stays broken until...

Code:
Jul 12 07:43:39 dnsmasq-dhcp[21282]: DHCPREQUEST(br0) 192.168.2.105 6c:ad:f8:10:fe:23 
Jul 12 07:43:39 dnsmasq-dhcp[21282]: DHCPACK(br0) 192.168.2.105 6c:ad:f8:10:fe:23 android-fcba21d62dd495a4
Jul 12 07:47:12 dnsmasq-dhcp[21282]: DHCPREQUEST(br0) 192.168.2.194 4c:80:93:04:13:ac 
Jul 12 07:47:12 dnsmasq-dhcp[21282]: DHCPACK(br0) 192.168.2.194 4c:80:93:04:13:ac tokidoki
Jul 12 07:48:29 rc_service: httpd 326:notify_rc restart_wan_if 0
Jul 12 07:48:29 dnsmasq[21282]: read /etc/hosts - 5 addresses
Jul 12 07:48:29 dnsmasq[21282]: using nameserver 212.159.6.10#53
Jul 12 07:48:29 dnsmasq[21282]: using nameserver 212.159.6.9#53
Jul 12 07:48:29 pppd[2684]: Hangup (SIGHUP)
Jul 12 07:48:29 pppd[2684]: Terminating on signal 15
Jul 12 07:48:29 pppd[2684]: Connect time 82.4 minutes.
Jul 12 07:48:29 pppd[2684]: Sent 190895 bytes, received 151409 bytes.
Jul 12 07:48:29 dnsmasq[21282]: read /etc/hosts - 5 addresses
Jul 12 07:48:29 dnsmasq[21282]: using nameserver 212.159.6.10#53
Jul 12 07:48:29 dnsmasq[21282]: using nameserver 212.159.6.9#53
Jul 12 07:48:29 miniupnpd[19277]: Failed to get IP for interface ppp0
Jul 12 07:48:29 miniupnpd[19277]: SendNATPMPPublicAddressChangeNotification: cannot get public IP address, stopping
Jul 12 07:48:29 pppd[2684]: Connection terminated.
Jul 12 07:48:29 pppd[2684]: Sent PADT
Jul 12 07:48:30 pppd[2684]: Exit.
Jul 12 07:48:30 stop_wan(): perform DHCP release
Jul 12 07:48:30 kernel: Attempt to kill tasklet from interrupt
Jul 12 07:48:30 kernel: br0: port 1(vlan1) entering disabled state
Jul 12 07:48:30 kernel: br0: port 1(vlan1) entering learning state
Jul 12 07:48:30 kernel: br0: topology change detected, propagating
Jul 12 07:48:30 kernel: br0: port 1(vlan1) entering forwarding state
Jul 12 07:48:41 pppd[21882]: Plugin rp-pppoe.so loaded.
Jul 12 07:48:41 pppd[21882]: RP-PPPoE plugin version 3.11 compiled against pppd 2.4.5
Jul 12 07:48:41 pppd[21883]: pppd 2.4.5 started by CanfigureMahRutah, uid 0
Jul 12 07:48:41 pppd[21883]: PPP session is 128 (0x80)
Jul 12 07:48:41 pppd[21883]: Connected to 7c:20:64:e5:65:6c via interface eth0
Jul 12 07:48:41 pppd[21883]: Using interface ppp0
Jul 12 07:48:41 pppd[21883]: Connect: ppp0 <--> eth0
Jul 12 07:49:01 pppd[21883]: CHAP authentication succeeded
Jul 12 07:49:01 pppd[21883]: CHAP authentication succeeded
Jul 12 07:49:01 pppd[21883]: peer from calling number 7C:20:64:E5:65:6C authorized
Jul 12 07:49:02 pppd[21883]: local  IP address 84.92.99.210
Jul 12 07:49:02 pppd[21883]: remote IP address 195.166.130.139
Jul 12 07:49:02 pppd[21883]: primary   DNS address 212.159.6.9
Jul 12 07:49:02 pppd[21883]: secondary DNS address 212.159.6.10
Jul 12 07:49:03 dnsmasq[21282]: read /etc/hosts - 5 addresses
Jul 12 07:49:03 dnsmasq[21282]: using nameserver 212.159.6.9#53
Jul 12 07:49:03 dnsmasq[21282]: using nameserver 212.159.6.10#53
Jul 12 07:49:03 start_nat_rules: apply the nat_rules(/tmp/nat_rules_ppp0_eth0)!
Jul 12 07:49:03 dnsmasq[21282]: exiting on receipt of SIGTERM
Jul 12 07:49:03 dnsmasq[21907]: started, version 2.69 cachesize 1500
Jul 12 07:49:03 dnsmasq[21907]: asynchronous logging enabled, queue limit is 5 messages
Jul 12 07:49:03 dnsmasq-dhcp[21907]: DHCP, IP range 192.168.2.2 -- 192.168.2.254, lease time 1d
Jul 12 07:49:03 dnsmasq-dhcp[21907]: DHCP, sockets bound exclusively to interface br0
Jul 12 07:49:03 dnsmasq[21907]: read /etc/hosts - 5 addresses
Jul 12 07:49:03 dnsmasq[21907]: using nameserver 212.159.6.9#53
Jul 12 07:49:03 dnsmasq[21907]: using nameserver 212.159.6.10#53

... I go in the settings and hit "apply" on the WAN page (not currently configured as Dual-WAN, though it was before). This seems to restart the WAN connection, reload pppd and so forth. DNS and IP connectivity both work now.

I was previously on the recent Merlin 374.34_2 release and a few days ago upgraded to Merlin 374.43 when this started happening.

It's quite a strange problem and I'm not entirely sure it's the router's fault, since my ISP has had a lot of problems (no DSL at all for hours). However, since restarting the WAN (which takes just ~5s) seems to fix it, it does look like there's no DNS availability even after my ISP brings the connection back up on their end, and it does seem like the router's having a problem. I'd be happy if it just tried to reconnect upon this "failure", the same way it does when the cable modem *actually* fails.

I'd be happy to just cause it to restart the WAN and reload pppd on that "WAN(0): Failed to connect with some issues" event, but couldn't find a user script for the WAN going down - you can write one which fires when WAN goes up, but not down.

I've also been unable to find out how to restart the WAN from the busybox command line you get with ssh (i.e. do what the web UI does, but through ssh). If anybody knows how to do this that'd be awesome, as I could then at least script my machine to restart the WAN when this situation occurs (IP-only, no DNS).

Thanks a lot for your hard work on this firmware, @RMerlin, it is lovely.
 

Similar threads

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!
Top