Mysterious loss of connectivity

Thanks. That makes sense. If it's relevant at all, the only way I could get bridge mode to work was by enabling DHCP on the Huawei router BEFORE setting bridge mode to on. So I am wondering whether, when my ASUS router gets its WAN IP, it is actually just getting this from my modem, not from the ISP. I did notice that when I had the Huawei router in normal mode, it had WAN IP = X, and when I set it to bridge mode, that same WAN IP = X got set as the WAN IP on my ASUS router. If I navigate to 'devices' in my Huawei router GUI in bridge mode, it shows my ASUS router as having IPv4 address 10.4.131.245 (which I think is the WAN IP of my ASUS).
It sounds like that hypothesis may be at least plausible.

Could it also relate to the ISP handing out an IP to the first MAC address it sees (the modem)?
I doubt it. If your modem implements bridge mode as described above, the Asus probably never actually communicates with the ISP's DHCP server.

So if the modem is not correctly signalling the new IP to the router, is there no way out but to have the router periodically check connectivity and restart the WAN when it is lost?
If your modem implements bridge mode this way, I would suspect that the modem's DHCP functionality isn't ready when the Asus restarts the WAN. However, if that's the case, setting your DHCP mode to 'continuous' should theoretically fix the problem.
 
I really appreciate your patience and insight!
Isn't continuous the fastest polling frequency? Wouldn't 'normal' be better because it gives more time between polling so more time to recover? I have seen lots of users indicate 'continuous' fixes problems along these lines. It has polling frequency 12Hz I believe. Does it just not stop and keep polling until it gets new address? Whereas 'normal' or 'aggressive' gives up after 3 attempts?
UPDATE: just came across this:
Jack, what is Continuous Mode and why is it required with ABB?
Hi Trial Master,

The existing modes (Aggressive and Normal Mode) do have a retry limit if DHCP server is not responding in a set time frame to prevent ISP from blocking the line. Continuous Mode on the other hand do not have this limit. This is the major difference.

regards,

Jack Cheng, ASUS Australia
So continuous just keeps going at 12Hz - that sounds pretty aggressive - but if it is only polling my modem and not ISP who cares?

I am beginning to think we have cracked it. I had the default 'aggressive mode' set - and after 48 hour refresh the 3 (?) attempts are issued too quickly to the modem before it can react properly. 'Continuous' mode just keeps going until it gets new IP.

As an aside, would the 5s ping/DNS test associated with 'network monitoring' in wanduck be expected to have any day-to-day impact on ordinary usage for a connection?
 
Does it just not stop and keep polling until it gets new address? Whereas 'normal' or 'aggressive' gives up after 3 attempts?
That's my understanding. I think continuous just keeps asking for an address until it gets one. I think Asus flubbed the documentation as I sincerely doubt it performs a DHCP request 12 times per second.
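The difference described above (a retry limit versus no limit) can be sketched in a few lines of shell. This is only an illustration of the idea, not the firmware's code: `try_lease`, `request_lease` and `READY_AT` are made-up names, and the limit of 3 is the figure guessed at in this thread rather than a confirmed value. A real client would also wait between attempts.

```shell
#!/bin/sh
# Hypothetical stand-in for a single DHCP request. It "succeeds" once the
# simulated modem is ready, i.e. from attempt number READY_AT onwards.
try_lease() {
    [ "$1" -ge "$READY_AT" ]
}

# request_lease MAX
#   MAX > 0 : give up after MAX failed attempts (limited modes)
#   MAX = 0 : never give up (what 'continuous' is understood to do)
request_lease() {
    max=$1
    attempt=1
    while :; do
        if try_lease "$attempt"; then
            echo "lease obtained on attempt $attempt"
            return 0
        fi
        if [ "$max" -gt 0 ] && [ "$attempt" -ge "$max" ]; then
            echo "gave up after $attempt attempts"
            return 1
        fi
        attempt=$((attempt + 1))
    done
}

READY_AT=5                 # pretend the modem needs 5 attempts to become ready
request_lease 3 || true    # limited: stops before the modem is ready
request_lease 0            # unlimited: keeps going until it gets a lease
```

If the modem really is slow to bring its DHCP server back after the refresh, the limited case ends without a lease while the unlimited case eventually succeeds, which would match the behaviour reported here.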
 
@Lynx This is getting really frustrating now. You're asking the same questions over and over and over again. At the end of the day we don't know. It might be any number of things. Only you are in a position to diagnose the problem.
 
Yes, sorry (I have an obsessive personality). But I think a lot of progress has been made now. And hopefully this thread will be useful for the many out there who experience this kind of issue (and there appear to be many, from searching on this forum).
Information like this:
The existing modes (Aggressive and Normal Mode) do have a retry limit if DHCP server is not responding in a set time frame to prevent ISP from blocking the line. Continuous Mode on the other hand do not have this limit. This is the major difference.

regards,

Jack Cheng, ASUS Australia
is not so easy to come by!
Thanks everyone (and especially ColinTaylor and sbsnb) for your patience and insight.
 
I think this log may show 'Network Monitoring' in wanduck restoring a bad connection after the 48-hour modem refresh. Don't suppose this gives any further insight?
Relevant portion here:
Full log here:
There seem to be two instances of:
Code:
Aug 16 04:38:31 custom_script: Running /jffs/scripts/dhcpc-event (args: deconfig)
And everything seems to restart a couple of times.
 
Lost internet connectivity again last night pending manual intervention. Very much a shame that this happens because this is a problem with core functionality resulting in total failure.

WAN status on router GUI showed 'disconnected', and that there was a WAN IP set, but lease showed 5 hours and so many minutes left?

Usual trigger: the 48-hour ISP/modem refresh, which we know causes 'eth0' on the ASUS router to go down and back up, but the router ultimately loses connectivity until the WAN is restarted. Modem log showing the trigger:
Code:
2021-08-19 02:03:58 System Notice
WAN connection INTERNET_R_UMTS1:IPv4 disconnected
2021-08-19 02:03:59 System Notice
WAN connection INTERNET_R_UMTS1:IPv4 connected

@sbsnb in accordance with your suggestions, during the faulty state:
------------------------
- my modem on 192.168.8.1 was not accessible - I could not access the modem web GUI or ping the modem
- here is the routing table:
Destination   Gateway    Genmask          Flags  Metric  Ref  Use  Iface
default       10.0.0.1   0.0.0.0          UG     0       0    0    WAN
10.0.0.0      *          255.0.0.0        U      0       0    0    WAN
10.0.0.1      *          255.255.255.255  UH     0       0    0    WAN
192.168.1.0   *          255.255.255.0    U      0       0    0    LAN
239.0.0.0     *          255.0.0.0        U      0       0    0    LAN
- and here is the first portion of a traceroute from the router to the modem:
Code:
traceroute to 192.168.8.1 (192.168.8.1), 30 hops max, 38 byte packets
1  10.34.87.157 (10.34.87.157)  2240.702 ms
- unfortunately syslog did not go back far enough.
------------------------

I do not understand the first line of the traceroute. Is that as expected given the routing table? What does that tell us?

Should 'network monitoring' not have fixed the problem? Does anyone have a good handle on what 'network monitoring' actually does? I thought it restarted WAN on detection of failure, but the 'wanduck.c' code is hard to follow. So that begs the question: should I leave it enabled or disabled?

Is the issue here to do with DHCP mode: 'aggressive' giving up after 3 attempts? The log in my previous post above showing successful reset shows two instances of:
Code:
Aug 16 04:38:31 custom_script: Running /jffs/scripts/dhcpc-event (args: deconfig)

Does that (or the surrounding syslog entries) show how many out of the 3 attempts were needed during that successful handling of the refresh?
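One way to at least count those events is to grep the log for the `dhcpc-event` line quoted above. The helper name `count_deconfig` is made up for this sketch, and note the caveat: it counts how many times udhcpc reported a deconfig, which is not necessarily the same thing as the number of DHCP attempts.

```shell
#!/bin/sh
# Count "deconfig" events reported via /jffs/scripts/dhcpc-event in a log read
# from stdin. -F treats the pattern as a fixed string, -c prints the match count.
count_deconfig() {
    grep -c -F 'dhcpc-event (args: deconfig)'
}

# Sample input: the log line from this thread twice, plus a placeholder line.
printf '%s\n' \
    'Aug 16 04:38:31 custom_script: Running /jffs/scripts/dhcpc-event (args: deconfig)' \
    'unrelated line' \
    'Aug 16 04:38:31 custom_script: Running /jffs/scripts/dhcpc-event (args: deconfig)' \
    | count_deconfig
```

On the router the same function could be fed the system log, for example `count_deconfig < /tmp/syslog.log`, assuming that is where the log lives on your firmware build.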

I have tried setting DHCP 'continuous' mode now. As soon as I changed the DHCP mode to 'continuous' the internet connection came back up. I think that is because that triggers a WAN restart.

Any thoughts on the above? Hoping that we can get to the bottom of this. I am very grateful for the interest and insight so far.
 
Sorry, this has got very long now. ColinTaylor, I looked through wanduck.c and it seems to me that your suggestion regarding Network Monitoring may be a good idea. I have set:
View attachment 35637
The wanduck.c code is very hard to follow but I think this gets called with the functions do_dns_detect() and do_ping_detect().
When you set the network monitoring then dns_probe gets set to '1':
Code:
admin@RT-AX86U-4168:/tmp/home/root# nvram show |grep -i dns_probe
dns_probe=1
dns_probe_content=131.107.255.255 112.4.20.71 fd3e:4f5a:5b81::1
dns_probe_host=dns.msftncsi.com
size: 71879 bytes (59193 left)
And also wandog_target (for ping probe) gets set:
Code:
admin@RT-AX86U-4168:/tmp/home/root# nvram show |grep -i wandog
wandog_delay=0
wandog_enable=1
wandog_fb_count=4
wandog_interval=5
wandog_maxfail=12
wandog_target=www.google.com
size: 71879 bytes (59193 left)
And it seems to be actually doing something every 5 seconds (presumably given wandog_interval=5):
Code:
admin@RT-AX86U-4168:/tmp/home/root# tcpdump -vpni tun11 |grep -i msftncsi.com
tcpdump: listening on tun11, link-type RAW (Raw IP), capture size 262144 bytes
    10.8.3.2.43853 > 185.228.168.168.53: 24973+ A? dns.msftncsi.com. (34)
    185.228.168.168.53 > 10.8.3.2.43853: 24973 1/0/0 dns.msftncsi.com. A 131.107.255.255 (50)
    10.8.3.2.51202 > 185.228.168.168.53: 46556+ A? dns.msftncsi.com. (34)
    185.228.168.168.53 > 10.8.3.2.51202: 46556 1/0/0 dns.msftncsi.com. A 131.107.255.255 (50)
    10.8.3.2.36507 > 185.228.168.168.53: 55641+ A? dns.msftncsi.com. (34)
    185.228.168.168.53 > 10.8.3.2.36507: 55641 1/0/0 dns.msftncsi.com. A 131.107.255.255 (50)
    10.8.3.2.50597 > 185.228.168.168.53: 29697+ A? dns.msftncsi.com. (34)
    185.228.168.168.53 > 10.8.3.2.50597: 29697 1/0/0 dns.msftncsi.com. A 131.107.255.255 (50)
^C20443 packets captured
20509 packets received by filter
0 packets dropped by kernel

admin@RT-AX86U-4168:/tmp/home/root#
admin@RT-AX86U-4168:/tmp/home/root# tcpdump -vpni tun11 |grep -i google.com
tcpdump: listening on tun11, link-type RAW (Raw IP), capture size 262144 bytes
    10.8.3.2.33411 > 185.228.168.168.53: 32861+ A? www.google.com. (32)
    10.8.3.2.33411 > 185.228.168.168.53: 34397+ AAAA? www.google.com. (32)
    185.228.168.168.53 > 10.8.3.2.33411: 32861 1/0/0 www.google.com. A 216.239.38.120 (48)
    185.228.168.168.53 > 10.8.3.2.33411: 34397 1/0/0 www.google.com. AAAA 2001:4860:4802:32::78 (60)
    10.8.3.2.39070 > 185.228.168.168.53: 59112+ A? www.google.com. (32)
    10.8.3.2.39070 > 185.228.168.168.53: 60648+ AAAA? www.google.com. (32)
    185.228.168.168.53 > 10.8.3.2.39070: 59112 1/0/0 www.google.com. A 216.239.38.120 (48)
    185.228.168.168.53 > 10.8.3.2.39070: 60648 1/0/0 www.google.com. AAAA 2001:4860:4802:32::78 (60)
^C4965 packets captured
4968 packets received by filter
0 packets dropped by kernel

So I guess this would detect broken WAN state and force a restart_WAN?
I would still really like to understand why this is necessary and what is ultimately breaking the connection.
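For anyone trying to picture what a watchdog driven by values like `wandog_interval=5` and `wandog_maxfail=12` might do, here is a rough sketch. It is NOT taken from wanduck.c: the function names `next_fails` and `should_restart` are made up, and the idea that a restart is triggered after `wandog_maxfail` consecutive failures is an assumption based only on the variable names.

```shell
#!/bin/sh
INTERVAL=5    # seconds between probes (mirrors wandog_interval above)
MAXFAIL=12    # consecutive failures tolerated (mirrors wandog_maxfail above)

# next_fails CURRENT RESULT: print the new consecutive-failure count.
# Any successful probe resets the count to zero.
next_fails() {
    if [ "$2" = ok ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# should_restart COUNT: succeed once the failure count reaches MAXFAIL.
should_restart() {
    [ "$1" -ge "$MAXFAIL" ]
}

# Simulate an outage in which every probe fails.
fails=0
i=0
while [ "$i" -lt "$MAXFAIL" ]; do
    fails=$(next_fails "$fails" fail)
    i=$((i + 1))
done
if should_restart "$fails"; then
    echo "restart after $((fails * INTERVAL)) seconds of failed probes"
fi

# On a router the loop would look roughly like this (untested outline):
#   while sleep "$INTERVAL"; do
#       if ping -c 1 www.google.com >/dev/null 2>&1; then r=ok; else r=fail; fi
#       fails=$(next_fails "$fails" "$r")
#       if should_restart "$fails"; then service restart_wan; fails=0; fi
#   done
```

Under these assumptions, 12 failures at 5-second intervals means about a minute passes before anything is restarted.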
I hate to resurrect a thread from August, but I just noticed something.
In NVRAM dns_probe_host is set to dns.msftncsi.com which is fine.
dns_probe_content is set to 131.107.255.255 112.4.20.71 plus an IPv6 addy.

Question: dns.msftncsi.com resolves to 131.107.255.255 and that's OK, but why is 112.4.20.71 appended? It resolves to China Telecom.
 
Probably for legal reasons: Microsoft clients in China are pointed to a server located in China, while the rest of the world is pointed to some non-Chinese CDN provider like Akamai.
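A small sketch of why an extra address in the list would be harmless: if the probe passes whenever the resolved address matches ANY entry in `dns_probe_content`, the additional entries only matter on networks that return them. That "match any entry" behaviour is an assumption here, not something confirmed from the firmware source, and `ip_in_list` is a made-up helper name.

```shell
#!/bin/sh
# ip_in_list IP "LIST": succeed if IP equals one of the space-separated entries.
ip_in_list() {
    for candidate in $2; do
        [ "$candidate" = "$1" ] && return 0
    done
    return 1
}

# Value copied from the nvram output earlier in this thread.
expected="131.107.255.255 112.4.20.71 fd3e:4f5a:5b81::1"

if ip_in_list "131.107.255.255" "$expected"; then
    echo "probe answer accepted"
fi
if ! ip_in_list "10.0.0.1" "$expected"; then
    echo "unexpected answer rejected"
fi
```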
 
@RMerlin I wonder if the following 'maintain-wan-lease' script should be included and enabled by default in your releases, to overcome the problem that a disconnect does not trigger a release / renew:
Code:
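#!/bin/bash
# maintain-wan-lease: watch eth0 and force udhcpc to release its lease when the
# link drops, then renew it once the link comes back.

renew_wan_lease=0       # 1 while a release has been sent and a renew is pending

ip monitor link dev eth0 | while read -r event; do

        logger "maintain-wan-lease detected eth0 event: $event"

        case "$event" in

        *'NO-CARRIER'* )
                if [ "$renew_wan_lease" -eq 0 ]; then
                        logger "maintain-wan-lease detected eth0 state change to: 'NO-CARRIER', so forcing udhcpc to release wan lease."
                        # SIGUSR2 asks udhcpc to release the current lease
                        killall -SIGUSR2 udhcpc
                        renew_wan_lease=1
                fi
        ;;

        *'LOWER_UP'* )
                if [ "$renew_wan_lease" -eq 1 ]; then
                        logger "maintain-wan-lease detected eth0 state change from: 'NO-CARRIER' to: 'LOWER_UP', so forcing udhcpc to renew wan lease."
                        # SIGUSR1 asks udhcpc to renew (request a lease again)
                        killall -SIGUSR1 udhcpc
                        renew_wan_lease=0
                fi
        ;;
        esac

done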
Certain modems drop connection and without this release / renew the result is total loss of internet connectivity until restart.

The default behaviour can be verified by physically unplugging the eth0 cable and monitoring states. It takes too long for wanduck to do anything. You could pull out eth0 cable and connect to a different modem and it would not issue the requisite release / renew calls.
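The pattern matching in the script can also be checked without pulling the cable, by feeding it sample `ip monitor link` lines. The sketch below does only that part: `classify_link_event` is a made-up helper, and the two sample lines are illustrative of the usual iproute2 output format rather than captured from this router.

```shell
#!/bin/sh
# classify_link_event LINE: report what the line says about the carrier state,
# using the same two patterns as the maintain-wan-lease script.
classify_link_event() {
    case "$1" in
        *NO-CARRIER*) echo "carrier-lost" ;;
        *LOWER_UP*)   echo "carrier-up" ;;
        *)            echo "other" ;;
    esac
}

# Illustrative sample lines (not real captures).
down='2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 state DOWN'
up='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP'

classify_link_event "$down"
classify_link_event "$up"
```

On the router itself, running `ip monitor link dev eth0` in one SSH session while unplugging and replugging the cable shows the real lines to compare against.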
 
If the firmware's own WAN monitoring is really broken, then it needs to be fixed upstream, not hacked with a workaround script that may bring its own set of issues.
 
Thanks, Eric. Perhaps you could assist with this. A word from RMerlin is worth 1000 complaints from end users. ;)
Also, the legal issue you spoke of is just the government blocking access to NA servers. If ASUS wants to sell routers in China, then they had better include a Chinese IP. So for those who do not live in China, it should be OK to delete the Chinese IP.
 
