Diagnosing frequent dropouts

ADFHogan

Regular Contributor
Short version:
Am noticing my internet connection drop out roughly every 4 hours, fairly consistently, but nothing obvious is popping out as a cause.
ISP doesn't register every dropout, nor does the modem, suggesting that not all situations where the router detects no internet are actually no internet.
Most of the time, outage is seen as very momentary drop - long enough for persistent connections to drop.
Samknows monitoring appliance behind the router shows most dropouts as only lasting a few seconds.
Have bumped up the logging level just now, and am wondering what I can do to diagnose the issue.

More detail:
Running an RT-AC5300 with firmware version 386.5 in a dirty upgrade from 386.4. USB2 USB stick attached, with USB port in USB2.0 mode.

On this USB, have:
  • Diversion v4.2.2
  • Skynet v7.2.8
  • scribe v2.4.3
  • nsrum v30.4.0
  • scMerlin v2.4.0
  • YazDHCP v1.0.4
... installed. I used to have dnscrypt installed, but it went a bit screwy, so I removed it.

With the modem in bridge mode, I found that Asuswrt-Merlin didn't seem to play nice with the ISP's DHCP, with even more frequent dropouts noted, even after trying each of the DHCP query frequency settings:
  • Normal
  • Aggressive
  • Continuous
(... as an aside, is it just me, or is wanduck reeeeeally vague about when it sees a problem?)

Currently in a Double NAT scenario with ISP modem. On the upside I can now query the ISP modem's status page.

I haven't seen anything obvious in /opt/var/log/messages. I've just rebooted my router with log levels:
  • message_loglevel=7 (debug)
  • log_level=8 (everything)
  • console_loglevel=5 (important stuff - though arguably I'm never really connected to the console)
I have had the combination of:
  • ISP (another brand of same ISP at a different address admittedly, with a static IP assigned via DHCP, now fully dynamic)
  • ISP modem (in bridge mode)
  • RT-AC5300
  • Samknows appliance
  • Unifi switch
... working stably and without error at a previous address for quite some time. The main differences are that back then I wasn't using the inbuilt wifi of the RT-AC5300, and that I am presently using an RT-AC3200 as a media bridge to the RT-AC5300.
At the temporary address where I am now, however, it's easier to use the inbuilt wifi than to get the Unifi APs set up temporarily.

Any ideas?

Samknows dropout monitoring:
1647350533311.png


ISP records (I think they're a few hours out, due to timezone difference from ISP head office):
1647350968603.png


I have had the last mile provider out to the premises, and they've advised there is some wiring work to do, but I would think that would be reflected in the ISP records, which show only a fraction of the dropouts Samknows detects.

Where to start?
 
Dirty upgrade, over half a dozen scripts, and at least one was fiddled with and eventually removed?

The fastest way to a stable and good/known state for your router is a full reset at this point.

If someone doesn't come in with a working 'fix' for you in the next few hours, that is where I'd be spending my network time to eliminate this issue.

Save whatever current settings/data you have on the router and USB drive, but I would start with a formatted drive (before plugging it into the router again) and a minimally and manually configured router to connect to your ISP and secure your network.

Do not use any saved backup config files. Do not 'blindly' use any settings, scripts, or options that worked 'once upon a time'.

Do read the changelog(s) and note the new direction that many things have taken with the new firmware.
 
I understand that I can blow it all away and start from scratch again (though admittedly keeping the DHCP list [a CSV file, not a complex config] exported from YazDHCP because rekeying that many MAC addresses is a pain in the arse), but before I do that, I wanted to know if there was some way of figuring out what's happening.

I.e. I'm not a fan of "mystery fails" :)

It just seems unusual that things could just fail silently every 4 hours, without seeing any kind of messaging around it. I wondered if the timeframe involved suggested something in particular. I am thinking given the brevity of the bulk of dropouts it's probably DNS related (causing internet detection faults), but I thought that surely there should be some sort of error popping up?
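
One way to test the DNS hypothesis would be to check name resolution and raw IP reachability separately at the moment of a dropout. Below is a minimal Python sketch, assuming it runs on a machine on the LAN. The hostname, the choice of the OpenDNS resolver address 208.67.222.222 as the TCP target, and all function names are illustrative assumptions; this is not how the router's own detection works.

```python
import socket


def check_dns(hostname="example.com"):
    """Return True if the system resolver can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, 80)
        return True
    except socket.gaierror:
        return False


def check_tcp(ip="208.67.222.222", port=53, timeout=2.0):
    """Return True if a TCP connection to a literal IP succeeds (no DNS lookup involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(dns_ok, tcp_ok):
    """Turn the two probe results into a rough diagnosis."""
    if dns_ok and tcp_ok:
        return "up"
    if tcp_ok:
        return "dns-only failure"
    if dns_ok:
        return "ip unreachable (DNS answer may be cached)"
    return "down"


# Example use, run in a loop or from cron around the expected dropout time:
#   print(classify(check_dns(), check_tcp()))
```

If the result during a dropout is "dns-only failure", the link itself is probably fine and the resolver path is the place to look.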

I likely will try wiping it eventually, but wanted to see how far I might get before that.
 
I've now reset the router completely, wiping the JFFS, the NVRAM, the USB stick that was attached, and loading nothing back except for the list of DHCP reservations... fingers crossed..
 
Ok.. I wiped and reset then installed a base number of packages:
  • Diversion v4.2.2 - ad blocking - in "Lite" mode
  • Skynet v7.2.8 - security - block hits from known bad IPs
  • scribe v2.4.3 - add a proper syslog so it's not wiped when rebooted, or hitting the JFFS
  • scMerlin v2.4.0 - so I can restart services easily
  • YazDHCP v1.0.4 - so I don't have to keep retyping my DHCP reservations by hand, without having to reload my entire config
My internet connection is DHCP (connected to ISP Modem Router currently still doing its own NAT).
I have overridden the default DNS servers with Cisco OpenDNS.
Tri-band SmartConnect is disabled (it doesn't tend to play nice with Apple devices)
... each of the three radios has its own unique SSID.

Modem line quality info:
DSL Type: VDSL2
DSL Mode: Fast
Maximum Line rate: 10.2 Mbps / 52.6 Mbps
Line Rate: 10.2 Mbps / 50.52 Mbps
Output Power: 6.3 dBm / 14.5 dBm
Line Attenuation: 9.6, 47.4, N/A dB / 21.6, 58.2, 87.3 dB
Noise Margin: 5.5 dB / 6.9 dB

Still seeing dropouts that were not seen by ISP modem-router, but at a reduced frequency.

There is nothing in /opt/var/log/messages at the time these dropouts occur
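
To make "nothing in the log at that time" easier to check, the log can be filtered to a window around each dropout timestamp. This is a rough Python sketch, assuming the log lines start with the traditional syslog stamp ("Mon DD HH:MM:SS", no year); if syslog-ng is configured for a different format, the parsing would need to change. The function name and the window size are made up for illustration.

```python
from datetime import datetime


def lines_near(lines, when, window_s=60):
    """Return the log lines stamped within window_s seconds of the datetime `when`."""
    hits = []
    for line in lines:
        try:
            # The first 15 characters hold "Mon DD HH:MM:SS"; the year is not
            # logged, so borrow it from the timestamp we are searching around.
            stamp = datetime.strptime(f"{when.year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, skip it
        if abs((stamp - when).total_seconds()) <= window_s:
            hits.append(line)
    return hits


# Example use (the dropout time is a placeholder, taken from the monitoring portal):
#   with open("/opt/var/log/messages") as f:
#       for line in lines_near(f, datetime(2022, 3, 15, 13, 22, 30)):
#           print(line, end="")
```

Remember that the portal and the router may be in different timezones, so the timestamp may need shifting before the comparison means anything.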

I am wondering if maybe the USB stick I'm using is "blocking", and needs to perform some IO operation (such as dealing with swap) that ties up the kernel for long enough that it's seen as a dropout? I'd expect something in the logs, but perhaps that's not seen because the logs are on the USB too? Maybe it's wearing out? It's a Lexar JD Firefly
 
Ok.. I've reset the router again, swapped out the USB stick, and cut back the packages installed to:
  • Diversion
  • Skynet
  • scMerlin
  • YazDHCP
... and am still seeing dropouts roughly every 4 hours that are not reflected in ISP or modem status.
 
It's possible that the differences you're seeing between the number or frequency of disconnections reported by your ISP/modem vs your Samknows appliance are due to a very different sampling rate or interval that each of the devices is using to detect a "disconnection event."

IOW, you might not actually be comparing "apples to apples" in your particular scenario.

Just a thought.
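
The effect of sampling interval can be illustrated with a toy simulation in Python. Everything here is hypothetical: a 3-second drop every 4 hours, one monitor probing every 2 seconds and another every 60 seconds, and "detection" simplified to a probe landing inside the outage. Real devices use their own, probably more involved, criteria.

```python
def detected(outages, interval_s):
    """Count outages that contain at least one probe instant.

    outages: list of (start_s, duration_s) tuples.
    Probes fire at 0, interval_s, 2 * interval_s, ... seconds.
    """
    count = 0
    for start, duration in outages:
        # First probe at or after the outage start (integer ceiling division).
        first_probe = -(-start // interval_s) * interval_s
        if first_probe < start + duration:
            count += 1
    return count


DAY = 24 * 3600
# Hypothetical day: a 3-second drop every 4 hours, starting 7 s past the hour.
outages = [(t + 7, 3) for t in range(0, DAY, 4 * 3600)]

fast = detected(outages, 2)    # probe every 2 seconds
slow = detected(outages, 60)   # probe every 60 seconds
print(fast, slow)              # prints "6 0"
```

With these made-up numbers the fast monitor sees all six drops and the slow one sees none, even though both are watching the same connection.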
 
Yeah - I don't think I have the full picture.

I'm running an RT-AC3200 as a media bridge to the RT-AC5300.
To try and rule out that causing problems, I've taken the RT-AC3200 back to stock firmware, and reset its settings.

Maybe this will fix things.
 
The first step in troubleshooting is never to assume the worst; start from square one. You say your internet is dropping, but that assumption cannot be proven until we see data from each point of the network. Run a continuous ping from the starting device to the router, then to the next device, then the next, and so on. The continuous ping stats matter: response times and especially dropped packets. If you ping a hundred packets or so and see one packet drop, that could be an issue with a wifi card or a cable, but you'll never know which one it is until you isolate the point of failure. If you see packet times going up and down drastically and consistently, that can be another sign of hardware failure.

I'm not sure of your configuration, but another thing to consider is device age and usage, including cables. Dropouts like these can be a typical sign of a bad cable: everything looks fine, but then interference occurs from electrical noise, either from air conditioners or even cross-talk in the wires themselves. But as I said, I don't know your system configuration in terms of network layout, age, usage, etc. all the way from the device you're using to the cable drop to the ISP. These are variables you need to consider in order to properly isolate your problem.
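
The hop-by-hop comparison can be sketched in Python once ping results have been collected for each hop. This is only an outline under assumptions: the reply lists are filled in by hand or by parsing ping output (not shown), a lost packet is recorded as None, and the hop names and function names are invented for the example.

```python
def loss_percent(replies):
    """replies: round-trip times in ms, with None standing in for a lost packet."""
    if not replies:
        return 0.0
    lost = sum(1 for r in replies if r is None)
    return 100.0 * lost / len(replies)


def rtt_spread(replies):
    """Difference between slowest and fastest reply, ignoring lost packets."""
    times = [r for r in replies if r is not None]
    return max(times) - min(times) if times else 0.0


def first_lossy_hop(results, threshold=0.0):
    """results: ordered list of (hop_name, replies), nearest hop first.

    Returns the name of the first hop whose loss exceeds threshold, or None.
    """
    for name, replies in results:
        if loss_percent(replies) > threshold:
            return name
    return None


# Hypothetical readings, nearest hop first: loss first shows up at the bridge.
results = [
    ("pc -> media bridge", [1.2] * 99 + [None]),
    ("pc -> main router", [2.5] * 98 + [None, None]),
    ("pc -> modem", [3.1] * 98 + [None, None]),
]
print(first_lossy_hop(results))  # prints "pc -> media bridge"
```

The idea is that loss which first appears at a near hop and persists at every hop beyond it points at that near segment, not at the ISP.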
 
I take and appreciate your point about apples to apples a bit more clearly now, thanks.

The places where I saw dropout behaviour
  • ISP portal - order of days
  • Modem Router status page - uptime - order of days
  • Samknows portal - every 4 hours
  • Computer connected to RT-AC3200 in wireless bridge - most frequent
So you're right - multiple issues at play.

The most annoying problem was the computer connected to the wireless bridge losing its connection to everything for 30-60s (presumably this is a feature: making it appear as if the cable is unplugged when WiFi fails, so that an ethernet-connected device re-polls for DHCP). It was fixed by moving the RT-AC3200 to the latest available Asus stock firmware, 3.0.0.4.382.52545, away from the EoL Merlin release 384.13_10. This has significantly improved its behaviour as a media bridge.

For the next level, the Samknows outages, I think I'll contact Samknows and ask how they detect that the internet is unavailable. It might be something like DNS being momentarily unavailable.

The ISP outages used to be a lot more frequent due to the joys of NBN FttN, but have been improved with a visit from a technician to replace corroded connections on the last mile.
 
I've contacted Samknows, and got the following response:

This test is an optional extension to the UDP Latency/Loss test.

It records instances when two or more consecutive packets are lost to the same test server. Alongside each event we record the timestamp, the number of packets lost and the duration of the event.

By executing the test against multiple diverse servers, a user can begin to observe server outages (when multiple probes see disconnection events to the same server simultaneously) and disconnections of the user's home connection (when a single probe loses connectivity to all servers simultaneously).

Typically, this test is accompanied by a significant increase in the sampling frequency of the UDP Latency/Loss client to ~2000 packets per hour (providing a resolution of 2-4 seconds for disconnection events).

We have an ongoing investigation into an issue that can make a unit appear disconnected even though it’s not which we believe is the cause of what you saw.

We’ll fix this as soon as we’ve identified the root cause,

Sorry for the inconvenience,

So.. it seems like I can treat the very short disconnection reports from Samknows monitoring appliance as just UDP packet loss. Still a nifty feature for having a client-side idea of outages, though it sounds like they have some tuning to do on their dashboard with the raw data. I don't want this to sound like a criticism of them - they've been really helpful, and in conjunction with Australia's ACCC watchdog, have been helping to keep the local ISP market honest.
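
The rule described in the Samknows response (two or more consecutive packets lost to the same server, and a home disconnection when all servers are lost at once) can be sketched in Python. This is an illustration of the description only, not Samknows's code: it assumes one True/False result per probe, aligned by index across servers, and the server names are placeholders.

```python
# ~2000 packets per hour works out to one packet every 1.8 seconds.
SPACING_S = 3600 / 2000


def loss_events(received, min_run=2):
    """Return (start_index, length) for each run of at least min_run lost packets.

    received: list of bools, one per probe packet sent to a single server.
    """
    events, run_start = [], None
    # The trailing True is a sentinel that closes a run at the end of the data.
    for i, ok in enumerate(list(received) + [True]):
        if not ok and run_start is None:
            run_start = i
        elif ok and run_start is not None:
            if i - run_start >= min_run:
                events.append((run_start, i - run_start))
            run_start = None
    return events


def home_disconnections(per_server, min_run=2):
    """Events where every server lost the packet at the same probe index."""
    # A probe slot counts as received if any server answered it.
    combined = [any(slot) for slot in zip(*per_server.values())]
    return loss_events(combined, min_run)


per_server = {
    "server-a": [True, False, False, True, True, False, False],
    "server-b": [True, False, False, False, True, True, True],
}
print(loss_events(per_server["server-a"]))  # [(1, 2), (5, 2)]
print(home_disconnections(per_server))      # [(1, 2)]
```

In this made-up data, the second event on server-a is not matched on server-b, so it would count as a server-side event and not a disconnection of the home connection.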

Bottom line: It sounds like the RT-AC3200 running the last Merlin release whilst in media bridge mode is what caused the bulk of my problems.
 