
Debugging internet connectivity issue with one client


TheScotsman

I'm running Asuswrt-Merlin 388.1 on a GT-AXE11000 and am trying to debug an intermittent connectivity issue with my wife's Windows 10 laptop. Initially I thought she was losing the wireless connection, or that her laptop was hunting between the main router and my AiMesh node, or between bands. So I powered down the AiMesh node, set a different SSID on each band so she only connects to one of them, and cranked the logging up to "debug" level. From what I've observed today, she's not losing wireless; in fact, she doesn't appear to be losing all connectivity. I have two windows open running continuous pings (one to an internal server, one to Google). When her connection locks up, only the internet traffic fails: the pings to Google stop responding and all her web/streaming/O365/etc. traffic freezes, but the pings to the internal server continue. Nothing out of the ordinary shows in the syslog, but on the "Connections" tab I can see a ton of connections in SYN_RECV state and many UNREPLIED UDP connections (all DNS in the one snapshot I'm looking at), plus a few TCP connections in ESTABLISHED, CLOSE, or FIN_WAIT and several UDP connections marked ASSURED. When she hangs, the pings I have running from other wireless and wired devices all still go through fine, so the issue seems limited to her machine. Dropping and reconnecting the wireless on her machine clears it right up.
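One way to make the Connections-tab snapshot easier to read is to tally the entries for just her laptop by protocol and state. A minimal Python sketch, assuming you can get the connection table as text in the Linux /proc/net/nf_conntrack format (and have Python available, e.g. via Entware); the sample lines, addresses and function name are hypothetical:

```python
from collections import Counter

# Hypothetical lines in the /proc/net/nf_conntrack text format (all addresses made up).
SAMPLE = """\
ipv4     2 tcp      6 58 SYN_RECV src=192.168.1.50 dst=203.0.113.10 sport=50512 dport=443 src=203.0.113.10 dst=198.51.100.7 sport=443 dport=50512 mark=0 use=2
ipv4     2 udp      17 25 src=192.168.1.50 dst=192.168.1.1 sport=61000 dport=53 [UNREPLIED] src=192.168.1.1 dst=192.168.1.50 sport=53 dport=61000 mark=0 use=2
ipv4     2 tcp      6 431999 ESTABLISHED src=192.168.1.50 dst=203.0.113.20 sport=50600 dport=443 src=203.0.113.20 dst=198.51.100.7 sport=443 dport=50600 [ASSURED] mark=0 use=2
ipv4     2 udp      17 170 src=192.168.1.60 dst=203.0.113.30 sport=5000 dport=3478 src=203.0.113.30 dst=198.51.100.7 sport=3478 dport=5000 [ASSURED] mark=0 use=2
"""


def tally_client(lines, client_ip):
    """Count conntrack entries originated by client_ip, keyed by (protocol, state or flag)."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) < 6:
            continue
        proto = tokens[2]
        # The first src= field belongs to the original direction, i.e. the initiating host.
        src = next((t[len("src="):] for t in tokens if t.startswith("src=")), None)
        if src != client_ip:
            continue
        if proto == "tcp":
            label = tokens[5]  # TCP entries carry a state right after the timeout column
        elif "[UNREPLIED]" in tokens:
            label = "UNREPLIED"
        elif "[ASSURED]" in tokens:
            label = "ASSURED"
        else:
            label = "OTHER"
        counts[(proto, label)] += 1
    return counts


if __name__ == "__main__":
    # On a live router this could read /proc/net/nf_conntrack instead (assumption).
    for (proto, label), n in sorted(tally_client(SAMPLE.splitlines(), "192.168.1.50").items()):
        print(f"{proto:4} {label:12} {n}")
```

Taking one snapshot while she is working normally and one while she is hung makes the shift in states easier to spot than scrolling the full table.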

On the router, parental controls and QoS are currently off; AiProtection is on but isn't reporting that it has blocked anything; the firewall and DoS protection are enabled (no inbound rules set on the firewall). With the SYN_RECV and UNREPLIED connections in the connections list, I'm wondering if something on the router might be intermittently blocking return traffic to her machine. If DoS protection fired, would I see that logged anywhere? I've enabled logging of dropped packets, but am apparently not caffeinated enough right now to work out whether the dropped packets are actually destined for her machine. Can that be identified from the MAC field, which looks way too long to be a MAC address?
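On the MAC question: assuming the router logs drops with the standard iptables LOG target, the MAC= value is not a single address. For Ethernet it is the 14-byte link-layer header of the frame as it arrived: destination MAC, then source MAC, then the two-byte EtherType (08:00 for IPv4). Because it is the incoming header, return traffic arriving from the WAN side carries the upstream device's and the router's MACs rather than the laptop's, so matching her IP in the SRC=/DST= fields is the more reliable check. A minimal Python sketch; the log line, addresses and function names are hypothetical:

```python
import re

# Hypothetical dropped-packet log line (addresses made up).
SAMPLE = ("kernel: DROP IN=eth0 OUT=br0 MAC=00:11:22:33:44:55:66:77:88:99:aa:bb:08:00 "
          "SRC=203.0.113.10 DST=192.168.1.50 LEN=52 PROTO=TCP SPT=443 DPT=50512")


def split_mac_field(mac):
    """Split a 14-byte Ethernet MAC= value into (destination MAC, source MAC, EtherType)."""
    octets = mac.split(":")
    if len(octets) != 14:
        return None  # not a plain Ethernet header (e.g. VLAN-tagged); leave it alone
    dst = ":".join(octets[0:6])
    src = ":".join(octets[6:12])
    ethertype = "0x" + "".join(octets[12:14])
    return dst, src, ethertype


def parse_log_line(line):
    """Return the KEY=value fields of a LOG line as a dict (empty values allowed)."""
    return dict(re.findall(r"(\w+)=(\S*)", line))


def involves_client(line, client_ip):
    """True if the logged packet was sent by or addressed to client_ip."""
    fields = parse_log_line(line)
    return client_ip in (fields.get("SRC"), fields.get("DST"))


if __name__ == "__main__":
    fields = parse_log_line(SAMPLE)
    print(split_mac_field(fields["MAC"]))
    print(involves_client(SAMPLE, "192.168.1.50"))
```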

Thanks for any pointers, tips, or suggestions!
 
Try updating to the latest wireless drivers for her computer. Is she using the same DNS as everyone else? Also check her default route. Try an AV scan.

Good luck,

Morris
 
Wifi drivers would be first on my list to check; it could also be that the wireless card is dying.
 
Thanks Morris - her wireless drivers are up-to-date, ipconfig /all shows the same DNS as everyone else (and the connection log bears out that she's using it), and her default route is correct. She's got automatic AV scans running and none have picked anything up yet, but I'll run a deeper scan to see if it notes anything.

AndreiV, I agree on the wifi card and drivers and had checked those first, but it's only her internet traffic being impacted; access to local servers and printers in the house still works when she's hung. However, I did notice that when she was hung up, ipconfig wouldn't run either: it ALSO hung and didn't return any data. That really makes me suspect her laptop and not the router or our internet connection (plus, no other systems, wired or wireless, hang when she locks up).
 

Can she ping the router when internet pings fail? Are you running plain ipconfig or ipconfig /renew? If /renew is hanging, that could be because the router isn't responding to her.

Try disabling all other network devices, even Bluetooth, so the wireless card is the only thing active. ipconfig hanging could mean one of those other devices is freezing up the network stack. If she has a wired NIC, try using that for a while (disable wifi and enable wired) and see if the problem persists. Or try a USB NIC (either wired or wifi), just to narrow it down. When ipconfig freezes, you can tell from the output which device it is stuck on: if the last device displayed has incomplete output (or a message that no operation can be completed), that device is the likely culprit; if that device's output is complete, ipconfig is freezing while moving on to the next device. To see which device comes next, compare against an ipconfig run from when things are working; that is the device you likely need to focus on.
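The read-the-output heuristic for a hung ipconfig can be sketched in Python, assuming you have saved the text of one normal run and one hung run. The function names and sample output are hypothetical, and counting body lines is only a rough proxy for "complete": an adapter's block can legitimately be shorter when it is disconnected.

```python
# Hypothetical output of a normal "ipconfig" run.
FULL_SAMPLE = """Windows IP Configuration

Ethernet adapter Ethernet:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Wireless LAN adapter Wi-Fi:

   Connection-specific DNS Suffix  . :
   IPv4 Address. . . . . . . . . . . : 192.168.1.50
   Default Gateway . . . . . . . . . : 192.168.1.1

Ethernet adapter Bluetooth Network Connection:

   Media State . . . . . . . . . . . : Media disconnected
"""

# Hypothetical hung runs, cut off at different points.
HUNG_AFTER_ETHERNET = FULL_SAMPLE.split("Wireless LAN adapter")[0]
HUNG_IN_WIFI = FULL_SAMPLE.split("   Default Gateway")[0]
HUNG_AFTER_WIFI = FULL_SAMPLE.split("Ethernet adapter Bluetooth")[0]


def sections(output):
    """Split ipconfig text into [(adapter header, [non-blank body lines]), ...] in display order."""
    result = []
    for line in output.splitlines():
        stripped = line.strip()
        # Adapter headers start in column 0 and end with a colon, e.g. "Wireless LAN adapter Wi-Fi:".
        if line and not line[0].isspace() and "adapter" in line and stripped.endswith(":"):
            result.append((stripped, []))
        elif result and stripped:
            result[-1][1].append(stripped)
    return result


def suspect_adapter(full_output, hung_output):
    """Guess which adapter a hung ipconfig stalled on."""
    full = sections(full_output)
    partial = sections(hung_output)
    order = [header for header, _ in full]
    if not partial:
        return order[0] if order else None  # stalled before printing any adapter
    last_header, last_body = partial[-1]
    if last_header not in order:
        return last_header
    idx = order.index(last_header)
    if len(last_body) < len(full[idx][1]):
        return last_header  # its block was cut short, so it is the likely culprit
    return order[idx + 1] if idx + 1 < len(order) else None  # finished it, so look at the next one


if __name__ == "__main__":
    for hung in (HUNG_AFTER_ETHERNET, HUNG_IN_WIFI, HUNG_AFTER_WIFI):
        print(suspect_adapter(FULL_SAMPLE, hung))
```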

Are you running IPv6? You mentioned no inbound firewall rules, which makes me think you may be. If so, it may be having issues with IPv6 to the internet while IPv4 to the LAN works fine (or maybe even IPv6 to the LAN).

Just some thoughts of things to check.
 
Hello @drinkingbird, thanks for jumping in on this. She can ping the router and other internal devices when internet pings fail (I have her running two command prompts with constant pings to the router and to Google; I can watch the Google pings fail when Zoom, O365, etc. lock up on her, while the router pings keep merrily running). Sometimes it frees up on its own, but it often requires recycling the wireless. On the ipconfig issue, it was just plain "ipconfig" (no parameters/options) that hung.
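To keep a record of when the internet-only pattern occurs, the two ping windows can be reduced to timestamped pass/fail samples and scanned for runs where the router answers but Google does not. A minimal Python sketch; how the samples are collected (e.g. by logging each ping result) is left open, and the sample data and function name are hypothetical:

```python
# Hypothetical once-per-second samples: (seconds, router ping ok, Google ping ok).
SAMPLES = [
    (0, True, True),
    (1, True, False),
    (2, True, False),
    (3, True, True),
    (4, False, False),  # both failing: looks like a full disconnect, not the internet-only pattern
    (5, True, False),
]


def internet_only_outages(samples):
    """Return (first, last) timestamps of runs where the WAN ping failed but the LAN ping succeeded."""
    outages = []
    start = last = None
    for ts, lan_ok, wan_ok in samples:
        if lan_ok and not wan_ok:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            outages.append((start, last))
            start = last = None
    if start is not None:
        outages.append((start, last))  # run still open at the end of the log
    return outages


if __name__ == "__main__":
    print(internet_only_outages(SAMPLES))
```

Lining these windows up against the router's log timestamps narrows down which log entries are worth reading.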

I did NOT think about Bluetooth being a potential issue; that's a good suggestion. I also have a USB NIC I can put her on to help narrow down whether it's hardware or something in her config.

I'm not running IPv6. By "no inbound firewall rules" I meant I haven't specified any, but the default ruleset that Asuswrt-Merlin sets up is in effect. Disabling DoS protection didn't affect her issue, so I've switched it back on, as shown in this screenshot:

[Screenshot 2023-02-24 091809.png: DoS protection re-enabled]
 

Also make sure there is no VPN trying to auto-connect or anything like that, or a firewall on the PC detecting something it doesn't like and blocking it. But I'd start by trying a different network card and/or disabling all other network devices (not just Bluetooth; sometimes there are virtual interfaces, etc., too) to get an idea of where to focus.

Could just be a faulty wireless card too, sometimes the simplest explanation is the best one.
 
In a command prompt on her computer, run "route print" while things are normal, then run it again while things aren't working. Is there a difference? Please share the results either way.
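Comparing the two "route print" snapshots by eye is error-prone, so here is a minimal Python sketch that pulls out the IPv4 default routes (the 0.0.0.0/0.0.0.0 rows in Active Routes) from each and also produces a plain line diff. The sample snapshots, including the "broken" one with no default route, are hypothetical illustrations, not results from this thread:

```python
import difflib

# Hypothetical "route print" excerpts (IPv4 active routes only), saved while normal and while hung.
NORMAL = """\
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      192.168.1.1     192.168.1.50     35
      192.168.1.0    255.255.255.0         On-link      192.168.1.50    291
"""
BROKEN = """\
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
      192.168.1.0    255.255.255.0         On-link      192.168.1.50    291
"""


def default_routes(route_print_text):
    """Return active IPv4 default routes as a set of (gateway, interface, metric) tuples."""
    routes = set()
    for line in route_print_text.splitlines():
        cols = line.split()
        # Active-route rows have five columns; the default route is 0.0.0.0 with mask 0.0.0.0.
        if len(cols) == 5 and cols[0] == "0.0.0.0" and cols[1] == "0.0.0.0":
            routes.add((cols[2], cols[3], int(cols[4])))
    return routes


def compare(normal_text, broken_text):
    """Report default routes present in only one of the two snapshots."""
    normal, broken = default_routes(normal_text), default_routes(broken_text)
    return {"only_when_normal": normal - broken, "only_when_broken": broken - normal}


def full_diff(normal_text, broken_text):
    """Line-by-line diff of the whole output, for changes beyond the default route."""
    return list(difflib.unified_diff(normal_text.splitlines(), broken_text.splitlines(),
                                     fromfile="normal", tofile="broken", lineterm=""))


if __name__ == "__main__":
    print(compare(NORMAL, BROKEN))
    print("\n".join(full_diff(NORMAL, BROKEN)))
```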
 
Sorry to be revisiting this thread with more information so late. The short answer is that I've resolved the problem, but I want to post what I discovered in case it helps someone else.

It turns out it was not just a single device experiencing issues; my wife's laptop was just much more prone to it (probably because she was using it constantly, with video-conferencing traffic most of the day). I was able to "recreate" the problem (as in "it kept happening", not "I could reliably trigger it") on multiple devices: her Lenovo laptop, my Samsung phone, and my MacBook Pro. In the end, I determined that all wireless devices would briefly lose their internet connection, sometimes recovering on their own, sometimes requiring a drop/reconnect of the wifi on that device. There were some really interesting characteristics to this:
  1. The issue only affected wireless devices; wired devices never experienced a drop.
  2. The issue would affect devices individually; that is, when one device was having a drop, other devices would still operate properly.
  3. The issue ONLY affected internet connectivity; the intranet would still function properly (i.e. could ping the router and other in-house devices, file server still accessible, etc.).
  4. Video traffic (Zoom, streaming movies, etc.) seemed to be the most reliable trigger, possibly due to traffic volume.
  5. No evidence of the issue in any of the logs on the router.
I set up tcpdump (via Entware) on the router to start taking packet captures, hoping to trace where it was glitching, since the fault still seemed to be somewhere within the router itself (GT-AXE11000 running 388.1). My suspicion was that something was hitting a boundary: maybe a counter, or something else specific to a given device's connection, was hitting a limit? I hadn't seen anything in the few captures I'd taken, though, but it was often hard to get a capture just as things were going bad.
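Since the hard part was catching the moment things went bad, one option is a rolling (ring-buffer) capture that runs continuously and keeps only the last few files, so it can be stopped right after a drop. A minimal Python sketch that just builds the command; -i, -w, -C and -W are standard tcpdump options, while the interface name (br0 as the LAN bridge), the USB mount point and the client IP are assumptions for illustration:

```python
import shlex


def ring_capture_cmd(iface, outfile, client_ip, file_mb=10, files=5):
    """Build a tcpdump argv that writes a rotating set of capture files for one client.

    -C starts a new file after roughly file_mb million bytes and -W limits the set to
    `files` files, after which tcpdump wraps around and overwrites from the first file,
    so the most recent traffic is always on disk when a drop finally happens.
    """
    return ["tcpdump", "-i", iface, "-w", outfile,
            "-C", str(file_mb), "-W", str(files),
            "host", client_ip]


if __name__ == "__main__":
    # Assumptions: br0 is the LAN bridge, /tmp/mnt/usb is a mounted USB drive,
    # and 192.168.1.50 is the laptop's address. Run the printed command on the router.
    print(shlex.join(ring_capture_cmd("br0", "/tmp/mnt/usb/laptop.pcap", "192.168.1.50")))
```

Running a second capture on the WAN interface at the same time could show whether the replies reach the router at all before they go missing on the LAN side.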

I decided to take the "start over" approach. The 388.2 beta1 release had just dropped, so I did a complete factory reset, loaded the beta, then reconfigured everything by hand. My thinking was that perhaps in my experimenting on 388.1 I'd corrupted something, or surfaced a weird boundary error in the code that others weren't seeing, so I wanted to start with a clean configuration. While doing the reconfiguration I had a question about one setting and stumbled on this thread, which bears some similarities to what I was experiencing (most of those folks resolved it by falling back to an earlier release). But GOOD NEWS: moving to 388.2 beta1 and doing a complete manual reconfiguration seems to have fixed the issue. I've been running clean and stable for about 36 hours on all my systems, and I just pulled my wife's laptop off its wired-connection lifeline, so we'll see how her machine does over the next few hours (wish me luck/pray for me if it's not actually fixed! :p)

I'll be turning on features like DoS protection, QoS, etc. gradually as long as things stay stable, with a settings backup between each change so it's easy to roll back if problems start to surface. I'm not sure whether it was the clean reconfig or the updated codebase that fixed it, but this might be a path forward for anyone else having a similar issue.
 