
Router/Firmware Upgrade Causes Regular Crashes--debugging ideas?

branweb
I've been running 384.6 on an RT-AC56U and all was well. This weekend I installed 386.2_4 on an RT-AC68U.

Everything worked fine at first, but after about 30 minutes, the devices connected to both the 2.4GHz and 5GHz bands were disconnected and couldn't reconnect. The machine connected to the router via ethernet could access the internet but couldn't reach the router at 192.168.1.1. The request would get no response. A reboot fixes the problem temporarily, but about 30 minutes later, the same thing happens again.

I tried a factory reset. I also took a look at the logs. Once I saw "TCP: time wait bucket table overflow", which I understand could be related to lots of incoming or outgoing requests. I ran netstat but didn't see any one device making an excessive number of requests. Also switching back to the RT-AC56U fixed this, and I've never encountered it prior to the upgrade.
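For anyone repeating the netstat check, eyeballing the output makes it easy to miss a pile-up of TIME_WAIT sockets. Below is a rough sketch that tallies connections by state and by remote host. It assumes the usual Linux `netstat -tn` column layout (the router's own netstat may differ), the function name is made up for illustration, and the sample rows are invented, not captured from this router.

```python
from collections import Counter


def summarize_netstat(text):
    """Count TCP connections by state and by foreign host.

    Assumes columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    states = Counter()
    hosts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        host = foreign.rsplit(":", 1)[0]  # drop the port
        states[state] += 1
        hosts[host] += 1
    return states, hosts


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.1:80          192.168.1.50:51514      TIME_WAIT
tcp        0      0 192.168.1.1:80          192.168.1.50:51516      TIME_WAIT
tcp        0      0 192.168.1.1:22          192.168.1.20:40022      ESTABLISHED
"""

if __name__ == "__main__":
    states, hosts = summarize_netstat(SAMPLE)
    print("by state:", states.most_common())
    print("by host: ", hosts.most_common(5))
```

Feeding it a saved capture (for example, netstat output redirected to a file shortly before the problem usually hits) would show whether one state or one host dominates.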

I'd love to work through this and debug it, but I don't have a lot of ideas on what the general issue could be. Any pointers in the right direction?
 
What's tricky about that is that the problem makes the router inaccessible, so once the problem happens, I can't see the log. I've tried just watching it until the problem occurs, but so far the only thing I've seen is that "bucket overflow" message--and I haven't even seen that every time. It could be I'm just missing something in the logs. It would be ideal if I had an idea of what to look for there.
 
Even after a reboot the previous log entries should be still there. It's possible that the most recent entries may be buffered for a while so give it 10 minutes or so after the problem occurs for them to be written out before rebooting.
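If the buffered entries still get lost, another option is to have the router send its log to a different machine as it goes. The sketch below is a minimal UDP syslog receiver, assuming the firmware has a remote log server setting that can point at this machine and port (not confirmed in this thread). The port 5514, the output filename, and the helper names are arbitrary choices for illustration; the standard syslog port is 514, which normally needs root to bind.

```python
import re
import socketserver
from datetime import datetime

LOG_PATH = "router-syslog.log"  # hypothetical output file
PRI = re.compile(r"^<(\d{1,3})>")


def decode_priority(message):
    """Split a BSD-syslog '<PRI>' prefix into (facility, severity, rest)."""
    m = PRI.match(message)
    if not m:
        return None, None, message
    pri = int(m.group(1))
    return pri // 8, pri % 8, message[m.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        text = data.decode("utf-8", errors="replace").rstrip()
        _facility, severity, rest = decode_priority(text)
        stamp = datetime.now().isoformat(timespec="seconds")
        # Append immediately so nothing is lost if the router locks up.
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{stamp} {self.client_address[0]} sev={severity} {rest}\n")


if __name__ == "__main__":
    # Assumed port; the router must be configured to send here.
    with socketserver.UDPServer(("0.0.0.0", 5514), SyslogHandler) as server:
        server.serve_forever()
```

Run on the wired machine, this would keep a copy of every message up to the moment the router stops responding, with the receiver's own timestamp next to each one.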
 
Ok narrowing the problem a bit. So far nothing useful in the logs (at log level debug)--the only post-crash stuff I see is:

Code:
May 24 02:11:57 rc_service: watchdog 32564:notify_rc stop_aae
May 24 02:11:57 rc_service: watchdog 32564:notify_rc start_mastiff
May 24 02:11:57 rc_service: waitting "stop_aae" via watchdog ...
May 24 02:11:59 Mastiff: init

Which seems part of normal operation.

What I'm observing is: reboot the router, connect various devices to wifi, everything works ok. If I disconnect one of those devices and try to reconnect, it can't seem to authenticate. This happens on both the 5GHz and 2.4GHz networks with a Pixel 3a, an iPad, and a laptop. After a few failed attempts at reconnecting, the router becomes unreachable at its IP address and a reboot is required.

I've also noticed that compared to the 56U running 384.6, the 68U takes forever to boot and forever to assign an ip address. It also almost always displays the incorrect number of clients in the clients list. Also, after every power-down, the router system time resets to May 4 2018, then after a few mins syncs with the ntp server and corrects itself. Maybe normal but I thought it would have a system battery that would help it remember the time.
 
A few other things I've tried:
- rebooting the client device
- forgetting and re-creating the connection from the client device
- deleting all dhcp leases as per these instructions and trying the above steps

All do nothing. Interestingly, internet through the ethernet connection works fine through all this, even when the router is unreachable.

found this thread, which describes a similar problem. Currently working through the recommendations there.
 
"Also, after every power-down, the router system time resets to May 4 2018, then after a few mins syncs with the ntp server and corrects itself. Maybe normal but I thought it would have a system battery that would help it remember the time."
This is normal. There is no battery.
 
hmm well toggling the connection from closed -> open -> closed definitely works, though hardly seems like a long-term solution.

Since my phone is kind of a black box as far as logs are concerned, I tried disconnecting/connecting to the wifi with my laptop. Here are the network manager logs in reverse chronological order:

Code:
May 24 08:44:57 thinkpad NetworkManager[369]: <warn>  [1621871097.7177] device (wlp3s0): Activation: (wifi) association took too long
May 24 08:44:55 thinkpad NetworkManager[369]: <info>  [1621871095.9099] device (p2p-dev-wlp3s0): supplicant management interface state: associating -> associated
May 24 08:44:55 thinkpad NetworkManager[369]: <info>  [1621871095.9098] device (wlp3s0): supplicant interface state: associating -> associated
May 24 08:44:55 thinkpad NetworkManager[369]: <info>  [1621871095.8780] device (p2p-dev-wlp3s0): supplicant management interface state: authenticating -> associating
May 24 08:44:55 thinkpad NetworkManager[369]: <info>  [1621871095.8779] device (wlp3s0): supplicant interface state: authenticating -> associating
May 24 08:44:55 thinkpad NetworkManager[369]: <info>  [1621871095.8466] device (p2p-dev-wlp3s0): supplicant management interface state: scanning -> authenticating
May 24 08:44:55 thinkpad NetworkManager[369]: <info>  [1621871095.8465] device (wlp3s0): supplicant interface state: scanning -> authenticating
May 24 08:44:45 thinkpad NetworkManager[369]: <info>  [1621871085.0413] device (p2p-dev-wlp3s0): supplicant management interface state: disconnected -> scanning
May 24 08:44:45 thinkpad NetworkManager[369]: <info>  [1621871085.0412] device (wlp3s0): supplicant interface state: disconnected -> scanning
May 24 08:44:44 thinkpad NetworkManager[369]: <info>  [1621871084.9417] device (p2p-dev-wlp3s0): supplicant management interface state: associated -> disconnected
May 24 08:44:44 thinkpad NetworkManager[369]: <info>  [1621871084.9416] device (wlp3s0): supplicant interface state: associated -> disconnected
May 24 08:44:34 thinkpad NetworkManager[369]: <info>  [1621871074.9173] device (p2p-dev-wlp3s0): supplicant management interface state: associating -> associated
May 24 08:44:34 thinkpad NetworkManager[369]: <info>  [1621871074.9172] device (wlp3s0): supplicant interface state: associating -> associated
May 24 08:44:34 thinkpad NetworkManager[369]: <info>  [1621871074.8788] device (p2p-dev-wlp3s0): supplicant management interface state: authenticating -> associating
May 24 08:44:34 thinkpad NetworkManager[369]: <info>  [1621871074.8788] device (wlp3s0): supplicant interface state: authenticating -> associating
May 24 08:44:34 thinkpad NetworkManager[369]: <info>  [1621871074.8471] device (p2p-dev-wlp3s0): supplicant management interface state: scanning -> authenticating
May 24 08:44:34 thinkpad NetworkManager[369]: <info>  [1621871074.8471] device (wlp3s0): supplicant interface state: scanning -> authenticating
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6692] device (p2p-dev-wlp3s0): supplicant management interface state: disconnected -> scanning
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6691] device (wlp3s0): supplicant interface state: disconnected -> scanning
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6540] Config: added 'psk' value '<hidden>'
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6539] Config: added 'auth_alg' value 'OPEN'
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6539] Config: added 'key_mgmt' value 'WPA-PSK WPA-PSK-SHA256 FT-PSK'
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6539] Config: added 'bgscan' value 'simple:30:-70:86400'
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6538] Config: added 'scan_ssid' value '1'
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6538] Config: added 'ssid' value '68u_5G'
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6537] device (wlp3s0): Activation: (wifi) connection '68u_5G' has security, and secrets exist.  No new secrets needed.
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6530] device (wlp3s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 24 08:44:32 thinkpad NetworkManager[369]: <info>  [1621871072.6520] device (wlp3s0): state change: need-auth -> prepare (reason 'none', sys-iface-state: 'managed')
May 24 08:44:21 thinkpad NetworkManager[369]: <info>  [1621871061.4084] device (wlp3s0): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed')
May 24 08:44:21 thinkpad NetworkManager[369]: <info>  [1621871061.4084] device (wlp3s0): Activation: (wifi) access point '68u_5G' has security, but secrets are required.
May 24 08:44:21 thinkpad NetworkManager[369]: <info>  [1621871061.4079] device (wlp3s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 24 08:44:21 thinkpad NetworkManager[369]: <info>  [1621871061.4044] device (wlp3s0): set-hw-addr: reset MAC address to 18:26:49:20:78:DF (preserve)
May 24 08:44:21 thinkpad NetworkManager[369]: <info>  [1621871061.4012] device (wlp3s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 24 08:44:21 thinkpad NetworkManager[369]: <info>  [1621871061.4008] audit: op="connection-add-activate" uuid="bcf3aecf-716d-49cf-9ed6-6b5444090409" name="68u_5G" pid=984 uid=1000 resul>
May 24 08:44:21 thinkpad NetworkManager[369]: <info>  [1621871061.4007] device (wlp3s0): Activation: starting connection '68u_5G' (bcf3aecf-716d-49cf-9ed6-6b5444090409)
 
Tried to work on this a little more at lunch... I guess the supplicant interface is just a state machine. When it connects successfully, the state goes from associated -> 4way_handshake. When it fails, it goes from associated -> disconnected. Given this, and the fact that toggling the network from open to closed helps, could something be up with the router not allowing devices that have connected before to complete the handshake?
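To avoid reading the transitions by eye each time, here is a rough sketch that classifies each attempt from NetworkManager lines like the ones above. The sample lines are copied from the log in this thread; the function name and the outcome labels are my own, and the 4way_handshake state name is the one observed on successful connects. It expects newest-first input, as pasted.

```python
import re

TRANSITION = re.compile(
    r"device \((?P<dev>[^)]+)\): supplicant interface state: "
    r"(?P<old>\S+) -> (?P<new>\S+)"
)
TIMEOUT = re.compile(
    r"device \((?P<dev>[^)]+)\): Activation: \(wifi\) association took too long"
)


def classify_attempts(lines, device="wlp3s0"):
    """Return one outcome per attempt, oldest first (input is newest first)."""
    outcomes = []
    for line in reversed(list(lines)):
        t = TIMEOUT.search(line)
        if t and t.group("dev") == device:
            outcomes.append("association timed out")
            continue
        m = TRANSITION.search(line)
        if not m or m.group("dev") != device:
            continue  # also skips the p2p-dev "management interface" lines
        if m.group("old") == "associated":
            if m.group("new") == "4way_handshake":
                outcomes.append("handshake started")
            elif m.group("new") == "disconnected":
                outcomes.append("dropped before handshake")
    return outcomes


# Four lines taken from the log above, newest first.
SAMPLE = """\
May 24 08:44:57 thinkpad NetworkManager[369]: <warn>  [1621871097.7177] device (wlp3s0): Activation: (wifi) association took too long
May 24 08:44:55 thinkpad NetworkManager[369]: <info>  [1621871095.9098] device (wlp3s0): supplicant interface state: associating -> associated
May 24 08:44:44 thinkpad NetworkManager[369]: <info>  [1621871084.9416] device (wlp3s0): supplicant interface state: associated -> disconnected
May 24 08:44:34 thinkpad NetworkManager[369]: <info>  [1621871074.9172] device (wlp3s0): supplicant interface state: associating -> associated
"""

if __name__ == "__main__":
    for outcome in classify_attempts(SAMPLE.splitlines()):
        print(outcome)
```

On the excerpt this reports one attempt dropped before the handshake and one that timed out, which matches the reading that the client never gets as far as the 4-way handshake.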

EDIT: just for the hell of it I turned off WPS. Seems to have fixed the issue :-|
 
Update: the issue isn't fixed. Disabling WPS lets wireless clients disconnect and reconnect, but after 10-12 hours or so all wifi clients lose connection and can't reconnect. The router GUI also becomes inaccessible, so the toggle open/secure network hack isn't even available. Restarting the router is also no good, since it comes back up with no internet--so every time this happens I have to unplug the modem and router, then plug them back in (in sequence). What a mess.
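Since the failure now takes 10-12 hours to show up, it may help to record exactly when the GUI stops answering so the time can be matched against the router log afterwards. Here is a small sketch meant to run on the wired machine. It assumes the web UI listens on TCP port 80 at 192.168.1.1; the 30-second interval and the function names are arbitrary choices for illustration.

```python
import socket
import time
from datetime import datetime


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def state_changes(samples):
    """Keep only the (timestamp, reachable) pairs where reachability flipped."""
    changes = []
    previous = None
    for stamp, ok in samples:
        if ok != previous:
            changes.append((stamp, ok))
            previous = ok
    return changes


def monitor(host="192.168.1.1", port=80, interval=30):
    """Print a timestamped line whenever the router goes up or down."""
    previous = None
    while True:
        ok = tcp_reachable(host, port)
        if ok != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {'up' if ok else 'DOWN'}", flush=True)
            previous = ok
        time.sleep(interval)


if __name__ == "__main__":
    monitor()
```

Printing only on changes keeps the output short enough to leave running overnight.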
 
