CharlieRutledge
Looking for ideas about where to go next in figuring out what's happening. I don't mind buying new hardware, but I'd prefer to have some indication that hardware is really the issue first!
My configuration:
- The primary node is an RT-AC68U running in AP mode. It sits behind a Netgear switch, which sits behind a firewall/router running OPNsense, which sits behind an Xfinity cable modem with 1 Gbps down / 40 Mbps up service.
- I have three AiMesh nodes (including the primary), with each sub node on wired backhaul to a LAN port on the primary node. The sub nodes are an RT-AC68U and an RT-AC66U B1. The sub 68U also has a few devices hardwired to its LAN ports.
- All the wired stuff is 1 Gbps over Cat 5e or Cat 6. The AiMesh admin screen confirms 1 Gbps wired backhaul.
- 4 SSIDs, two on 2.4 GHz and two on 5 GHz (one each internal and guest).
- Pretty sparse wifi area. Single family homes, not many neighbors. I can see medium signals from 3 or 4 neighbor SSIDs if I stand in exactly the right place, so there's not much wifi interference.
- The Network Map screen reports 54 clients. The AiMesh screen shows about 35 clients on the primary node and about 6 on each of the sub nodes, though those numbers bounce around.
- Running 3.0.0.4.386_51665 firmware on all nodes.
- When it happens, the 68U subnode is not broadcasting. I can tell because wifi devices near it do not show any of the 4 SSIDs on my network (the location is too far away to see the other mesh nodes). I have tried rebooting just that node, and it doesn't help. However, the 2.4 GHz and 5 GHz lights on the front of the 68U subnode stay on even when the client devices don't see the SSIDs.
- I can't access the Asus admin GUI when this happens. Sometimes I can get to the login screen, but when I try to actually log in I just get a blank screen and an eventual timeout. After trying once, I can't even get to the login screen again. I can get to the https login screen once and to the http login screen once, both with the same result (blank screen on trying to log in, then unable to access again).
- Setting the system's auto-reboot does not help. If the wifi is in this state when the system reaches its scheduled reboot time, the reboot does not restore it. In fact, when I looked at the log this week after having the issue twice, both times it looked like the issue actually started during the automated reboot: things were fine, the automated reboot happened, and then the log went silent as the system was coming back up. At least, both times the log stopped before the reboot process had completely finished. So I've turned off the auto-reboot (but have had the problem again since turning it off).
- The system stops logging at some point when this happens. I have three examples of the problem in the system log, and in all three cases there are long periods between the last entry in the log and when the system gets rebooted. Once 22 hours, once 36 hours, once 51 hours. I do see other gaps in the logs, but nothing more than 9 hours in early morning hours on a day when the house was empty. Does this mean that the problem happens and it's a while before we notice it, or that something starts to happen and eventually affects the system enough so that we notice it? Dunno. Also note that two of those log gaps crossed periods during which the system should have rebooted itself. No indication in the logs that it actually tried, and certainly it didn't successfully restore the system if it tried.
- The CPU graphs show mostly under 10%, with infrequent brief spikes to 50% (but mostly much lower). Memory shows 29% used (74 MB used, 184 MB free). I've really only looked at these within hours of rebooting, though. Still, those numbers make me question the perhaps-otherwise-obvious next step of replacing the 10 year old piece of hardware in the middle of all this.
- It hasn't been a ton of time since my last factory reset. Don't remember when I did it, probably within the past 2 months, on all three devices.
- I do notice that the admin interface seems very sluggish. It takes 5-10 seconds to serve even simple screens, and the AiMesh screen takes over 30 seconds before it's populated.
- Some maybes:
- This morning I had more time to debug when it happened. I noticed that some Wifi devices were still connected. I *think* that those devices were all connected to the primary node, but can't be positive.
- Also this morning, devices that were connected via hardware to the LAN ports of the 68U sub node were also still functional.
- Before this morning I would have said that neither of these things was true, but I couldn't have said so definitively. I'm also not sure that both of the above are true every time it happens, and I think they may not be.
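
For anyone who wants to look for the same kind of log gaps, here is a rough sketch of how they could be pulled out of a saved copy of the system log. It assumes each line starts with a syslog-style timestamp like "May  5 03:00:12" with no year, so the year has to be supplied. The function name `find_log_gaps` and its parameters are made up for illustration, and the 9 hour default is just the longest "normal" gap I mentioned above.

```python
from datetime import datetime


def find_log_gaps(lines, min_gap_hours=9.0, year=2024):
    """Return (previous_time, next_time, gap_hours) for every gap
    between consecutive timestamped lines longer than min_gap_hours.

    Assumption: each log line begins with "Mon DD HH:MM:SS" (15 chars).
    """
    gaps = []
    previous = None
    for line in lines:
        try:
            # The log has no year, so prepend the one supplied by the caller.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that don't start with a timestamp
        if previous is not None:
            hours = (stamp - previous).total_seconds() / 3600
            # A clock that jumps backwards gives a negative value and is ignored.
            if hours > min_gap_hours:
                gaps.append((previous, stamp, round(hours, 1)))
        previous = stamp
    return gaps
```

Passing an open file object (or any list of lines) as `lines` should work, since the function only iterates over it. It won't handle a log that runs across a year boundary.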
Anyway, any suggestions appreciated, thanks in advance!