
Wireless network (3 AiMesh nodes, AP mode) drops out every few days, requires hard reboot


CharlieRutledge

Looking for ideas about where to go next in figuring out what's happening. I don't mind buying new hardware, but I'd prefer to have some indication that the hardware is really the issue!

My configuration:
  • The primary node is an RT-AC68U running in AP mode. It sits behind a Netgear switch, which sits behind an OPNsense firewall/router, which in turn sits behind an Xfinity cable modem with 1 Gbps down / 40 Mbps up service.
  • I have three AiMesh nodes (including the primary), with sub nodes each wired backhaul to a LAN port on the primary node. Sub nodes are an RT-AC68U and an RT-AC66U B1. The sub 68U also has a few devices hardwired to its LAN ports.
  • All the wired stuff is 1G over cat 5e or cat 6. AiMesh admin screen confirms 1G wired backhaul.
  • 4 SSIDs, two on 2.4GHz and two on 5 GHz (one each internal and guest).
  • Pretty sparse wifi area. Single family homes, not many neighbors. I can see medium signals from 3 or 4 SSIDs from neighbors if I stand in exactly the right place. i.e., not much wifi interference.
  • The Network Map screen reports 54 clients. The AiMesh screen shows about 35 clients on the primary node and about 6 on each subnode, though those numbers bounce around.
  • Running 3.0.0.4.386_51665 firmware on all nodes.
The issue is that every now and then the wifi network stops working correctly (defined more below). Since we normally notice in the morning and my wife works from home, I typically don't have time to try debugging, I just have to get it working quickly. I do that by unplugging all three ASUS nodes, plugging in the main node, waiting a few minutes, plugging in the 66U, waiting a few more minutes for it to establish as a node, and then plugging in the 68U node. After I do that, everything comes back for a while. Sometimes for weeks, sometimes just for a couple days. What I do know:
  • When it happens, the 68U subnode is not broadcasting. I see this by trying to attach wifi devices while near it, but they do not show any of the 4 SSIDs on my network (the location is too far away to see the other mesh nodes). I have tried rebooting just that node, it doesn't help. However, the 2.4GHz and 5 GHz lights on the front of the 68U subnode are on even when the client devices don't see the SSIDs.
  • I can't access the Asus admin gui when this happens. Sometimes I can get to the login screen, but when I try to actually log in I just get a blank screen and eventual timeout. After trying once, I can't even get to the login screen again. I can get once to the https login screen and once to the http login screen, both with the same result (blank screen on trying to log in, then unable to access again).
  • Setting the system's auto-reboot does not help. If the wifi is in this state when the scheduled reboot time arrives, the reboot does not restore the system. In fact, when I looked at the log this week after having the issue twice, both times it looked like the issue actually happened during the automated reboot process. That is, things were fine, the automated reboot happened, and then the log went silent as the system was coming back up. At least, both times the log stopped before the reboot process had completely finished. So I've turned off the auto-reboot (but have had the problem again since turning it off).
  • The system stops logging at some point when this happens. I have three examples of the problem in the system log, and in all three cases there are long periods between the last entry in the log and when the system gets rebooted. Once 22 hours, once 36 hours, once 51 hours. I do see other gaps in the logs, but nothing more than 9 hours in early morning hours on a day when the house was empty. Does this mean that the problem happens and it's a while before we notice it, or that something starts to happen and eventually affects the system enough so that we notice it? Dunno. Also note that two of those log gaps crossed periods during which the system should have rebooted itself. No indication in the logs that it actually tried, and certainly it didn't successfully restore the system if it tried.
  • The CPU graphs show mostly under 10%, with infrequent brief spikes to 50% (but mostly much lower). Memory shows 29% used (74 MB used, 184 MB free). Although I've really only looked at these within hours of rebooting. But that makes me question the perhaps-otherwise-obvious next step of replacing the 10 year old piece of hardware in the middle of this all.
  • It hasn't been a ton of time since my last factory reset. Don't remember when I did it, probably within the past 2 months, on all three devices.
  • I do notice that the admin interface seems very sluggish. It takes 5-10 seconds to serve even simple screens, and the AiMesh screen takes over 30 seconds to populate.
  • Some maybes:
    • This morning I had more time to debug when it happened. I noticed that some Wifi devices were still connected. I *think* that those devices were all connected to the primary node, but can't be positive.
    • Also this morning, devices that were connected via hardware to the LAN ports of the 68U sub node were also still functional.
    • Before this morning I would have said that neither of these things was true, but I can't say that definitively. I'm also not sure that both of the above are always true, and I think they may not be.
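For anyone wanting to measure log gaps like the ones described above instead of eyeballing them, here is a minimal Python sketch. It assumes the system log has been saved as plain text and that each line starts with a syslog-style timestamp such as `May 5 01:05:03`. The function name `find_log_gaps`, the 10-hour default threshold, and the sample lines are all hypothetical. The timestamp format carries no year, so the year is an assumed parameter.

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, min_gap=timedelta(hours=10), year=2023):
    """Return (previous, current, gap) tuples for consecutive log entries
    whose timestamps are at least min_gap apart.

    Assumes lines start with 'Mon D HH:MM:SS'. The year is not in the log,
    so it is supplied by the caller (an assumption of this sketch).
    """
    gaps = []
    prev = None
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # too short to hold a timestamp
        stamp = f"{year} {parts[0]} {parts[1]} {parts[2]}"
        try:
            ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        # A timestamp that jumps backward gives a negative difference,
        # which never meets the threshold, so it is ignored here.
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real router log.
    sample = [
        "May 10 08:00:00 kernel: example entry",
        "May 10 09:00:00 kernel: example entry",
        "May 11 07:00:00 kernel: example entry",
    ]
    for before, after, gap in find_log_gaps(sample):
        print(f"silent from {before} to {after} ({gap})")
```

Running this over a saved copy of the log would list every silent stretch longer than the threshold. That makes it easier to see whether the long gaps line up with the scheduled reboot time.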
Sorry for the wall of text, if you got this far much appreciated! Trying to figure out where to go next on this, and don't see anything obvious. The routers are workhorses and I've had the primary node for almost 10 years. So maybe time to replace it, which at this point is probably my next step. Just wish I had more definitive evidence that it was the culprit. Since if it doesn't work all the obvious next steps are to continue to replace things. Which again I don't mind doing but prefer better evidence that it will fix the problem.

Anyway, any suggestions appreciated, thanks in advance!
 
Your terms are confusing.

If you're running in AP mode, you are not running anything in AiMesh, I believe. Please clarify how the network is set up. Do you need Guest Network 1 propagated throughout the home?

The RT-AC66U_B1 is a better candidate for use as the 'main' AP (behind your OPNsense main router). It easily supersedes the RT-AC68U (Orig). This is what I would recommend you try (before you buy anything new). Do not import a saved backup config file to set up the new network. Perform a full reset and a minimal and manual configuration to connect to your ISP and secure the router.

If you're going to buy new equipment (if needed), then the GT-AX6000 or the RT-AX88U Pro are good choices (best bang for the buck) today. Note that you will need fewer (if any) 'nodes' with these new AX class models, depending on your square footage and the construction materials used in your home. If you do decide to buy new hardware, proceed with a single router and only open/add additional units as necessary.
 
Definitely possible I got terms wrong, apologies for any confusion.

On the Network Status page it says "Parent AP status : AP Mode". The WAN port on the primary 68U is connected to a Netgear switch, which is also connected to an OPNsense firewall/router for Internet access (which is why I am running the Asus stuff in AP mode).

The WAN port on each of the "subnodes" (the 66U and the second 68U) is wired directly (ie, not via the switch) to a LAN port of the primary 68U, and I set them up (painfully) using the "Add AiMesh Node" on the AiMesh page. That page now shows two "subnodes" each connected up to the primary node via 1G ethernet backhaul.

So I am pretty sure I am running both in AP mode and running AiMesh, but perhaps misunderstanding something.

I do need to propagate the Guest network through the home, and this setup does that. I get a strong signal on those SSIDs everywhere.

Good idea on swapping the 66U in as the primary. I had seen the specs on the pinned post in this forum, but neither of my 68Us shows its hardware version in any way I can figure out, so I thought there was a possibility that the 66U wasn't any beefier. Although more likely it means they are gen 1 (especially given their age). I'll try that and see if this stops happening. I've always found it to be a bit of a dance rebuilding the AiMesh network, so I'm not looking forward to it!

Thanks on the recommendations for replacement. I was hoping (although not necessarily hopeful) that a replacement might allow me to go down to two devices or even one.

Appreciate you taking the time to weigh in, thank you.
 
I'm not sure what device you're looking at the Network Status page on.

It seems like the RT-AC68U also has the ability to be configured in "Access Point(AP) mode / AiMesh Router in AP mode" (I learned something new here). Can you confirm this (I haven't seen an RT-AC68U used as a main router in years)? This would be on the Operation Mode tab in the Administration section on that main 'node'.

Does stock firmware allow you to see the CPU frequency? If the main RT-AC68U is 800MHz, then switching to the RT-AC66U_B1 should be an upgrade.

I forgot to give you the reset link in my post above.



Use the appropriate method for each of your routers if you do reconfigure the system. Maybe you will find it easier to get the AiMesh nodes working if you do.

How to Connect an AiMesh Node




Keep us posted.
 
I am looking at it on the "primary" 68U. The other two devices do not allow me to log in to their admin pages; when I try to access them, they auto-redirect to the IP address of the primary unit.

You are correct on the config section. The selected option on the Operation Mode tab is "Access Point(AP) mode / AiMesh Router in AP mode".

I couldn't find anything in the admin GUI that gave me any processor info, and I didn't see anything obvious (to me anyway) in the system log. I thought it might identify itself in some way at boot time. This line in the system log:
  • May 5 01:05:03 kernel: CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c53c7f
might identify it, but not to me. Given that I bought those 68Us in 2014, I think it's a reasonable bet that the 66U I bought 3 years ago uses a newer processor, although not a sure one. But I'd rather spend a half day reconfiguring a network than spend a few hundred dollars I don't need to spend :)
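For what it's worth, the bracketed value in that kernel line is the ARM Main ID Register (MIDR), and it can be decoded. A minimal Python sketch follows, assuming the standard MIDR bit layout. The function names `decode_midr` and `describe_midr` are hypothetical, and the lookup tables are deliberately tiny, covering only this one value. Note that the MIDR identifies the core type and revision only. It does not encode the clock speed, so it cannot answer the 800 MHz question on its own.

```python
# Tiny lookup tables covering only the value seen in the log line above.
IMPLEMENTERS = {0x41: "ARM"}
PARTS = {0xC09: "Cortex-A9"}


def decode_midr(hex_id):
    """Split a MIDR value (hex string as printed by the kernel) into fields."""
    midr = int(hex_id, 16)
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits 31:24
        "variant": (midr >> 20) & 0xF,        # bits 23:20
        "architecture": (midr >> 16) & 0xF,   # bits 19:16
        "part": (midr >> 4) & 0xFFF,          # bits 15:4
        "revision": midr & 0xF,               # bits 3:0
    }


def describe_midr(hex_id):
    """Return a short human-readable summary, falling back to hex codes."""
    f = decode_midr(hex_id)
    impl = IMPLEMENTERS.get(f["implementer"], hex(f["implementer"]))
    part = PARTS.get(f["part"], hex(f["part"]))
    return f"{impl} {part} r{f['variant']}p{f['revision']}"


if __name__ == "__main__":
    print(describe_midr("413fc090"))
```

So the log line says the core is an ARM Cortex-A9, but comparing clock speeds between the two routers needs another source.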

Thanks for the links on the reset and the aimesh node config. Will try the reconfig and see how it goes!
 
Your guest network in AP mode is perhaps only an additional SSID. Check if it is actually separated from your main network.
 
Yes, you are correct. The Guest network at this point is just a way to get onto my WiFi network with a much simpler password. That was a side effect of moving the router and firewall functions out of the Asus and over to the OPNsense box. Every now and then I think that I might look at doing something about it, but it never makes its way to the top of the list.
 
Proper APs with VLAN support and PoE are what I would go with. Wi-Fi 6 models start from about $100 for AX1800 class. You don't need home all-in-one routers as APs behind an OPNsense router.
 
Agreed. That would let me do a proper guest network and, more importantly, segment off all of the IoT devices I have littered my network with over the past decade. And it would be a lot more fun to play with. My wife works from home and requires a reliable connection, so there is some peril here. Thanks for the AX1800 pointer.
 
Since you need to reset and reconfigure again anyway, swap the primary RT-AC68U with the RT-AC66U B1. The RT-AC66U B1 has a faster processor (1.0 or 1.4 GHz), while the 2014 RT-AC68U runs at only 0.8 GHz; its hardware revision is A1.
 
