IoT Devices falling off nodes (only) always requires reboot to fix

Some of your settings, even before you get to AiMesh issues, don't make much sense to me. You have WPA2/3 transitional, WPA3 requires PMF, Wi-Fi 6 is enabled on the 2.4GHz band most likely in B/G/N/AX Auto, IGMP snooping is enabled, TX bursting is disabled... and you have a bunch of IoT devices which support none of the above, so such settings can only make things worse. On 5GHz you push a 160MHz-wide Auto channel for a reason unknown to me; it may only generate higher speed test numbers at the expense of lower reliability and range. 🤷‍♂️
 
Re 2.4GHz: I’m pretty sure I mentioned somewhere above that, as you can’t make the settings SSID-specific, I wanted some older iOS devices, or iOS devices that are some distance away and fall back to the 2.4GHz network, to also connect at up to AX. Is this realistic with a 20MHz-wide channel? I really don’t know. But I’d like to retain as many settings as possible for those devices as well as the IoTs. Asking too much? Maybe. I read about IGMP snooping and if I need to tweak some more I can look at it.

Actually, the IoT network SSID in GNP uses WPA2, so for that there is a per-SSID setting and it works well. My original aim was to remain as close to defaults as possible and tweak only what I need to, to get connected at the strongest node and stay connected. The trial-and-error parameters I list above work for me, thus far.

My 5GHz devices connect really well, the speeds are good at 160, and the Smart Connect thresholds mean they find the strongest node and are not sticky if they move. I’m very happy with 5GHz and, as you often state, the client devices decide much of the connectivity, and mine (iPhones and iPads mostly) work really well in the main; they seem to be much cleverer than Shellys at any rate… I don’t have issues at 5GHz. I know you and others say drop it to 80, but TBH, touch wood, 160 is working well.
 
[UPDATE - 20 Feb 26]

Hi,

Well, I have been doing what seems like endless trials with various settings (including Smart Connect, RA, In-Device RSSI etc), carrying out RSSI checks (with my own scripts tailored to Shelly devices) and just generally tweaking. Whilst I am a long way from any firm conclusions or solutions, I just wanted to post an update on where I've landed. The number of reboots I have done, trying various sequences of reboots of the nodes, seems endless. Sometimes devices appear to have connected (judging by the Client Lists and my RSSI checks) but have not actually done so.

There seem (I am no expert here!) to be persistent "kernel: not mesh client, can't delete it" errors in the syslog (hundreds per day) correlating with Shelly IoT dropouts. Apparently (via Claude) this is an AiMesh "routing table" issue where the mesh client table gets corrupted. My (affected) Shelly devices "appear" connected but are unreachable via ping or WebGUI. RMerlin has noted in past threads that these messages are debugging noise, but in my case, i.e. for my IoT devices, it seems the errors just keep mounting up. According to RMerlin, the relevant code spitting out the "not mesh client, can't delete it" messages is in closed-source ASUS binary blobs (love that terminology, just sounds like a wayward jellyfish). If I had the energy I would try to put stock firmware on Main and Nodes and report it to ASUS, but I am tired; and RMerlin allows you to easily issue commands and run scripts, which is difficult on stock routers.

What I ended up doing:
  1. Implemented a "Watchdog" script (wireless_corruption_watchdog.sh) running via cron every 5 mins on all nodes. It monitors for error bursts above an error threshold and triggers a wl0.2 bss down/up bounce + ARP flush to recover without a full reboot. Background noise is around 0-5 errors per 5-min window; real corruption is sustained errors well above 35, which then kicks in the wireless refresh script.
  2. Scheduled an ARP table flush every 2 hours as a preventative measure - separate from and complementary to the Watchdog.
  3. Set (tighter) Device-side RSSI thresholds on each Shelly to nudge roaming (and make them less dependent on Smart Connect Steering and Triggers). I appreciate some folks will say just disable SC! but I have my reasons to keep it.
  4. Sticky client behaviour remains an issue - some devices will latch onto a distant node/main at poor RSSI and never roam despite RA settings. Bind-then-unbind to get a device to move to the correct node works as a manual fix for devices such as the Samsung Dryer. At least Shellys have an in-Device RSSI parameter and a way to reboot them remotely; the Samsung ... nothing, despite the pretty SmartThings App.
  5. Binding - tried it, worked well for RSSI, but initially caused Shellys to disappear from the Shelly Cloud App, so I unbound everything and let them find their node (which is not perfect). For some reason, when I tried binding again, this time the devices appeared in the Shelly App OK, so it IS possible to use it, but I don't (use it).
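
For anyone wanting to experiment with item 1, here is a rough sketch of what such a watchdog could look like. It is not the actual wireless_corruption_watchdog.sh; the function names, the state file and the log path are made up for illustration, and the exact `wl` and `ip` invocations are assumptions that may need adjusting per model and firmware. It counts new "not mesh client" lines since the previous run and only triggers recovery above the threshold. DRY_RUN defaults to 1 so nothing is touched until you are happy with it.

```shell
#!/bin/sh
# Sketch only: names, paths and recovery commands are assumptions, not the author's script.

PATTERN="not mesh client, can't delete it"
THRESHOLD="${THRESHOLD:-35}"   # new errors per run that count as a burst
IFACE="${IFACE:-wl0.2}"        # IoT guest interface mentioned in the post
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the recovery commands

# Print the number of matching lines in a log file (0 if the file is missing).
count_errors() {
    [ -f "$1" ] || { echo 0; return; }
    grep -c -F -- "$PATTERN" "$1" || true
}

# Print the errors added since the previous run; the state file stores the last total.
new_errors() {
    total=$(count_errors "$1")
    prev=0
    if [ -f "$2" ]; then prev=$(cat "$2"); fi
    echo "$total" > "$2"
    if [ "$total" -lt "$prev" ]; then
        echo "$total"              # smaller total: assume the log was rotated
    else
        echo $((total - prev))
    fi
}

# Bounce the interface and flush the ARP/neighbour table (or just print, in dry-run mode).
recover() {
    for cmd in "wl -i $IFACE bss down" "wl -i $IFACE bss up" "ip neigh flush all"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "would run: $cmd"
        else
            $cmd
        fi
    done
}

# One watchdog pass: usage is watchdog_run LOGFILE STATEFILE
watchdog_run() {
    n=$(new_errors "$1" "$2")
    if [ "$n" -gt "$THRESHOLD" ]; then
        echo "burst: $n new errors"
        recover
    else
        echo "ok: $n new errors"
    fi
}
```

To use it, you would add a final line such as `watchdog_run /tmp/syslog.log /tmp/watchdog.state` (both paths are guesses) and have cron call the script every 5 minutes. Counting the delta between runs avoids parsing syslog timestamps, at the cost of a slightly fuzzy window if cron runs late.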

Still unresolved:

The underlying AiMesh routing table corruption continues generating background errors. My little watchdog script mitigates it but doesn't fix it.

Thanks in any case to everyone who commented, even where the answer was essentially "no fix exists", (effectively) "give it up" or "Asus Marketing Rules OK" ... :)

k.

p.s. Happy to share the scripts btw in case anyone else wants to experiment, bit long to post here though.
 
You may want to publish "Adventures to AiMesh Land" stories. Hollywood may pick it up and make an action movie starring The Rock.
 
🤣

Yep.

From the scripts, the hourly bursts I am seeing are around 40-50 errors and the ARP flush alone is sufficient to handle them without any device disruption.
No service restart_wireless required....

But really... it.should.not.be.this.hard.
 
This thread is quite the read. I'm curious, is it only the Shelly devices where you're having a problem? The bulk of my IoT devices are Zigbee and Matter-over-Thread but I do have some WiFi cameras, some smart plugs, and some light switches that are WiFi and I haven't seen any of this kind of behavior on my setup.
 
No service restart_wireless required...

What's the fun in owning ASUS devices and having no need to restart_something from time to time? 🤣
 
