
IoT Devices falling off nodes (only) always requires reboot to fix

Some of your settings, even before you get to the AiMesh issues, don't make much sense to me. You have WPA2/WPA3 transitional, WPA3 requires PMF, Wi-Fi 6 is enabled on the 2.4GHz band (most likely in B/G/N/AX Auto), IGMP snooping is enabled, TX bursting is disabled... and you have a bunch of IoT devices which support none of the above, so such settings can only make things worse. On 5GHz you push a 160MHz-wide Auto channel for a reason unknown to me; it may only generate higher speed test numbers at the expense of reliability and range. 🤷‍♂️
 
Re 2.4GHz: I’m pretty sure I mentioned somewhere above that you can’t make these settings SSID-specific. I also wanted some older iOS devices, and iOS devices that are some distance away and fall back to the 2.4GHz network, to be able to connect at up to AX. Is this realistic with a 20MHz-wide channel? I really don’t know. But I’d like to retain as many settings as possible for those devices as well as the IoTs. Asking too much? Maybe. I read about IGMP snooping, and if I need to tweak some more I can look at it.

Actually the IoT network SSID in GNP uses WPA2 so for that there is a per SSID setting and it works well. My original aim was to remain as close to defaults as possible and tweak only what I need to, to get connected at the strongest node and stay connected. The trial and error parameters I list above work for me, thus far.

My 5GHz devices connect really well, the speeds are good at 160 and the Smart Connect thresholds mean they find the strongest node and are not sticky if they move. I’m very happy with 5GHz and, as you often state, the client devices decide much of the connectivity; mine (mostly iPhones and iPads) work really well in the main, and they seem to be much cleverer than the Shellys at any rate… I don’t have issues at 5GHz. I know you and others say drop it to 80 but, TBH, touch wood, 160 is working well.
 
[UPDATE - 20 Feb 26]

Hi,

Well, I have been doing what seems like endless trials with various settings (including Smart Connect, RA, In-Device RSSI etc), carrying out RSSI checks (with my own scripts tailored to Shelly devices) and just generally tweaking. Whilst I am a long way from any firm conclusions or solutions, I just wanted to post an update on where I've landed. The number of reboots I have done, trying various sequences of reboots of the nodes, seems endless. Sometimes devices appear (from the Client Lists and from showing up in my RSSI checks) to have connected, but have not actually connected.

There seems (I am no expert here!) to be persistent "kernel: not mesh client, can't delete it" errors in the syslog (hundreds per day) correlating with Shelly IoT dropouts. Apparently (via Claude) this is an AiMesh "routing table" issue where the mesh client table gets corrupted. My (affected) Shelly devices "appear" connected but are unreachable via ping or WebGUI. RMerlin has noted in past threads that these messages are debugging noise but for my case, i.e. for my IoT devices, it seems errors just keep mounting up. According to RMerlin, the relevant code spitting out the "not mesh client, can't delete it" messages is in closed-source ASUS binary blobs (love that terminology, just sounds like a wayward jellyfish). If I had the energy I would try to put stock on Main and Nodes and report it to ASUS but I am tired; and RMerlin allows you to easily issue commands and run scripts, which are difficult on stock routers.

What I ended up doing:
  1. Implementing a "Watchdog" script (wireless_corruption_watchdog.sh) running via cron every 5 mins on all nodes. Monitors for error bursts above an error threshold and triggers a wl0.2 bss down/up bounce + ARP flush to recover without a full reboot. Background noise is around 0-5 errors per 5-min window; real corruption is sustained errors well above 35, which then kicks in the wireless refresh script.
  2. Scheduled an ARP table flush every 2 hours as a preventative measure - separate from and complementary to the Watchdog.
  3. Set (tighter) Device-side RSSI thresholds on each Shelly to nudge roaming (and make them less dependent on Smart Connect Steering and Triggers). I appreciate some folks will say just disable SC! but I have my reasons to keep it.
  4. Sticky client behaviour remains an issue - some devices will latch onto a distant node/main at poor RSSI and never roam despite RA settings. Bind-then-unbind to get a device to move to the correct node works as a manual fix for devices such as the Samsung Dryer. At least the Shellys have an in-device RSSI parameter and a way to reboot them remotely; the Samsung ... nothing, despite the pretty SmartThings App.
  5. Binding - tried it, worked well for RSSI, but initially caused Shellys to disappear from the Shelly Cloud App, so I unbound everything and let them find their node (which is not perfect). For some reason, when I tried binding again, the devices appeared in the Shelly App OK, so it IS possible to use it, but I don't.

Still unresolved:

The underlying AiMesh routing table corruption continues generating background errors. My little watchdog script mitigates it but doesn't fix it.

Thanks in any case to everyone who commented, even where the answer was essentially "no fix exists", (effectively) "give it up" or "Asus Marketing Rules OK" ... :)

k.

p.s. Happy to share the scripts btw in case anyone else wants to experiment, bit long to post here though.
 
You may want to publish "Adventures to AiMesh Land" stories. Hollywood may pick it up and make an action movie starring The Rock.
 
🤣

Yep.

From the scripts, the hourly bursts I am seeing are around 40-50 errors and the ARP flush alone is sufficient to handle them without any device disruption.
No service restart_wireless required....

But really... it.should.not.be.this.hard.
 
This thread is quite the read. I'm curious, is it only the Shelly devices where you're having a problem? The bulk of my IoT devices are Zigbee and Matter-over-Thread but I do have some WiFi cameras, some smart plugs, and some light switches that are WiFi and I haven't seen any of this kind of behavior on my setup.
 
No service restart_wireless required...

What's the fun in owning ASUS devices and have no need to restart_something from time to time? 🤣
 
This thread is quite the read. I'm curious, is it only the Shelly devices where you're having a problem? The bulk of my IoT devices are Zigbee and Matter-over-Thread but I do have some WiFi cameras, some smart plugs, and some light switches that are WiFi and I haven't seen any of this kind of behavior on my setup.
Not really. It's less about the Shellys (although they form the bulk of my IoT devices) and more about the compounding errors (apparently there's also something in the ASUS FW that fires at :05 past the hour to reconcile errors), which compound to the point where the IoT devices start dropping off. I just see it more with the Shellys because there is a Cloud App (iOS and PC) that I monitor to see if they are online.

I will admit I am using Claude to vibe code the watchdog script, here is an extract from today's check of how it is working.
Maybe it is just MY system, I really do not know. I have wired nodes exclusively, so AiMesh should work.

  • True 5-minute counts throughout — 0-5 background noise exactly as predicted
  • Hourly bursts at :05 past every hour — 47, 46, 51, 55, 57, 63, 66 errors — perfectly consistent with the known firmware reconciliation pattern
  • Tier 1 (ARP flush) firing on each burst, errors dropping immediately after — the flush is working
  • Background noise returning to 0-3 within one or two cycles after each burst

Background

Wireless Corruption Watchdog & ARP Flush — Summary

Problem:
ASUS AiMesh firmware (appears to) corrupt its internal mesh client routing tables, causing IoT devices (primarily Shellies) to appear connected but become unreachable. The corruption manifests as "not mesh client, can't delete it" kernel errors in syslog. Root cause is in closed-source ASUS binary blobs — no firmware fix exists or is planned.

Observed pattern: Errors run at 0–5 per 5-minute window as background noise. ASUS firmware also produces a predictable spike of 40–70 errors in a single second at approximately :05 past every hour — a mesh table reconciliation burst that is benign but unavoidable.
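
For anyone wanting to reproduce the counts, a windowed count along these lines is one way to do it. This is only a minimal Python sketch, not the watchdog itself (which is shell plus awk). The log line layout, the assumed year, the sample lines and the names `parse_time` and `errors_in_window` are all illustrative assumptions.

```python
from datetime import datetime, timedelta

ERROR_TEXT = "not mesh client, can't delete it"  # message quoted in this thread
WINDOW = timedelta(minutes=5)                    # window length used in the thread


def parse_time(line, year=2026):
    """Parse a leading 'Mon DD HH:MM:SS' stamp (assumed layout).

    Syslog stamps carry no year, so one is assumed here.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def errors_in_window(lines, now, window=WINDOW):
    """Count matching lines stamped within (now - window, now]."""
    count = 0
    for line in lines:
        if ERROR_TEXT not in line:
            continue
        stamp = parse_time(line)
        if now - window < stamp <= now:
            count += 1
    return count


# Made-up sample lines, for illustration only.
sample = [
    "Feb 20 13:58:10 kernel: not mesh client, can't delete it",
    "Feb 20 14:05:01 kernel: not mesh client, can't delete it",
    "Feb 20 14:05:01 kernel: not mesh client, can't delete it",
    "Feb 20 14:05:02 kernel: some other message",
    "Feb 20 14:07:30 kernel: not mesh client, can't delete it",
]
now = datetime(2026, 2, 20, 14, 8, 0)
print(errors_in_window(sample, now))  # prints 3
```

Anchoring the window to the time of the check, rather than to fixed clock buckets, keeps the :05 burst from being split across two counts.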

wireless_corruption_watchdog.sh (runs every 5 min on all nodes via cron): counts genuine "not mesh client" errors within a true 5-minute sliding window using pure awk arithmetic (no subprocess overhead). When errors exceed 35 in a window it applies two-tier recovery:
  • Tier 1 — silent ARP cache flush only. Fires on threshold breach. 30-minute cooldown. Devices unaffected.
  • Tier 2 — wl0.2 interface bounce forcing all 2.4GHz IoT devices to reconnect. Only fires if errors remain continuously elevated after a Tier 1 (persistence flag never cleared), proving ARP flush alone was insufficient. 12-hour cooldown. Deliberately rare due to BLU TRV Gateway LED flash-and-reconnect on all 6 units.
wireless_refresh.sh — called by the watchdog with a tier1 or tier2 argument, executing the appropriate recovery action.
scheduled_arp_flush.sh — independent preventative ARP flush running every 2 hours on the main router regardless of error counts.
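
The two-tier decision described above can be sketched roughly as follows. This is a hedged Python illustration of the logic only, not the actual shell script. The threshold and cooldowns come from the summary, but the `WatchdogState` and `decide` names, and the rule that the persistence flag is cleared by any window at or below the threshold, are my assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

THRESHOLD = 35                          # errors per 5-minute window (from the summary)
TIER1_COOLDOWN = timedelta(minutes=30)  # ARP flush cooldown (from the summary)
TIER2_COOLDOWN = timedelta(hours=12)    # interface bounce cooldown (from the summary)


@dataclass
class WatchdogState:
    """Hypothetical state kept between cron runs."""
    last_tier1: Optional[datetime] = None
    last_tier2: Optional[datetime] = None
    elevated_since_tier1: bool = False  # the "persistence flag"


def decide(state, error_count, now):
    """Return 'none', 'tier1' or 'tier2' and update the state in place."""
    if error_count <= THRESHOLD:
        # Assumption: a quiet window clears the persistence flag.
        state.elevated_since_tier1 = False
        return "none"
    if state.elevated_since_tier1:
        # Errors stayed high after a Tier 1 flush, so escalate if allowed.
        if state.last_tier2 is None or now - state.last_tier2 >= TIER2_COOLDOWN:
            state.last_tier2 = now
            state.elevated_since_tier1 = False
            return "tier2"
        return "none"
    if state.last_tier1 is None or now - state.last_tier1 >= TIER1_COOLDOWN:
        state.last_tier1 = now
        state.elevated_since_tier1 = True
        return "tier1"
    return "none"


state = WatchdogState()
start = datetime(2026, 2, 20, 14, 5)
for minutes, errors in [(0, 50), (5, 3), (60, 55), (65, 60)]:
    print(minutes, decide(state, errors, start + timedelta(minutes=minutes)))
```

With these made-up counts, an isolated hourly burst only ever triggers Tier 1, and Tier 2 fires only when the window straight after a flush is still above the threshold.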
 
Hurry up with Episode I because @aex.perez is almost done with Episode II. 😬

Teaser: In the meantime the "other manufacturer" pushed an update with a new Roaming Visual feature in addition to Channel AI. The admin can now see how the clients move between the APs. It's quite nice.

 
"AI" is quickly becoming like "Marklar" from that South Park episode, pretty much all marketing will eventually be "AI is AI to the AI because AI brings AI to the AI."
 
"AI" is quickly becoming like "Marklar" from that South Park episode, pretty much all marketing will eventually be "AI is AI to the AI because AI brings AI to the AI."
AI_Hmm.jpg
 
Hurry up with Episode I because @aex.perez is almost done with Episode II. 😬

Teaser: In the meantime the "other manufacturer" pushed an update with a new Roaming Visual feature in addition to Channel AI. The admin can now see how the clients move between the APs. It's quite nice.

Does the "other manufacturer" do Guest Networks on Nodes and not just Main SSIDs on APs?
 
Maybe it is about time for you to go back to stock Asus firmware! I am not having any issues with the current Asus beta on my AX86U Pro. Also feel your fooling with settings messes things up.
Just my $0.02.
 
Nah .... I need a few scripts and things on the Main at least. I had all stock nodes btw, then changed to Merlin as stock was not performing anyway; and Merlin is much easier to run scripts and commands just to check things like RSSI etc. i.e. monitor and troubleshoot.

Love your optimism, I really do, but I am now at the point where I am hoping the beta FW makes its way to the GPLs. I am not convinced that Stock would make it all work just beautifully, and the flexibility of Merlin is just too hard to move away from.
 
Does the "other manufacturer" do Guest Networks on Nodes and not just Main SSIDs on APs?

Very limited...

In my case with dual-band APs I can have 8x SSID-to-VLAN per radio or in real life 1x 2.4/5GHz main network and up to 14x other networks. In theory with 4x APs I'm limited to 64x SSID-to-VLAN combinations and therefore planning upgrade to tri-band APs which will allow up to 96x combinations. Then I plan to bump the number of APs to 10x because the Gateway can create ~250 VLANs and I feel they are underutilized. 🤣
 
If it had AI you'd probably be able to do a couple thousand separate networks and it would be able to generate funny pictures of quokkas.
 
We've been AI-ed, no escape. I'm carefully going around all AI things in the UI, but here is one AI Threat Assessment thing I can't control. I have no idea what is it doing. It stays quiet for now though, good boy. Some AI touched ISP line monitoring on one of the previous updates, it got ugly. Had to use manual rules for some time and strangely for 2 out of 4 ISPs. The other 2 were showing fine for unknown reason. It got fixed with the last update, so... back in AI business. 😬
 
Maybe it is about time for you to go back to stock Asus firmware! I am not having any issues with the current Asus beta on my AX86U Pro. Also feel your fooling with settings messes things up.
Just my $0.02.
I'd be the first to admit that the extent to which I have 'fooled around' with settings might indeed be 'messing things up', and yes, I wish it would just work out of the box, I really do. But it doesn't: not even for the largely VLAN-capable, GNP-capable components of my system, with wired backhaul, given my device numbers and types.

However if you're still running the relatively recent Stock Beta (which is only available to Asus Stock Users and not Merlin Users) it is not really a fair comparison at this point in time, in my view. The enhancements are heartening and apply to both my GT-AX6000 and RT-AX86U Pro Nodes in my remote system; and to the RT-AX88U Pro in my local system, which works very well on Merlin, no issues with one Node (RT-AX3000) and a handful of IoT devices (4 x Alexa echo dot 3s, 4 x Sensibo Skys, 5 x ESP32s, 2 x TVs). Go figure.

I was hoping Merlin would get those beta GPLs soon. If folks believe the Merlin philosophy of enhancing the stock FW (i.e. staying as close as possible to the original firmware), the problem should be occurring in both firmwares, because the components of the Merlin FW that appear to be falling down for me are, I believe, the "closed source" components (e.g. AiMesh) provided by ASUS and included in the GPLs given to Merlin. So I am struggling to understand why stock would (or should) be better. If someone can say, hand on heart, that they know Merlin has tweaked some element of the stock FW that has had the effect I am seeing in my system, I will revert everything to stock when I get to my remote site in 6 months' time.

Merlin has emphasised the point that Wifi drivers and AiMesh are outside his control on many occasions, so in theory, anything that relies on or operates with AiMesh (e.g.) should have the same stability or the same issues in either firmware, should they not? There are lots of discussions about this subject, just a quick check e.g. here is one from 2023.

Just a couple of queries though please, so I can see how our systems compare:
  1. How many GNP VLANs are you running on your system that you propagate to your node (I think that is a single node)? For reference I have 4; see sig, remote system.
  2. How many IoT devices do you have connected (on 2.4GHz)? Asking as there are a few threads where AiMesh seems to struggle with a lot of IoT devices (as I have), and my testing shows the errors mount up, then the AiMesh Router or node spits the dummy and the devices appear connected but are unreachable via WebGUI or ping. On my remote system I have 31 Wireless devices (28 Shellys) plus 9 Wired on the IoT VLAN 63; plus 2 IoT devices (TVs) on the Guest VLAN 62. So 33 wireless IoT devices, before anyone on the Guest Network or Main connects, mainly at 5GHz. Not huge, but maybe high enough that the system cannot flush the errors out fast enough.
  3. What about kernel "not mesh client, can't delete it" messages? Do you get lots of these too? My current log has 4777 of these out of 19592 entries (24%), which seems a lot. I believe these have often been put down as "noise" that can be ignored, but on my system, if they rise a lot, it seems to destabilise the system and the devices fall off it, even with good RSSI. I'd be interested to know whether the same numbers of errors appear in stock but stock handles them better, or whether systems with fewer devices simply manage to flush the ARP cache sufficiently (or whatever mechanisms they use to regularly restabilise the system) so that errors do not accumulate to the point where they pose a problem.
Thanks.
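
To compare like for like, the share of these messages in a log could be worked out with something like the sketch below. It is a minimal Python illustration; the `error_share` helper is a hypothetical name, and the only real figures used are the two counts quoted in the post.

```python
ERROR_TEXT = "not mesh client, can't delete it"  # message quoted in this thread


def error_share(lines, needle=ERROR_TEXT):
    """Return (matching lines, total lines, percentage of lines that match)."""
    total = len(lines)
    hits = sum(1 for line in lines if needle in line)
    pct = 100.0 * hits / total if total else 0.0
    return hits, total, pct


# Check of the figures quoted above: 4777 matching entries out of 19592.
print(round(100 * 4777 / 19592, 1))  # prints 24.4
```

Feeding `error_share` the lines of a syslog file from a stock system would give a number that can be set directly against the 24% here.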
 