[UPDATE - 20 Feb 26]
Hi,
Well, I have been running what feels like endless trials with various settings (including Smart Connect, RA, in-device RSSI etc.), carrying out RSSI checks (with my own scripts tailored to Shelly devices) and just generally tweaking. Whilst I am a long way from any firm conclusions or solutions, I wanted to post an update on where I've landed. The number of reboots I have done, trying various sequences of node reboots, seems endless. Sometimes devices appear to have connected (going by the Client Lists and by the devices showing up in my RSSI checks) but have not actually done so.
There seem (I am no expert here!) to be persistent "kernel: not mesh client, can't delete it" errors in the syslog (hundreds per day), correlating with Shelly IoT dropouts. Apparently (via Claude) this is an AiMesh "routing table" issue where the mesh client table gets corrupted. My (affected) Shelly devices "appear" connected but are unreachable via ping or WebGUI. RMerlin has noted in past threads that these messages are debugging noise, but in my case, i.e. for my IoT devices, the errors just keep mounting up. According to RMerlin, the code spitting out the "not mesh client, can't delete it" messages is in closed-source ASUS binary blobs (love that terminology, it just sounds like a wayward jellyfish). If I had the energy I would put stock firmware on the Main and Nodes and report it to ASUS, but I am tired; and RMerlin's firmware lets you easily issue commands and run scripts, which is difficult on stock routers.
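To illustrate the "appears connected but is unreachable" check, here is a minimal sketch (not one of my actual scripts) that pings a list of device addresses and prints the ones that do not answer. The function names `probe` and `list_unreachable` and the `run` argument are made up for the example, and the `ping -c 1 -W 2` form assumes a Linux/BusyBox-style ping where `-W` is a timeout in seconds.

```shell
#!/bin/sh
# Sketch only: report devices that do not answer a single ping.
# probe and list_unreachable are hypothetical helper names.

# Succeeds if the host answers one ping within 2 seconds.
# Assumes a Linux/BusyBox ping (-c count, -W timeout in seconds).
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Prints each address given as an argument that fails the probe.
list_unreachable() {
    for ip in "$@"; do
        if ! probe "$ip"; then
            echo "$ip"
        fi
    done
}

# Only act when called as: script.sh run <ip> [<ip> ...]
if [ "${1:-}" = "run" ]; then
    shift
    list_unreachable "$@"
fi
```

Called as, say, `sh check.sh run <shelly-ip-1> <shelly-ip-2>`, any address printed is one that the Client List may still show as connected but that is not actually answering.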
What I ended up doing:
- Implemented a "Watchdog" script (wireless_corruption_watchdog.sh) running via cron every 5 mins on all nodes. It monitors for error bursts above an error threshold and triggers a wl0.2 bss down/up bounce + ARP flush to recover without a full reboot. Background noise is around 0-5 errors per 5-min window; real corruption shows up as sustained errors well above 35, which then kicks in the wireless refresh script.
- Scheduled an ARP table flush every 2 hours as a preventative measure - separate from and complementary to the Watchdog.
- Set (tighter) Device-side RSSI thresholds on each Shelly to nudge roaming (and make them less dependent on Smart Connect Steering and Triggers). I appreciate some folks will say just disable SC! but I have my reasons to keep it.
- Sticky client behaviour remains an issue - some devices will latch onto a distant node/main at poor RSSI and never roam despite the RA settings. Bind-then-unbind, to get a device to move to the correct node, works as a manual fix for devices such as the Samsung Dryer. At least the Shellys have an in-device RSSI parameter and a way to reboot them remotely; the Samsung ... nothing, despite the pretty SmartThings app.
- Binding - tried it, worked well for RSSI, but initially caused Shellys to disappear from the Shelly Cloud App, so I unbound everything and let them find their own node (which is not perfect). For some reason, when I tried binding again the devices appeared in the Shelly App OK, so it IS possible to use it, but I don't.
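For anyone who wants a feel for the watchdog idea, below is a cut-down sketch, not the actual wireless_corruption_watchdog.sh. It counts new "not mesh client, can't delete it" lines since the previous run and, above a threshold, bounces the wireless interface and flushes ARP. Assumptions to check on your own setup: the syslog path `/tmp/syslog.log`, the state file location, the `DRY_RUN` switch, the function names, the `wl -i wl0.2 bss down`/`up` form of the bounce, and `ip neigh flush all` as the ARP flush. It also reacts to a single window over the threshold rather than to sustained errors.

```shell
#!/bin/sh
# Sketch of a burst-detecting watchdog. Paths, names and commands below
# are assumptions for illustration, not the author's real script.

LOG_FILE="${LOG_FILE:-/tmp/syslog.log}"               # assumed syslog location
STATE_FILE="${STATE_FILE:-/tmp/mesh_watchdog.state}"  # hypothetical state file
THRESHOLD="${THRESHOLD:-35}"                          # errors per run before acting
IFACE="${IFACE:-wl0.2}"                               # interface to bounce
DRY_RUN="${DRY_RUN:-1}"                               # 1 = only print the commands
PATTERN="not mesh client, can't delete it"

# Print the number of matching log lines added since the previous run.
count_new_errors() {
    if [ ! -f "$LOG_FILE" ]; then
        echo 0
        return 0
    fi
    total=$(($(wc -l < "$LOG_FILE")))
    last=0
    if [ -f "$STATE_FILE" ]; then
        last=$(cat "$STATE_FILE")
    fi
    # A shorter log than last time means it was rotated: start from the top.
    if [ "$total" -lt "$last" ]; then
        last=0
    fi
    echo "$total" > "$STATE_FILE"
    head -n "$total" "$LOG_FILE" | tail -n +"$((last + 1))" \
        | grep -c -F -- "$PATTERN" || true
}

# Run a command, or just print it when DRY_RUN=1.
run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY-RUN: $*"
    else
        "$@"
    fi
}

# One watchdog pass: bounce the interface and flush ARP on an error burst.
watchdog_run() {
    errors=$(count_new_errors)
    if [ "$errors" -gt "$THRESHOLD" ]; then
        echo "watchdog: $errors errors since last run, bouncing $IFACE"
        run_cmd wl -i "$IFACE" bss down
        run_cmd sleep 2
        run_cmd wl -i "$IFACE" bss up
        run_cmd ip neigh flush all
    else
        echo "watchdog: $errors errors since last run, no action"
    fi
}

# Only act when called as: script.sh run
if [ "${1:-}" = "run" ]; then
    watchdog_run
fi
```

With the defaults it only prints what it would do; setting `DRY_RUN=0` and calling it with `run` from a 5-minute cron job (`*/5 * * * *`) would make it act for real. Counting lines since the last run, rather than parsing timestamps, keeps the sketch short, at the cost of treating whatever arrived between two cron runs as one window.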
Still unresolved:
The underlying AiMesh routing table corruption continues generating background errors. My little watchdog script mitigates it but doesn't fix it.
Thanks in any case to everyone who commented, even where the answer was essentially "no fix exists", (effectively) "give it up" or "Asus Marketing Rules OK" ...
k.
p.s. Happy to share the scripts btw in case anyone else wants to experiment, bit long to post here though.