Wireless Corruption Watchdog & ARP Flush — Summary
Problem: ASUS AiMesh firmware (appears to) corrupt its internal mesh client routing tables, causing IoT devices (primarily Shellies) to appear connected but become unreachable. The corruption manifests as "not mesh client, can't delete it" kernel errors in syslog. Root cause is in closed-source ASUS binary blobs — no firmware fix exists or is planned.
Observed pattern: Errors run at 0–5 per 5-minute window as background noise. ASUS firmware also produces a predictable spike of 40–70 errors in a single second at approximately :05 past every hour — a mesh table reconciliation burst that is benign but unavoidable.
wireless_corruption_watchdog.sh (runs every 5 min on all nodes via cron): counts genuine "not mesh client" errors within a true 5-minute sliding window using pure awk arithmetic (no subprocess overhead). When errors exceed 35 in a window it applies two-tier recovery:
- Tier 1 — silent ARP cache flush only. Fires on threshold breach. 30-minute cooldown. Devices unaffected.
- Tier 2 — wl0.2 interface bounce forcing all 2.4GHz IoT devices to reconnect. Only fires if errors remain continuously elevated after a Tier 1 (persistence flag never cleared), proving ARP flush alone was insufficient. 12-hour cooldown. Deliberately rare due to BLU TRV Gateway LED flash-and-reconnect on all 6 units.
wireless_refresh.sh — called by the watchdog with a tier1 or tier2 argument, executing the appropriate recovery action.
scheduled_arp_flush.sh — independent preventative ARP flush running every 2 hours on the main router regardless of error counts.