XT8 troubleshooting log, cfg_server possibly an issue

arrgh, thank you so much for looking into this! I had just posted a thread about this issue I've been facing, and when I saw cfg_server in your thread's subject, I wasn't disappointed. Learned quite a bit! ASUS has their work cut out for them.
 
Dammit, it just started happening again!

Code:
admin@router:/tmp/home/root# netstat -tnap |grep cfg_server |wc -l
1006
admin@router:/tmp/home/root# ps wT |grep cfg_server |wc -l
510
 
OK, so interestingly, I'm running one XT8 node on the beta firmware mentioned above and the other on 386 firmware, and I'm no longer getting the cfg_server memory leak. Maybe try mismatched firmware versions to see whether it still does that nonsense?
 
So ASUS took my XT8s back to check whether there's an issue. I have a new set now (still HW 1.0) and am going to try hooking one up as a node. Updates to follow!
 
WELP

23285 just started misbehaving again: 500+ threads, ~800 connections in CLOSE_WAIT, and cfg_server using a lot of RAM.

Code:
admin@router:/tmp/home/root# ps wT |grep cfg |wc -l
510
admin@router:/tmp/home/root# netstat -tnap |grep cfg |wc -l
806
admin@router:/tmp/home/root# netstat -tnap |grep cfg |grep -v CLOSE_WAIT |wc -l
1
admin@router:/tmp/home/root# ps w |grep cfg_server |grep -v grep
32087 admin     148m R    cfg_server
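Counting lines with `grep | wc -l` only gives totals; a per-state breakdown shows the CLOSE_WAIT pileup (and the lone remaining socket) at a glance. Below is a minimal sketch, assuming the usual `netstat -tnap` layout in which column 6 is the TCP state and column 7 is `PID/Program name`. `cfg_socket_states` is a hypothetical helper name, not something that ships with the firmware.

```shell
#!/bin/sh
# Hypothetical helper: count cfg_server's TCP sockets per state.
# Reads `netstat -tnap` output on stdin. Assumes column 6 = State and
# column 7 = PID/Program name (header lines never match cfg_server in col 7).
cfg_socket_states() {
    awk '$7 ~ /cfg_server/ { n[$6]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on the router (needs root for -p, which admin is on ASUSWRT):
#   netstat -tnap | cfg_socket_states
```

On a healthy unit you would expect little more than the LISTEN socket and a few ESTABLISHED ones; a large CLOSE_WAIT line is the symptom described in this thread.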

There are some segfaults in dmesg, including one for cfg_server:
Code:
potentially unexpected fatal signal 11.
CPU: 2 PID: 31775 Comm: cfg_server Tainted: P           O    4.1.52 #2
Hardware name: Generic DT based system
task: ccfb0c00 ti: c852a000 task.ti: c852a000
PC is at 0xb692855c
LR is at 0xb6ef4e9c
pc : [<b692855c>]    lr : [<b6ef4e9c>]    psr: 60010010
sp : acffe8c8  ip : b6f4c7cc  fp : acffef9c
r10: 000c5184  r9 : 000a0d93  r8 : b1a3ec82
r7 : 00000018  r6 : acfff920  r5 : 00000004  r4 : 00000000
r3 : 00000010  r2 : 00000001  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Control: 10c5387d  Table: 11a4404a  DAC: 00000015
CPU: 2 PID: 31775 Comm: cfg_server Tainted: P           O    4.1.52 #2
Hardware name: Generic DT based system
[<c00270a0>] (unwind_backtrace) from [<c0022cf8>] (show_stack+0x10/0x14)
[<c0022cf8>] (show_stack) from [<c04c7bfc>] (dump_stack+0x8c/0xa0)
[<c04c7bfc>] (dump_stack) from [<c003ad2c>] (get_signal+0x490/0x558)
[<c003ad2c>] (get_signal) from [<c0022290>] (do_signal+0xc8/0x3ac)
[<c0022290>] (do_signal) from [<c0022718>] (do_work_pending+0x94/0xa4)
[<c0022718>] (do_work_pending) from [<c001f58c>] (work_pending+0xc/0x20)
 
I haven't restarted or killed anything since my last post, and it seems to have recovered on its own. Uptime 18 days. WELP^2.

My guess would be that cfg_server got restarted by something, but its /proc/<pid>/status indicates the current instance was once large:
Code:
admin@router:/tmp# grep VmPeak /proc/$(pidof cfg_server)/status
VmPeak:   152224 kB
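VmPeak on its own only says the process was large at some point. Putting it next to the current VmSize (the virtual size BusyBox `ps` reports in its VSZ column) and the Threads count shows whether the instance is still bloated or has already shrunk back. A minimal sketch, assuming there is a single cfg_server process; `vm_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print current virtual size, peak virtual size and
# thread count from a /proc/<pid>/status file given as the first argument.
vm_summary() {
    awk '/^VmPeak:/  { peak = $2 }
         /^VmSize:/  { size = $2 }
         /^Threads:/ { thr = $2 }
         END { printf "VmSize=%s kB VmPeak=%s kB Threads=%s\n", size, peak, thr }' "$1"
}

# Usage on the router (assumes pidof returns exactly one PID):
#   vm_summary /proc/$(pidof cfg_server)/status
```

A VmPeak far above VmSize, with a normal thread count, fits the "once large, since recovered" reading above.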
 
Still on 23285 with wired backhaul, uptime 73 days, no noticeable UI slowdowns or cfg_server shenanigans. I'm calling this one good.
 
I loaded (I believe) that firmware and poked around in it, but preferred the Merlin environment, so I reverted. Uptime on the router must be around 60 days by now, so it was at least that long ago.
Not really pertinent, but I was notified of new content in this thread and, refreshing my memory, saw my post above. For reference, that was gnuton's Merlin, the initial version (not 388). A fairly regional power outage outlasted my UPS only a handful of days shy of a year of uptime.

The only issue that ever caught my attention was that local names stopped resolving once. And I probably poke around more than most; I can be a "knob twiddler" at times. Evidently any running-configuration changes I made were benign enough.
 
In the nearly a year since my last post here, I've upgraded much of my setup: I now have an AXE16000 as the main router and have relegated two XT8s to satellite duty. It's been stable.
 
Any idea which particular configurations seem to cause this?

I am currently diagnosing a similar issue with a GT-BE98, where a cfg_server crash affects all the nodes, which then take a while to recover. It does recover automatically.

I'm already using Ethernet backhaul. Thanks!
 
I managed to solve this issue in my setup and found that it was caused by a smart home device that kept trying to connect to one of the mesh nodes every few seconds.

ASUSWRT enforces Protected Management Frames (PMF) on ASUS's latest WiFi 7 models, and that particular client doesn't support PMF, so it got stuck in an endless reconnection loop. The wlceventd log shows the device authenticating successfully but disconnecting almost immediately because it doesn't support PMF, which repeats every few seconds without end!

Well, here comes the interesting part. ASUSWRT triggers an update from the mesh node to cfg_server on the main unit whenever a client successfully authenticates to WiFi at the node. When this happens too frequently (here, every 2-3 seconds without end), cfg_server doesn't seem designed to cope, and TCP sockets in CLOSE_WAIT pile up. (A socket sits in CLOSE_WAIT when the remote end has closed the connection but the local process, here cfg_server, has not yet closed its side.)

Once it passes roughly 900 sockets, cfg_server either crashes or gets killed automatically. When cfg_server restarts, it triggers a full cfgsync with all nodes, at which point all WiFi clients on the mesh nodes are dropped, even with Ethernet backhaul. This can be reproduced by manually killing the cfg_server process.
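If that mechanism is right, the CLOSE_WAIT count should climb well before cfg_server falls over, so a small watchdog can log the buildup. This is a minimal sketch, not an ASUS tool: the 700-socket threshold and 60-second interval are arbitrary values chosen below the ~900 sockets mentioned above, `cfgwatch`, `close_wait_count` and `watch_cfg_server` are hypothetical names, and it assumes the router's BusyBox provides `logger` and `netstat -p`.

```shell
#!/bin/sh
# Hypothetical watchdog: log when cfg_server's CLOSE_WAIT sockets reach a
# threshold. THRESHOLD and the sleep interval are arbitrary, not ASUS values.
THRESHOLD=${THRESHOLD:-700}

# Count CLOSE_WAIT sockets owned by cfg_server in `netstat -tnap` output (stdin).
# Assumes column 6 = State and column 7 = PID/Program name.
close_wait_count() {
    awk '$6 == "CLOSE_WAIT" && $7 ~ /cfg_server/ { n++ } END { print n + 0 }'
}

# Poll once a minute and write a syslog line while the count is at or above
# the threshold. It only observes; it never restarts cfg_server.
watch_cfg_server() {
    while :; do
        n=$(netstat -tnap 2>/dev/null | close_wait_count)
        if [ "$n" -ge "$THRESHOLD" ]; then
            logger -t cfgwatch "cfg_server has $n CLOSE_WAIT sockets (threshold $THRESHOLD)"
        fi
        sleep 60
    done
}

# Usage on the router, e.g. from an SSH session:
#   watch_cfg_server &
```

Keeping the counting in its own function means it can be checked by hand with `netstat -tnap | close_wait_count` before leaving the loop running.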

While it's kinda my fault for using an older smart home device, this looks like a poor implementation in ASUSWRT: cfg_server specifically isn't robust enough to handle this scenario. I suspect this affects their ExpertWiFi series as well.

Thanks to @arrgh for doing the initial investigation.
 
