Defunct cfg_server Zombies

I noticed that every time the zombie slayer script ran, I wound up with 0 cfg_server processes running, instead of 3 or so.

So, I thought “could it be that it’s killing the wrong process first‽” and fixed up your edits to make sure not to kill that one. But I was still winding up with zero. I decided to add a break to only kill one process. Again, zero. I grabbed a random PID from the list and ran kill -15 pid. All dead.

I have once again run up against the edges of my cleverness. Here’s the state of the script, which I’m quite happy with overall and would use if I didn’t have that second node. (Note: this happens whether or not one has multiple nodes, to say nothing of AiMesh.)

TL;DR: runaway cfg_server issues will happen regardless of AiMesh usage; killing any one of the roughly 1012 processes kills them all, which kills the connection to the nodes; I have no idea how to deal with the root cause.

(re root cause: As @SomeWhereOverTheRainBow points out: lockfile? process never returning? idk)

Code:
#!/bin/sh
# This script relies on parsing strings, don't include `cfg_server` in its name

awk_pidof() { ps wT | awk '/\scfg_server/{if( $0 !~ /awk/)printf "%s ", $1}'; }

# Match the script name as a literal string; /$var/ would not expand the awk variable
if [ "$(ps wT | awk -v var="$(basename "$0")" '{if( $0 !~ /awk/ && index($0, var) )printf "%s ", $1}' | wc -w)" -gt 1 ]; then
  echo "Exiting zombie slayer because it thinks it's a duplicate process"
  exit;
fi

while :; do
   while [ "$(awk_pidof | wc -w)" -le 3 ]; do sleep 5; done;

   /usr/bin/logger "Zombie slayer detects excess cfg_server processes"
   # echo "Zombie slayer detects excess cfg_server processes"

   i=0
   parent_pid=$(pstree -s cfg_server | awk '{if($0 !~ /init/)printf "%i", $2}')

   for pid in $(awk_pidof); do
      if [ "$pid" != "$parent_pid" ]; then
         # echo "killing pid $pid"
         /bin/kill -s 9 $pid &
         # /usr/bin/logger "Zombie slayer killing pid $pid"
         i=$((i + 1))
         break
      fi
   done

   /usr/bin/logger "Zombie slayer back to sleep after killing $i processes"
   # echo "Zombie slayer back to sleep after killing $i processes"
done
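
A side note on the duplicate-process check: when a name is handed to awk with -v, a regex literal such as /$var/ does not interpolate the variable, so matching has to go through the variable itself. Below is a minimal sketch of that idea, assuming a literal substring match is what is wanted. The helper name count_matches, the script path, and the sample lines standing in for ps wT output are all made up for illustration.

```shell
#!/bin/sh
# count_matches NAME: count stdin lines that contain NAME as a literal substring.
# (hypothetical helper, not part of the script above)
count_matches() {
  # index() does a plain substring search, so dots in NAME are not regex wildcards
  awk -v var="$1" 'index($0, var) { n++ } END { print n + 0 }'
}

# Hypothetical sample input standing in for `ps wT` output
sample='  101 admin  1234 S  /bin/sh /jffs/scripts/zombie_slayer.sh
  102 admin  2345 S  cfg_server
  103 admin  1234 S  /bin/sh /jffs/scripts/zombie_slayer.sh'

printf '%s\n' "$sample" | count_matches "zombie_slayer.sh"   # prints 2
```

Whether the count on a live router also includes the shell's own command-substitution subshell is something I have not verified, so the threshold may need adjusting.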

Inevitably someone is going to say “why don’t you just kill all the processes always and forever, and switch how your nodes work?” The short answer is, I’m a software person, not a hardware person, and I dread having to think about channels and juggling multiple SSIDs and whatnot. I only need the second node for a tiny corner of my home, but I *need* that node for that tiny corner of my home.

What happens if you send kill -s 17 to the parent process only? I wonder if that kills both the parent process and child processes.
 
of course! child signal!

but ugh, it did nothing. router continued to have approx 1012 cfg_server processes running. I even tried running a loop and sending it to every process in case there was a cascade I could kick off somewhere. No dice.
 
So, what is your network topology? I mean, what are you using as node(s), and what firmware version are you running on your node(s) and your main router?
 
2 x ZenWiFi_XT8 (base model: RT-AX95Q); Firmware Version 388.1_0-gnuton1 with an ethernet backhaul. Main is set up as "Wireless router mode / AiMesh Router mode", second set up as an AiMesh node.
 
I am playing with the script right now.

Testing it with my own self-made defunct process lol.

Code:
#!/bin/sh
# This script relies on parsing strings, don't include `cfg_server` in its name

awk_pidof() { ps wT | awk '/\scfg_server/{if( $0 !~ /awk/)printf "%s ", $1}'; }

# Match the script name as a literal string; /$var/ would not expand the awk variable
if [ "$(ps wT | awk -v var="$(basename "$0")" '{if( $0 !~ /awk/ && index($0, var) )printf "%s ", $1}' | wc -w)" -gt 1 ]; then
  echo "Exiting zombie slayer because it thinks it's a duplicate process"
  exit;
fi

while :; do
   while [ "$(awk_pidof | wc -w)" -le 3 ]; do sleep 5; done;

   /usr/bin/logger "Zombie slayer detects excess cfg_server processes"
   # echo "Zombie slayer detects excess cfg_server processes"

   i=0
   seen=0
   parent_pid=$(pstree -s cfg_server | awk '{if($0 !~ /init/)printf "%s", $2}')

   for pid in $(awk_pidof); do
      seen=$((seen + 1))
      # Spare the parent and the first 3 entries in the list; kill the rest
      if [ "$pid" != "$parent_pid" ] && [ "$seen" -gt 3 ]; then
         # echo "killing pid $pid"
         /bin/kill -s 9 "$pid" & slay_pid="${slay_pid} $!"
         # /usr/bin/logger "Zombie slayer killing pid $pid"
         i=$((i + 1))
      fi
   done
   wait $slay_pid
   unset slay_pid
   /usr/bin/logger "Zombie slayer back to sleep after killing $i processes"
   # echo "Zombie slayer back to sleep after killing $i processes"
done

I removed the break so the loop will continue, and I added a seen counter with [ "$seen" -gt 3 ] to make sure we keep at least 3 PIDs.
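
The intent described here, sparing the parent and the first few entries while killing the rest, can be sketched as a dry run that only prints the PIDs it would signal. The function name select_victims and the sample PIDs are made up for illustration. The one thing to watch is that the counter has to advance for every entry seen, not only for entries killed, or the threshold is never reached.

```shell
#!/bin/sh
# select_victims KEEP PARENT PID...
# Print the PIDs that would be killed: everything after the first KEEP
# entries in the list, never including PARENT. (hypothetical helper)
select_victims() {
  keep=$1
  parent=$2
  shift 2
  seen=0
  out=""
  for pid in "$@"; do
    seen=$((seen + 1))                      # counts every entry, killed or not
    [ "$pid" = "$parent" ] && continue      # never touch the parent
    [ "$seen" -le "$keep" ] && continue     # spare the first KEEP entries
    out="${out}${out:+ }${pid}"
  done
  printf '%s\n' "$out"
}

# Dry run with made-up PIDs: keep 3, parent is 100
select_victims 3 100 100 101 102 103 104 105   # prints: 103 104 105
```

Printing instead of killing makes it easy to check the selection before wiring it to /bin/kill.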
 
Are you running stock firmware on the nodes?
Both are running 388.1_0-gnuton1. I found that the loop was useless because killing any one of the 1000-ish processes killed them all. Leaving the loop in just resulted in a lot of "kill: can't kill pid 6012: No such process" warnings.
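
One possible explanation, offered as an assumption rather than something confirmed in this thread: the T in ps wT asks busybox ps to list threads, so the 1000-ish entries may be threads of a few cfg_server processes rather than separate processes. SIGKILL or an unhandled SIGTERM takes down a whole process, threads included, which would match every entry vanishing at once. On Linux each thread of a process shows up as a directory under /proc/PID/task, so a rough way to count them might look like this. The helper count_tasks and its optional second argument are hypothetical.

```shell
#!/bin/sh
# count_tasks PID [PROC_ROOT]
# Print the number of threads (tasks) belonging to process PID by counting
# the directories under PROC_ROOT/PID/task. PROC_ROOT defaults to /proc and
# is only overridable to make the function easy to try out. (hypothetical helper)
count_tasks() {
  proc_root=${2:-/proc}
  n=0
  for d in "$proc_root/$1/task"/*; do
    # If the glob matches nothing, $d is the literal pattern and -d fails
    [ -d "$d" ] && n=$((n + 1))
  done
  echo "$n"
}

# Example: thread count of the current shell
count_tasks "$$"
```

Running something like for pid in $(pidof cfg_server); do echo "$pid: $(count_tasks "$pid")"; done on the router would show whether a handful of processes account for all of the entries.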
 
