Stuck commands

My guess is that it's in libnvram.so, which is supplied as a prebuilt module for each platform.
Looking at the source code, AFAICT the only routers that have their own prebuilt module are RT-AX88U, XT12, GT-AX11000, RT-AX56U, RT-AX58U, RT-AX68U, RT-AX86U, GT-AX6000, GT-AXE11000 and RT-AC68U_V4. If I'm reading it right, all the other routers share a different common module. Given how popular the RT-AC68U is, I wonder why we're not getting reports of stuck processes on that model.

So would this have been an issue created from the blobs upstream of @RMerlin?
Yes I believe that's the case.
 
This is only a test router. When I read about issues, I try to replicate.



Correct. I've seen it stuck completely though, with frozen connection rates.
Ah yes, I understand. I suppose some info for the Client List must come directly or indirectly from "nvram" or "wl" calls so it makes sense that it might be affected as well.
 
That was a looong way to say "No" but fair enough. :)
Yes, admittedly, it was a long-winded reply, but I wanted to be crystal clear about the reasons & circumstances behind the answer. After all the time & effort that you & @SomeWhereOverTheRainBow had put in, I felt that you deserved to know where I stood. Was it *too* long? Perhaps. Should I have blown you off instead with a simple and terse "No, I can't run the tests. Thanks."? I certainly don't think so, but to each his own.
 
@dave14305 If you have the time would you mind running the following script on your RT-AC86U. It should print out a list of active netlink socket numbers that don't have a matching pid. My router is very minimal and doesn't run things like AiProtection so I'm curious to see if you have a lot more mismatched netlink sockets than I do (6 x 2 = 12).

Code:
#!/bin/sh

cat /proc/net/netlink | sort -nk3 | \
awk '
BEGIN {
   print "\nPrint netlink sockets for which there is no process with the same number\n"
   getline pid_max < "/proc/sys/kernel/pid_max"
}
{
   if ( $2 == "31" ) {
      if ( $3 < pid_max )
         system("kill -0 " $3 " 2>/dev/null || echo \"Process " $3 " not found\"")
      else {
         orig_pid = $3 - pid_max - 2
         system("kill -0 " orig_pid " 2>/dev/null || echo \"Process associated with " $3 " not found (" orig_pid ")\"")
      }
   }
}
END { print "\nA number in brackets is a *guess* at an associated process\n" }
'
EDIT: Removed some unnecessary experimental code from script (just in case it confuses people).
No comments on the quality of my coding please. :)

P.S. Not that this achieves anything other than satisfying my curiosity. :D
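The offset arithmetic in the script can be tried in isolation. This is a minimal sketch, not part of the original post: `guess_pid` is a hypothetical helper name, and the default of 32768 for `pid_max` is an assumption (on a real router, read it from /proc/sys/kernel/pid_max as the script does). It only repeats the script's guess that a second netlink socket of a process gets the number pid + pid_max + 2.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): guess the PID behind a netlink
# socket number, using the same offset the script above applies.
guess_pid() {
    port_id=$1
    pid_max=${2:-32768}   # assumed default; check /proc/sys/kernel/pid_max
    if [ "$port_id" -lt "$pid_max" ]; then
        # Small numbers are taken to be the PID itself.
        echo "$port_id"
    else
        # Larger numbers are assumed to be pid + pid_max + 2.
        echo $((port_id - pid_max - 2))
    fi
}
```

For example, `guess_pid 35095` prints 2325, which is the kind of pairing the script reports in brackets. It remains a guess, as the script's own footer says.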
 
@dave14305 If you have the time would you mind running the following script on your RT-AC86U. It should print out a list of active netlink socket numbers that don't have a matching pid. My router is very minimal and doesn't run things like AiProtection so I'm curious to see if you have a lot more mismatched netlink sockets than I do (6 x 2 = 12).

Code:
#!/bin/sh

cat /proc/net/netlink | sort -nk3 | \
awk '
BEGIN {
   print "\nPrint netlink sockets for which there is no process with the same number\n"
   getline pid_max < "/proc/sys/kernel/pid_max"
}
{
   if ( $2 == "31" ) {
      if ( $3 < 32768 )
         system("kill -0 " $3 " 2>/dev/null || echo \"Process " $3 " not found\"")
      else {
         orig_pid = $3 - 32770
         system("kill -0 " orig_pid " 2>/dev/null || echo \"Process " $3 " not found (" orig_pid ")\"")
         if ( pid_max != 32768 && $3 >= pid_max ) {
            orig_pid = $3 - pid_max - 2
            system("kill -0 " orig_pid " 2>/dev/null || echo \"Process " $3 " not found (" orig_pid ")\"")
         }
      }
   }
}
END { print "\nA number in brackets is a *guess* at an associated process\n" }
'
No comments on the quality of my coding please. :)
That looks legit, great work!
 
If you have the time would you mind running the following script on your RT-AC86U
Here’s the output (No AiProtection, but A.QoS is active).
Code:
Print netlink sockets for which there is no process with the same number

Process 2325 not found
Process 23851 not found
Process 23858 not found
Process 23863 not found
Process 23865 not found
Process 23902 not found
Process 35095 not found (2325)
Process 56621 not found (23851)
Process 56628 not found (23858)
Process 56633 not found (23863)
Process 56635 not found (23865)
Process 56672 not found (23902)

A number in brackets is a *guess* at an associated process
 
Interesting thread! Would killing "conn_diag" bypass this problem? It's not clear to me which functionality we would be losing if we just kill the parent process...
 
Here’s the output (No AiProtection, but A.QoS is active).
Thanks Dave. So it doesn't look that much different than my router. Looks like you have just one more process than the 6 I mentioned earlier.

Interesting thread! Would killing "conn_diag" bypass this problem?
It wouldn't solve the issue with the other processes.
 
No comments on the quality of my coding please. :)
My only comment on your coding - it’s very well done and very readable.
Close to “self documenting” ;-)

BTW, I have never seen/used the awk “system” command. Thanks.
 
You can do amazing things with awk...lol
 
IMO this has been one of the MOST interesting reads within these forums... Is it safe to assume this information has already been passed on to Asus? If so, it would be nice to know whether the problem was actually acknowledged or, better yet, whether there is a ticket number or a commitment to resolve it.
 
IMO this has been one of the MOST interesting reads within these forums...

And this thread too:


Asus guys - watch and learn. :)
 
Time to tackle “dcd Tainted”.
  1. Enable AiProtection.
  2. ifconfig br0:0 192.168.50.2 up
  3. killall dcd
  4. dcd -i 3600 -p 43200 -b -l 8 -d /tmp/bwdpi/
  5. Watch syslog in the next 30 mins.
strace doesn’t tell much.
Code:
9408       0.001529 pipe2([6<pipe:[4007056]>, 7<pipe:[4007056]>], O_CLOEXEC) = 0
9408       0.000133 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf7452068) = 10087
9408       0.000460 close(7<pipe:[4007056]>) = 0
9408       0.000051 fcntl64(6<pipe:[4007056]>, F_SETFD, 0) = 0
9408       0.000070 fstat64(6<pipe:[4007056]>, 0xff91c948) = 0
9408       0.000048 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7455000
9408       0.000047 read(6<pipe:[4007056]>, "bcmsw     Link encap:Ethernet  H"..., 4096) = 4096
9408       0.008351 read(6<pipe:[4007056]>, " ALLMULTI MULTICAST  MTU:1500  M"..., 4096) = 1685
9408       0.000875 read(6<pipe:[4007056]>, "", 4096) = 0
9408       0.000362 close(6<pipe:[4007056]>) = 0
9408       0.000051 wait4(10087, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 10087
9408       0.000075 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=10087, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
9408       0.000033 munmap(0xf7455000, 4096) = 0
9408       0.000060 gettimeofday({tv_sec=283521037094607, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000070 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(135"..., iov_len=5828}], 1) = 5828
9408       0.000729 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 5848, MSG_NOSIGNAL) = 5848
9408       0.000209 gettimeofday({tv_sec=287858954063567, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000054 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(139"..., iov_len=62}], 1) = 62
9408       0.000097 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 82, MSG_NOSIGNAL) = 82
9408       0.000322 gettimeofday({tv_sec=289899063529167, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000051 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(141"..., iov_len=114}], 1) = 114
9408       0.000068 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 134, MSG_NOSIGNAL) = 134
9408       0.000152 gettimeofday({tv_sec=291075884568271, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000051 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(149"..., iov_len=67}], 1) = 67
9408       0.000121 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 87, MSG_NOSIGNAL) = 87
9408       0.000091 gettimeofday({tv_sec=292196871032527, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000046 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(139"..., iov_len=60}], 1) = 60
9408       0.000117 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 80, MSG_NOSIGNAL) = 80
9408       0.000091 gettimeofday({tv_sec=293292087693007, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000047 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(141"..., iov_len=114}], 1) = 114
9408       0.000121 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 134, MSG_NOSIGNAL) = 134
9408       0.000088 gettimeofday({tv_sec=294395894288079, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000045 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(149"..., iov_len=51}], 1) = 51
9408       0.000117 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 71, MSG_NOSIGNAL) = 71
9408       0.000080 gettimeofday({tv_sec=295443866308303, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000045 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(149"..., iov_len=67}], 1) = 67
9408       0.000117 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 87, MSG_NOSIGNAL) = 87
9408       0.000096 gettimeofday({tv_sec=296560557805263, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000047 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(139"..., iov_len=73}], 1) = 73
9408       0.000117 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 93, MSG_NOSIGNAL) = 93
9408       0.000092 gettimeofday({tv_sec=297668659367631, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000047 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(141"..., iov_len=114}], 1) = 114
9408       0.000126 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 134, MSG_NOSIGNAL) = 134
9408       0.000081 gettimeofday({tv_sec=298768170995407, tv_usec=17805393838808118432}, NULL) = 0
9408       0.000046 writev(2</dev/pts/1>, [{iov_base="dcd[9408]: [fill_router_nifs(149"..., iov_len=51}], 1) = 51
9408       0.000116 send(3<socket:[3998334]>, "<30>Jun 22 22:24:15 dcd[9408]: ["..., 71, MSG_NOSIGNAL) = 71
9408       0.000086 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
9417       0.053733 <... rt_sigtimedwait resumed> <unfinished ...>) = ?
9410       0.000026 <... _newselect resumed> <unfinished ...>) = ?
9409       0.000015 <... rt_sigtimedwait resumed> <unfinished ...>) = ?
9412       0.000043 <... nanosleep resumed> <unfinished ...>) = ?
9411       0.000015 <... restart_syscall resumed>) = ?
9411       0.000070 +++ killed by SIGSEGV +++
9412       0.000063 +++ killed by SIGSEGV +++
9409       0.000052 +++ killed by SIGSEGV +++
9410       0.000054 +++ killed by SIGSEGV +++
9417       0.000039 +++ killed by SIGSEGV +++
9408       0.000779 +++ killed by SIGSEGV +++
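For step 5 of the reproduction (watching syslog), a small counter can stand in for reading the log by eye. This is only a sketch under assumptions: `count_dcd_crashes` is a hypothetical name, /tmp/syslog.log is the file syslogd is started with in the ps listing posted later in this thread (your router may differ), and the pattern "dcd Tainted" is a guess at how the crash line appears, based on the way the problem is named here.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that look like a dcd crash report.
# The default path and the "dcd Tainted" pattern are both assumptions.
count_dcd_crashes() {
    log=${1:-/tmp/syslog.log}
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches.
    grep -c 'dcd Tainted' "$log" 2>/dev/null || true
}
```

Running `count_dcd_crashes` before restarting dcd and again after 30 minutes would show whether a new crash was logged in between.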
 
Time to tackle “dcd Tainted”.
Create a virtual interface (e.g. eth0:0). Here dcd triggered it on its very first refresh (I think it was within 15 minutes; I forget the timer).
 
Create a virtual interface (e.g. eth0:0). Here dcd triggered it on its very first refresh (I think it was within 15 minutes; I forget the timer).
I’ve always wondered why lo:0 didn’t cause problems, but now I believe it’s because there is no HWaddr address associated with lo.
 
I’ve always wondered why lo:0 didn’t cause problems, but now I believe it’s because there is no HWaddr address associated with lo.
Correct. It sounds like Trend Micro is only concerned with traffic entering or leaving interfaces that have a HWaddr associated with them. Internal addresses such as 127.0.1.1 or 127.0.0.1 would probably be ignored completely as well...
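If the theory above holds (alias interfaces with a HWaddr trigger the dcd crash, while lo:0 does not), it may help to list which aliases on a router fall into that group. The following is a sketch only: `aliases_with_hwaddr` is a hypothetical name, and it assumes the classic ifconfig layout where each interface starts in column one with "Link encap:..." on the same line.

```shell
#!/bin/sh
# Hypothetical check: read ifconfig-style output on stdin and print alias
# interfaces (name contains ":") whose header line reports a HWaddr.
aliases_with_hwaddr() {
    # Header lines start in column one; continuation lines are indented.
    awk '/^[^ \t]/ && $1 ~ /:/ && /HWaddr/ { print $1 }'
}
```

Usage would be `ifconfig | aliases_with_hwaddr`. An alias such as br0:0 should be printed, while lo:0 (Local Loopback, no HWaddr) should not.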
 
Here's the output from my AC86U.
It's on Merlin 386.7, with Diversion 4.3.0, scMerlin and dn-vnStat installed. It was rebooted fairly recently (a few hours ago). I executed this after the reboot:
echo 4194304 > /proc/sys/kernel/pid_max

From Colin's diagnostic script:
Print netlink sockets for which there is no process with the same number

Process 1091 not found
Process 1232 not found
Process 1243 not found
Process 1247 not found
Process 2087 not found
Process 2225 not found
Process 2409 not found
Process 2752 not found
Process 32771 not found
Process 33697 not found
Process 33781 not found
Process 33845 not found
Process 33852 not found
Process 33859 not found
Process 33860 not found
Process 33861 not found
Process 34002 not found
Process 34010 not found
Process 34012 not found
Process 34013 not found
Process 34017 not found
Process 34739 not found
Process 34743 not found
Process 34744 not found
Process 34746 not found
Process 34747 not found
Process 34748 not found
Process 34751 not found
Process 34753 not found
Process 34807 not found
Process 34810 not found
Process 34811 not found
Process 34813 not found
Process 34856 not found
Process 34857 not found
Process 34858 not found
Process 34929 not found
Process 34995 not found
Process 35113 not found
Process 35179 not found
Process 35270 not found
Process 35522 not found
Process 50758 not found

A number in brackets is a *guess* at an associated process
And from ps w:
PID USER VSZ STAT COMMAND
1 admin 13440 S /sbin/init
2 admin 0 SW [kthreadd]
3 admin 0 SW [ksoftirqd/0]
4 admin 0 SW [kworker/0:0]
5 admin 0 SW< [kworker/0:0H]
6 admin 0 SW [kworker/u4:0]
7 admin 0 SW [rcu_preempt]
8 admin 0 SW [rcu_sched]
9 admin 0 SW [rcu_bh]
10 admin 0 SW [migration/0]
11 admin 0 SW [watchdog/0]
12 admin 0 SW [watchdog/1]
13 admin 0 SW [migration/1]
14 admin 0 SW [ksoftirqd/1]
16 admin 0 SW< [kworker/1:0H]
17 admin 0 SW< [khelper]
18 admin 0 SW [kdevtmpfs]
19 admin 0 SW< [writeback]
21 admin 0 SWN [ksmd]
22 admin 0 SW< [crypto]
23 admin 0 SW< [bioset]
24 admin 0 SW< [kblockd]
25 admin 0 DW [skbFreeTask]
26 admin 0 SW [bcmFapDrv]
27 admin 0 SWN [kswapd0]
28 admin 0 SW [fsnotify_mark]
53 admin 0 SW [btnhandler0]
54 admin 0 SW [btnhandler1]
55 admin 0 SW [btnhandler2]
56 admin 0 SW [kworker/1:1]
57 admin 0 SW< [linkwatch]
58 admin 0 SW< [ipv6_addrconf]
59 admin 0 SW< [deferwq]
60 admin 0 SW [ubi_bgt0d]
170 admin 0 SWN [jffs2_gcd_mtd2]
204 admin 0 SW [bcmFlwStatsTask]
208 admin 0 SW [bcmsw_rx]
209 admin 0 SW [bcmsw]
217 admin 0 SW [pdc_rx]
285 admin 18508 S /bin/swmdk
292 admin 1568 S {wdtctl} wdtd
296 admin 1712 S hotplug2 --persistent --no-coldplug
299 admin 1752 S /usr/sbin/envrams
362 admin 0 SWN [jffs2_gcd_mtd9]
540 admin 0 SW [dhd_watchdog_th]
541 admin 0 SW [wfd0-thrd]
546 admin 0 SW [dhd_watchdog_th]
547 admin 0 SW [wfd1-thrd]
927 admin 11756 S console
981 admin 0 SW [kworker/1:2]
1011 admin 11752 S /sbin/PS_pod
1015 admin 3324 S /sbin/syslogd -m 0 -S -O /tmp/syslog.log -s 256 -l 7
1017 admin 3324 S /sbin/klogd -c 5
1068 admin 0 SW [kworker/0:2]
1075 admin 11752 S /sbin/wanduck
1082 admin 8312 S asd
1087 admin 2376 S /usr/sbin/haveged -r 0 -w 1024 -d 32 -i 32
1088 admin 13540 S nt_monitor
1089 admin 6824 S protect_srv
1090 admin 13800 S /sbin/netool
1100 admin 11460 S nt_center
1230 admin 2924 S /bin/eapd
1240 admin 11752 S wpsaide
1242 admin 4648 S /usr/sbin/wlc_nt
1246 admin 3744 S nas
1278 admin 5032 S /usr/sbin/wlceventd
1741 admin 5012 S nt_actMail
1952 admin 3120 S /usr/sbin/acsd
1954 admin 2856 S /usr/sbin/dhd_monitor
1966 admin 3324 S crond -l 9
1969 admin 5412 S vis-dcon
1973 admin 4804 S vis-datacollector
1974 admin 6956 S /usr/sbin/infosvr br0
1976 admin 3384 S sysstate
1977 admin 11752 S watchdog
1978 admin 11752 S check_watchdog
1981 admin 4272 S rstats
1983 admin 11752 S amas_lanctrl
2026 admin 3208 S lld2d br0
2033 admin 10532 S vis-dcon
2037 admin 12484 S networkmap --bootwait
2040 admin 10680 S mastiff
2041 admin 11752 S bwdpi_check
2043 admin 11756 S pctime
2086 admin 19964 S roamast
2088 admin 13808 S conn_diag
2091 admin 1704 S /usr/sbin/sd-idle-2.6 -i 1800
2101 admin 3316 S lldpd -L /usr/sbin/lldpcli -I eth1,eth2,eth3,eth4,eth5,eth6,wds0.*.*,wds1.*.* -s RT-AC86U
2105 nobody 3316 S lldpd -L /usr/sbin/lldpcli -I eth1,eth2,eth3,eth4,eth5,eth6,wds0.*.*,wds1.*.* -s RT-AC86U
2107 admin 12760 S cfg_server
2159 admin 14920 S amas_lib
2167 admin 2356 S dropbear -p 192.168.20.1:22 -j -k
2217 admin 0 SW [kworker/u4:2]
2324 admin 0 SW [scsi_eh_0]
2325 admin 0 SW< [scsi_tmf_0]
2326 admin 0 SW [usb-storage]
2343 admin 11752 S usbled
2379 admin 3680 S /usr/sbin/ntp -t -S /sbin/ntpd_synced -p pool.ntp.org
2410 admin 0 SW< [kworker/0:1H]
2417 admin 1916 S /bin/mcpd
2500 admin 11752 S disk_monitor
2548 admin 6448 S /etc/openvpn/vpnserver1 --cd /etc/openvpn/server1 --config config.ovpn
2550 admin 6448 S /etc/openvpn/vpnserver1 --cd /etc/openvpn/server1 --config config.ovpn
2696 admin 3324 S /sbin/udhcpc -i eth0 -p /var/run/udhcpc0.pid -s /tmp/udhcpc_wan -t2 -T5 -A160 -H ****-AP -O33 -O249
2766 admin 0 SW [jbd2/sda1-8]
2767 admin 0 SW< [ext4-rsv-conver]
2829 admin 0 SW< [kworker/1:1H]
2844 admin 5200 S vnstatd -d --noadd --config /opt/share/dn-vnstat.d/vnstat.conf
2872 nobody 28624 S pixelserv-tls 192.168.20.2
3608 nobody 80904 S dnsmasq --log-async
3609 admin 2596 S dnsmasq --log-async
3832 nobody 3096 S avahi-daemon: running [RT-AC86U.local]
3861 admin 11392 S /usr/sbin/nmbd -D -s /etc/smb.conf
3864 admin 1856 S /usr/sbin/wsdd2 -d -w -i br0 -b sku:RT-AC86U,serial:************
3866 admin 11560 S /usr/sbin/smbd -D -s /etc/smb.conf
3867 admin 6940 S vsftpd /etc/vsftpd.conf
3870 admin 2408 S miniupnpd -f /etc/upnp/config
3874 admin 12880 S minidlna -f /etc/minidlna.conf -R -W
3913 admin 3324 S {tailtopd} /bin/sh /jffs/addons/scmerlin.d/tailtopd
3915 admin 3324 S N {tailtop} /bin/sh /jffs/addons/scmerlin.d/tailtop
4284 admin 2484 S dropbear -p 192.168.20.1:22 -j -k
4309 admin 3328 S -sh
6354 admin 12132 S /usr/sbin/smbd -D -s /etc/smb.conf
17988 admin 11108 S httpd -i br0
67358 admin 3192 S sleep 5
67372 admin 3192 S N sleep 4
67376 admin 3328 R ps w
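Because pid_max was raised to 4194304 here, the diagnostic script treats every socket number as a plain PID and prints no bracketed guesses. A possible way to recover them is to re-apply the offset with the earlier value. This is a sketch under an assumption the thread does not confirm: that these sockets were created while pid_max was still 32768. `reguess` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: read "Process N not found" lines on stdin and, for
# numbers at or above the assumed boot-time pid_max, append a guessed PID.
reguess() {
    awk -v boot_max="${1:-32768}" '
    $1 == "Process" && $3 == "not" {
        id = $2 + 0
        if (id >= boot_max)
            # Same offset as the diagnostic script: pid + pid_max + 2
            printf "Process %d not found (%d)\n", id, id - boot_max - 2
        else
            print
    }'
}
```

Piping the script's output through `reguess` would, for instance, turn 33861 into a guess of 1091 and 34858 into 2088, which is conn_diag in the ps listing above. These are still guesses at an associated process, nothing more.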
 
