My experience with the RT-AC86U

I don't expect to be able to get a replacement.

In any case, I think it's reasonable for Asus to investigate this. It's in their own interest.
Maybe it's a batch of faulty NVRAM chips. They could turn to their supplier about it and take measures to keep it from happening again.
Or it could be bad CPU design. What if there's a flaw in the NVRAM read instructions?
Or it could be a flaw in their implementation: power supply, EM interference, etc.
Given what they've done with the thermal design* of this router, this may seem minor. Yet it's not negligible, because if the root cause is not understood, the problem may be replicated in future products.

*This is a big red flag. A company that releases such products worries me (unless it's intentional, to shorten the product's life and make users buy more frequently, in which case it's not incompetence but policy).

@SomeWhereOverTheRainBow: no, I did not do a factory reset.
That would be one of the troubleshooting steps Asus might require you to perform.
 
I got the hint. ;)
Factory reset, reconfigured everything from scratch. Problem with Samba gone.

For the record, if anyone gets into the same situation: these zombie processes hogging the CPU are not spawned by the router itself; the Windows host somehow initiates the mess. Problem is, even after the Windows host is shut down, the CPU-consuming processes on the router stay (though no new ones get created). Why and how it happens, I have no idea. I might eventually drop Samba altogether, even though it's very handy on a network with mixed hosts (Windows, Linux, Android).
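If anyone wants to clear the leftovers by hand instead of rebooting, something along these lines should work from SSH. Note the smbd/nmbd process names are my assumption about how Samba runs on this firmware, so check what ps actually shows first:

Code:
# List any Samba workers still running (the [s] trick keeps grep from matching itself).
ps w | grep -E '[s]mbd|[n]mbd'

# Clear them out; BusyBox killall sends SIGTERM by default.
killall smbd nmbd 2>/dev/null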

The nvram get issue is still here to haunt me. :)
 
Yeah, it is pretty bad when you try to RMA a bricked router that cannot access recovery mode and the technician insists on reading the same script telling you to access the factory reset page.
 
Okay, when you are locked out, what happens when you send the kill signal for the process to "resume"?
Here's what happens

Ran the script; it locked up at iteration 645:
[screenshot]

According to htop, it choked on "nvram get vpn_client2_state"
[screenshot]

I sigkill it, and the script keeps on trucking...

[screenshot]

Note how it said "Killed" and the VPN state value is missing:

[screenshot]

Then it eventually died again at iteration 4147.
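For anyone wanting to reproduce the unblocking step, this is roughly what I did from a second SSH session (assuming the stock BusyBox ps/kill; the bracket in the grep pattern just keeps grep from matching itself):

Code:
# Find the PID of the hung nvram call.
ps w | grep '[n]vram get'

# SIGKILL it by PID; that's what got the loop moving again above.
kill -9 <PID>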
 
So we need to trap the suspend signal, @Viktor Jaep: "stp", I believe. It appears the router is suspending the process at a certain point so the loop doesn't run infinitely, eating up resources. This behavior can be stopped if we trap the suspend signal along with the rest of our traps.
 
Like this?

Code:
#!/bin/sh
trap '' HUP INT QUIT ABRT TERM TSTP
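# An empty action ('') means these signals are ignored; the subshell
# inherits the ignore, so the background loop survives HUP, TSTP, etc.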
(i="0"
while true; do
  i="$(( i + 1 ))"
  for nv in 1 2 3 4 5; do
    unset "state${nv}"
    eval "state${nv}"="$(/bin/nvram get vpn_client${nv}_state)"
  done
  clear
  echo "$state1" "$state2" "$state3" "$state4" "$state5"
  echo "$i"
done) > /tmp/mynvramerror.log 2>&1 &


exit 0
 
I believe it is "TSTP", not "STP".
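For reference, TSTP is the terminal-stop signal (what Ctrl-Z sends). If in doubt, you can list the signal names your shell accepts:

Code:
kill -l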
 
As @dave14305 alluded in his post on the thread here, the same issue also happens fairly regularly with the /usr/sbin/wl command, so I don't think the root cause is a hardware flaw or a failing NVRAM chip. At this point, I agree with @RMerlin that it looks more like a deadlock condition with two (or more) competing threads not releasing their corresponding lock/mutex/semaphore appropriately and at the right time, so they end up waiting on each other forever.

I got very curious last Sunday about this problem, so I ended up writing a script that looks for both "nvram" & "wl" commands that appear "stuck" and then captures the tree path to their root parent process. I was trying to see if the same parent processes show up when the hangs occur. In my case, the same pair show up most frequently: YazFi & conn_diag. I think that's probably because they both frequently make calls to the "wl" command, and YazFi to "nvram get" as well. BTW, I found out when going through the logs generated by the script that the "cru l" command can get "stuck" as well, because it makes a call to "nvram get http_username", which gives the filename containing the list of cron jobs (e.g. /var/spool/cron/crontabs/{http_username}).

Today, I added code to the script to kill the "stuck" processes when they are found on a 2nd consecutive round of the search, with the script set up to run as a cron job. I have it running every 5 minutes since I don't get a lot of occurrences (about 3 a day on average). But I'd imagine that folks who run many 3rd-party add-ons which call the nvram and/or wl commands frequently may see the bug much more often. This script is not a solution at all, but at least it will eliminate all those "stuck" processes that would otherwise stay around until the next reboot.

Here is the script if you want to try it. It was initially meant to be a diagnostic tool, so it's still a bit "raw" and it has not been polished with a round of refactoring.

Type ./CheckStuckProcCmds.sh -help to get a usage description.
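In the meantime, here is a bare-bones sketch of the kill-on-second-sighting idea in case anyone wants to roll their own. To be clear, this is not the actual script: the state-file paths, the ps matching, and the logger tag are simplified stand-ins, and it skips the parent-tree capture entirely.

Code:
#!/bin/sh
# Sketch only: kill nvram/wl commands still alive across two consecutive runs.
# Meant to run from cron every 5 minutes; state lives in /tmp (cleared on reboot).
PREV="/tmp/stuck_cmds.prev"
CURR="/tmp/stuck_cmds.curr"

# Grab the PIDs of any running nvram or wl commands.
# (Assumes BusyBox ps prints the PID in the first column.)
ps w | grep -E '[n]vram |[w]l ' | awk '{print $1}' > "$CURR"

if [ -f "$PREV" ]; then
    while read -r pid; do
        # Seen on the previous run too: a full cron interval has passed,
        # so treat it as stuck and kill it.
        if grep -q "^${pid}$" "$CURR"; then
            logger -t "StuckProcCheck" "Killing stuck process PID=$pid"
            kill -9 "$pid" 2>/dev/null
        fi
    done < "$PREV"
fi

mv -f "$CURR" "$PREV"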
 
Welp... even with TSTP, I'm seeing similar behavior after sigkilling the stuck nvram get call:
[screenshots]
What about
Code:
#!/bin/sh 
#trap '' HUP INT QUIT ABRT TERM TSTP 
(i="0" 
while true; do 
  i="$(( i + 1 ))";
  for nv in 1 2 3 4 5; do 
    unset "state${nv}"; 
    eval "state${nv}"="$(/bin/nvram get vpn_client${nv}_state)" & kill $!;
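    # The & backgrounds the whole eval (assignment included) into a child
    # shell, and kill $! immediately signals that child, an attempt to keep
    # a hung nvram call from blocking the loop.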
    wait
  done 
  clear 
  echo "$state1" "$state2" "$state3" "$state4" "$state5" 
  echo "$i" 
done) > /tmp/mynvramerror.log 2>&1 & 
exit 0
 

Well, it looks like the forum didn't like the script file as an attachment, even with a *.TXT file extension. I can put it in Pastebin if you're interested in trying the script.
 
Please do! Thanks for your work on this! :)
 
When the files in /jffs/.sys/diag_db/ are not updating, is that a sign of something stuck? There are two files in this directory, and they get rotated every day around 8 AM. Mine usually stop updating a few days after a reboot.
 
Didn't like that one...

nvrampoc.sh: line 12: syntax error: unexpected ")" (expecting "done")
 
@Martinski: This is all good info, but I have a question.
If the stuck nvram get commands have nothing to do with the hardware, how come this only happens on the AC86U?

I let the same loop run on an AC68U last night and it got beyond 870,000 iterations before I finally decided to stop it.
 
Yeah, I had to revise the post; make sure you have the revised version.
Wow, whatever you did there caused my /sbin/init to go off the chart and spike utilization to the max. The log was filled with these... something ain't right:

[screenshot]
 
This is normal. Mine usually stop updating 24-48 hours after a reboot.
 
Dial it back a minute and try

Code:
#!/bin/sh 
#trap '' HUP INT QUIT ABRT TERM TSTP 
(i="0" 
while true; do 
  i="$(( i + 1 ))";
  for nv in 1 2 3 4 5; do 
    unset "state${nv}"; 
    eval "state${nv}"="$(/bin/nvram get vpn_client${nv}_state)";
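    # Plain synchronous call this time; nothing runs in the background,
    # so the wait below is effectively a no-op.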
    wait
  done 
  clear 
  echo "$state1" "$state2" "$state3" "$state4" "$state5" 
  echo "$i" 
done) > /tmp/mynvramerror.log 2>&1 & 
exit 0
 
