My experience with the RT-AC86U

SomeWhereOverTheRainBow · Jun 18, 2022

Oracle said:
Here's some log output from running the test loop on tighter timings.
It reported 5 retries, while the average is 2-3.

Running time appears unaffected: 5:07 min per 3000 loop iterations.

I switched to these versions because the other one using timeout falls apart if I unplug the USB drive.
Experienced it during the shutdown process. Unmounted the USB device and before executing sync;halt a few seconds later I was flooded with errors.

If we could somehow compile a static binary for timeout, then maybe it can be stored in JFFS. It is alot of work for doing such though..

ColinTaylor · Jun 18, 2022

If you guys want to try an alternative approach that doesn't involve scripts you could try the following tweak. This won't fix the issue, but in theory (based on the balance of probability) it ought to reduce the frequency of the problem occurring by a factor of ~~128~~ ~64.

Code:

echo 4194304 > /proc/sys/kernel/pid_max

SomeWhereOverTheRainBow · Jun 18, 2022

ColinTaylor said:
If you guys want to try an alternative approach that doesn't involve scripts you could try the following tweak. This won't fix the issue, but in theory (based on the balance of probability) it ought to reduce the frequency of the problem occurring by a factor of 128.

Code:

echo 4194304 > /proc/sys/kernel/pid_max

So let me see if I can some what understand you on your level- by increasing the pool of PID you hope that it will somehow add more randomness to this occurrence? And more randomness means less likely to occur?

For anyone not keeping the score properly like me, here is a link that will provide you with all the information on @ColinTaylor recommendation.

https://www.cyberciti.biz/tips/howto-linux-increase-pid-limits.html

The theory being that increasing this size will mean we are less likely to see these occurrences.

However one caveat that must be considered is

Please note that this hack is only useful for a large and busy server; don’t try this on an old kernel or on desktop systems.

Which may actually be perfect for our case!

Here is another interesting read.....

Understanding the differences between pid_max, ulimit -u and thread_max

I am trying to understand the Linux processes. I'm confused on the respective terms pid_max, ulimit -u and thread_max. What exactly is the difference between these terms? Can someone clarify the

unix.stackexchange.com

ColinTaylor · Jun 19, 2022

SomeWhereOverTheRainBow said:
So let me see if I can some what understand you on your level- by increasing the pool of PID you hope that it will somehow add more randomness to this occurrence? And more randomness means less likely to occur?

Correct.

In my case (if I had an RT-AC86U) there are 12 "problem pids". 6 of these are in the 1000-3000 range and the other 6 are in the 33000-35000 range (which is beyond pid_max). When the systems' current pid number is past the first 6 problem pids everything should be fine until it hits pid_max and loops back to the beginning. This time round the loop, when the current process' pid matches one of the 6 problem pids there's a chance that that process is making an nvram call. This will cause it to hang. If it doesn't make an nvram call it's not a problem.

So by increasing pid_max from 32768 to 4194304 it takes 128 times longer to restart the loop. However, while the loop is much bigger it now also encompasses pids 33000-35000 which it didn't before. As I don't have any intensive add-on scripts that would churn through pids I estimate it would take my router about 24 days to restart each loop.

dave14305 · Jun 19, 2022

ColinTaylor said:
Correct.

In my case (if I had an RT-AC86U) there are 12 "problem pids". 6 of these are in the 1000-3000 range and the other 6 are in the 33000-35000 range (which is beyond pid_max). When the systems' current pid number is past the first 6 problem pids everything should be fine until it hits pid_max and loops back to the beginning. This time round the loop, when the current process' pid matches one of the 6 problem pids there's a chance that that process is making an nvram call. This will cause it to hang. If it doesn't make an nvram call it's not a problem.

So by increasing pid_max from 32768 to 4194304 it takes 128 times longer to restart the loop. However, while the loop is much bigger it now also encompasses pids 33000-35000 which it didn't before. As I don't have any intensive add-on scripts that would churn through pids I estimate it would take my router about 24 days to restart each loop.

My observation with conn_diag though, is that the spawned wl command is very close to the pid of conn_diag, and not necessarily the next one past the high-water mark. Even restarting conn_diag generates a new process in the “middle” of existing pids for me. YMMV.

ColinTaylor · Jun 19, 2022

dave14305 said:
My observation with conn_diag though, is that the spawned wl command is very close to the pid of conn_diag, and not necessarily the next one past the high-water mark. Even restarting conn_diag generates a new process in the “middle” of existing pids for me. YMMV.

Yes, this is expected. It won't be past the high-water mark. It doesn't really matter where in the pid pool it is, just that the size of the pool is much bigger.

JohnD5000 · Jun 19, 2022

Ran the script from the first post on a GT-AC2900 and the script stops before 10,000. Ran multiple times. Anywhere between 500 and 5500

SomeWhereOverTheRainBow · Jun 19, 2022

JohnD5000 said:
Ran the script from the first post on a GT-AC2900 and the script stops before 10,000. Ran multiple times. Anywhere between 500 and 5500

Try using the one from this post first

Post in thread 'My experience with the RT-AC86U' https://www.snbforums.com/threads/my-experience-with-the-rt-ac86u.79290/post-769487

Then use the one in the first post.

SomeWhereOverTheRainBow · Jun 19, 2022

JohnD5000 said:
Ran the script from the first post on a GT-AC2900 and the script stops before 10,000. Ran multiple times. Anywhere between 500 and 5500

I think one thing people fail to realize with loops is that they could still be running before the other iteration has fully completed. So real question begs to ask is nvram simply locking up because it is over welhmed and not finished with the previous iteration?

JohnD5000 · Jun 19, 2022

SomeWhereOverTheRainBow said:
Try using the one from this post first

Post in thread 'My experience with the RT-AC86U' https://www.snbforums.com/threads/my-experience-with-the-rt-ac86u.79290/post-769487

Then use the one in the first post.

Doing this, it gets all the way to 10001. Don't really understand what all this means, though, but I guess the GT-AC2900 doesn't have an issue?

Oracle · Jun 19, 2022

I'm still catching up with latest comments, but meantime:
@SomeWhereOverTheRainBow, could you explain what the time intervals from your version are actually doing? I.e., where is this time interval used and what happens when it expires?
Looks like there's hardly any difference if it's 10 or 50 - at least I can't see it.

Oracle · Jun 19, 2022

JohnD5000 said:
Doing this, it gets all the way to 10001. Don't really understand what all this means, though, but I guess the GT-AC2900 doesn't have an issue?

This could very well mean that:
1) the GT-AC2900 does exhibit the same defect;
2) the wrapper script that solves the problem for the AC86U also solves it for the GT-AC2900.

Since you are the first and only person so far to report the nvram bug for this model, I'd say more testing and cases are needed.

I should probably reorganize my first page posts, with less words and clearer info. Then put the details in separate posts, for whoever wants to red the details. Or something of this nature.

SomeWhereOverTheRainBow · Jun 19, 2022

Oracle said:
I'm still catching up with latest comments, but meantime:
@SomeWhereOverTheRainBow, could you explain what the time intervals from your version are actually doing? I.e., where is this time interval used and what happens when it expires?
Looks like there's hardly any difference if it's 10 or 50 - at least I can't see it.

To be honest I wrote the script with the intention of the internal to serve as a fail safe for a fail safe. You would have to have multiple recurrence of lockups all at once for it to ever trigger the higher intervals, this was done because I have no way to optimize the interval and it serves as a best case and worse case optimization. Feel free to use the script as is or how ever you would like to modify it. I made it with the intention of that, and also to help users suffering from the deadlock of this condition.

Oracle · Jun 19, 2022

That's fine, I just don't know this syntax. I.e., what is this interval, in what unit of measure, when does the count start and what if it runs out? Is that when the process is killed?
I could try to adjust it but I don't understand what to look for.

SomeWhereOverTheRainBow · Jun 19, 2022

Oracle said:
That's fine, I just don't know this syntax. I.e., what is this interval, in what unit of measure, when does the count start and what if it runs out? Is that when the process is killed?
I could try to adjust it but I don't understand what to look for.

If you feel it will serve fine to remove all the other intervals then do that. It is more for your adjustability than mine.

SomeWhereOverTheRainBow · Jun 19, 2022

Oracle said:
That's fine, I just don't know this syntax. I.e., what is this interval, in what unit of measure, when does the count start and what if it runs out? Is that when the process is killed?
I could try to adjust it but I don't understand what to look for.

here it is at just 10

Bash:

#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode
# required for serialization when reentry is possible
LOCK="/tmp/$(basename "$0").lock"
acquire_lock() { until mkdir "$LOCK" &>/dev/null; do touch /tmp/nvram; done; }
release_lock() { rmdir "$LOCK" &>/dev/null; }

# one instance at a time
acquire_lock

# catch premature exit and cleanup
trap 'release_lock; exit 1' SIGHUP SIGINT SIGTERM

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/nvramuse" ]; then
   echo 0 > /tmp/nvramuse
fi
usecount=$(cat /tmp/nvramuse)
usecount=$((usecount + 1 ))
echo $usecount > /tmp/nvramuse

INTERVAL="10"
MAXCOUNT="3"
run_cmd () {
    local to
    local start
    local child
    # here as the interval number increases, the longer we wait.
    to="$1"
    to="$((to*INTERVAL))"; shift
    $@ & local child="$!" start=0
    touch /tmp/nvram
    while { [ "$(kill -0 $child >/dev/null 2>&1; printf "%s" "$?")" = "0" ] && [ "$start" -le "$to" ]; }; do
        # to account for killing too soon, as the number of tries required increases our count requirement increases before we attempt to kill the process.
        touch /tmp/nvram
        start="$((start+1))"
        if [ $start -gt $to ]; then
            kill -s 9 $child 2>/dev/null
            wait $child
            return 1
        fi
    done
    return 0
}

# make the new function accessible, on the first run we want to exit right away if successful.
i="1"
if { run_cmd "$i" /tmp/_nvram "$@"; }; then rc="0"; else rc="1";fi

logger -t "nvram-override" "Executed nvram $@, use count: $usecount, exit status: $rc"

# here we add an interval check and allow up to 3 retries.
while [ "$i" -le "$MAXCOUNT" ] && [ "$rc" != "0" ]; do
  touch /tmp/nvram
  if { run_cmd "$i" /tmp/_nvram "$@"; }; then
    rc="0";
  else
    rc="1";
    errcount="$rc";
    if [ ! -f "/tmp/nvramerr" ]; then echo 0 > /tmp/nvramerr; else errcount=$(cat /tmp/nvramerr); fi
    errcount=$((errcount + 1 ));
    echo $errcount > /tmp/nvramerr;
    logger -t "nvram-override" "Error detected at use count: $usecount, error count: $errcount";
    logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)";
  fi
  logger -t "nvram-override" "Retried executing nvram $@, attempt ${i}/${MAXCOUNT}, exit status: $rc";
  i="$((i+1))";
done
[ "$rc" -eq "1" ] && logger -t "nvram-override" "NVRAM remained locked too long; continuing anyway."
# any concurrent instance(s) may now run
release_lock
exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /usr/sbin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

@Oracle while I respect your attempts at trying to capture the logs and error statistics, is there a way it can be done where it does not slow down the actual processing of the script? For example, how fast does the script go without all the logs and requirements to error track statistics?

Bash:

#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode
# required for serialization when reentry is possible
LOCK="/tmp/$(basename "$0").lock"
acquire_lock() { until mkdir "$LOCK" &>/dev/null; do touch /tmp/nvram; done; }
release_lock() { rmdir "$LOCK" &>/dev/null; }

# one instance at a time
acquire_lock

# catch premature exit and cleanup
trap 'release_lock; exit 1' SIGHUP SIGINT SIGTERM

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
#if [ ! -f "/tmp/nvramuse" ]; then
#  echo 0 > /tmp/nvramuse
#fi
#usecount=$(cat /tmp/nvramuse)
#usecount=$((usecount + 1 ))
#echo $usecount > /tmp/nvramuse

INTERVAL="10"
MAXCOUNT="3"
run_cmd () {
    local to
    local start
    local child
    # here as the interval number increases, the longer we wait.
    to="$1"
    to="$((to*INTERVAL))"; shift
    $@ & local child="$!" start=0
    touch /tmp/nvram
    while { [ "$(kill -0 $child >/dev/null 2>&1; printf "%s" "$?")" = "0" ] && [ "$start" -le "$to" ]; }; do
        # to account for killing too soon, as the number of tries required increases our count requirement increases before we attempt to kill the process.
        touch /tmp/nvram
        start="$((start+1))"
        if [ $start -gt $to ]; then
            kill -s 9 $child 2>/dev/null
            wait $child
            return 1
        fi
    done
    return 0
}

# make the new function accessible, on the first run we want to exit right away if successful.
i="1"
if { run_cmd "$i" /tmp/_nvram "$@"; }; then rc="0"; else rc="1";fi

#logger -t "nvram-override" "Executed nvram $@, use count: $usecount, exit status: $rc"

# here we add an interval check and allow up to 3 retries.
while [ "$i" -le "$MAXCOUNT" ] && [ "$rc" != "0" ]; do
  touch /tmp/nvram
  if { run_cmd "$i" /tmp/_nvram "$@"; }; then
    rc="0";
  else
    rc="1";
    #errcount="$rc";
    #if [ ! -f "/tmp/nvramerr" ]; then echo 0 > /tmp/nvramerr; else errcount=$(cat /tmp/nvramerr); fi
    #errcount=$((errcount + 1 ));
    #echo $errcount > /tmp/nvramerr;
    #logger -t "nvram-override" "Error detected at use count: $usecount, error count: $errcount";
    #logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)";
  fi
  #logger -t "nvram-override" "Retried executing nvram $@, attempt ${i}/${MAXCOUNT}, exit status: $rc";
  i="$((i+1))";
done
#[ "$rc" -eq "1" ] && logger -t "nvram-override" "NVRAM remained locked too long; continuing anyway."
# any concurrent instance(s) may now run
release_lock
exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /usr/sbin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

Tech9 · Jun 20, 2022

I have decided to end my experience with RT-AC86U. Goes for recycling.

On my way back home picked a new toy for Asuswrt-Merlin experiments though:

Viktor Jaep · Jun 20, 2022

Tech9 said:
I have decided to end my experience with RT-AC86U. Goes for recycling.

On my way back home picked a new toy for Asuswrt-Merlin experiments though:

Probably the whole reason they went with the AX line... too many issues they weren't able to fix with software. I'm not far behind you...

Tech9 · Jun 20, 2022

RT-AX86U is up and running, but it has weaker signal to my test AC client behind 2 walls.

RT-AC86U - 585/585
RT-AX86U - 390/390

I got this one just to play with it. It has scheduled surgery procedure in coming weeks.

Viktor Jaep · Jun 20, 2022

Tech9 said:
RT-AX86U is up and running, but it has weaker signal to my test AC client behind 2 walls.

RT-AC86U - 585/585
RT-AX86U - 390/390

I got this one just to play with it. It has scheduled surgery procedure in coming weeks.

That's alright... I just need a hardline into it for our general purposes. It's wifi signal is just for me to play with...

Thread starter	Title	Forum	Replies	Date
J	Experience using PIA Wireguard on your router (ASUSWRT-MERLIN)	Asuswrt-Merlin	4	Mar 21, 2025
	RT-AC86U and IPv6	Asuswrt-Merlin	7	Jul 18, 2025
C	Migrating from RT-AC86U to RT-BE86U need advice	Asuswrt-Merlin	5	Jul 7, 2025
U	Need Help Setting Up 3 VLANs (Home, Guest, IoT) on ASUSWRT-Merlin (RT-AC86U)	Asuswrt-Merlin	9	Jun 22, 2025
P	WiFi drops Internet rt-ac86u on 386.14_2	Asuswrt-Merlin	8	Jun 14, 2025
L	My ASUS router RT-AC86U is a very old device， so, I need your guys help!	Asuswrt-Merlin	17	Jun 10, 2025
O	RT-AC86U reboots everyday evening for no reason	Asuswrt-Merlin	15	May 15, 2025
C	386.14_2 - RT-AC86U - Web UI Inaccessible After Upgrading	Asuswrt-Merlin	5	Mar 17, 2025
	[Asuswrt-Merlin 386.14_2] RT-AC86U	Asuswrt-Merlin	3	Feb 21, 2025
	ASUS Merlin on RT-AC86U OpenVPN Server not blocking IP when a client connects	Asuswrt-Merlin	2	Jan 8, 2025

My experience with the RT-AC86U

Part of the Furniture

Part of the Furniture

Part of the Furniture

Part of the Furniture

Part of the Furniture

Part of the Furniture

Very Senior Member

Part of the Furniture

Part of the Furniture

Very Senior Member

Regular Contributor

Regular Contributor

Part of the Furniture

Regular Contributor

Part of the Furniture

Part of the Furniture

Part of the Furniture

Part of the Furniture

Part of the Furniture

Part of the Furniture

Similar threads

Similar threads

Support SNBForums w/ Amazon

Sign Up For SNBForums Daily Digest