Asus AC86U - nvram show stops working after a few hours, making the router inaccessible via WebUI

Luky

Hi,
I came across a strange problem with my AC86U running the latest Merlin firmware (but it was happening on older versions as well). Basically, the router becomes inaccessible via the WebUI a few hours after a reboot. Only the login screen shows up, and after entering the login details it doesn't go any further. The router itself keeps working, but administration is only possible over ssh.
I tried to figure out what's going on and found that this issue is related to a deeper problem with the nvram command.
After a reboot, the "nvram show" command shows all variables, but after a few hours it stops working and deadlocks without showing anything. Digging a bit deeper, I found that individual nvram get "variable" commands still work, except where "variable" is one of the variables stored in the "/jffs/nvram/" folder.


Any idea how to fix it?
 
“….running latest merlin firmware (but it was happening on older as well).”

So you’ve had this for a while and it’s not specific to the latest (386.7.2?) firmware. Was there a time when you could access the router without any problems? Can you relate the beginning of the problem to anything you did?
 
The RT-AC86U has a known problem with nvram access. It is described in this thread. The issue is exacerbated by running certain custom add-on scripts that make frequent nvram calls.

The issue can be reduced to some extent by adding the following line to the beginning of the /jffs/scripts/init-start script.
Code:
#!/bin/sh
echo 4194304 > /proc/sys/kernel/pid_max
 
Hmm, interesting.
Setting pid_max on an already "locked" router doesn't help.
I don't have anything running thousands of nvram commands in my setup, so it seems weird that I would run out of pids.
What's more, it doesn't explain why "nvram get lan_ipaddr" works while "nvram get asus_device_list" gets stuck. If anything, the number of open sockets rather than pids could be involved.

Anyhow, I ran the above commands with strace, and here is the diff of the important part (lines marked < are from the working command, lines marked > from the one that hangs):
< mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf732f000
< stat64("/jffs", 0xfff64c90) = 0
< stat64("/jffs/nvram_war", 0xfff64c90) = 0
< socket(AF_NETLINK, SOCK_RAW, 0x1f /* NETLINK_??? */) = 3
< bind(3, {sa_family=AF_NETLINK, nl_pid=14525, nl_groups=00000000}, 12) = 0
< brk(NULL) = 0x63000
< brk(0x84000) = 0x84000
< open("/proc/sys/kernel/pid_max", O_RDONLY) = 4
< fstat64(4, 0xfff64b98) = 0
< mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf75ee000
< read(4, "4194304\n", 1024) = 8
< close(4)
---
> mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf74aa000
> stat64("/jffs", 0xffbdc9c0) = 0
> stat64("/jffs/nvram_war", 0xffbdc9c0) = 0
> open("/var/nvram.lock", O_WRONLY|O_CREAT, 0644) = 3
> flock(3, LOCK_EX) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

So it seems that the command hangs while locking /var/nvram.lock.
When I delete /var/nvram.lock, nvram get asus_device_list and nvram show work again.
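
To check for this hang without blocking the ssh session, the nvram call can be wrapped in a watchdog. This is only a minimal sketch: run_with_timeout is a hypothetical helper written for this example, not part of the firmware, and it assumes the stuck process can be killed with SIGKILL.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and kills it if it is still running
# after SECONDS. Returns the command's exit status; a status above 128
# means the command was terminated by a signal (here: the watchdog).
# Hypothetical helper for this sketch, not a firmware tool.
run_with_timeout() {
    secs=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: kill the command if it outlives the time limit.
    ( sleep "$secs"; kill -9 "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    wait "$cmd_pid" 2>/dev/null && status=0 || status=$?
    # The command has ended, so the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || true
    return "$status"
}

# Only probe where the nvram tool exists (i.e. on the router itself).
if command -v nvram >/dev/null 2>&1; then
    rc=0
    run_with_timeout 10 nvram get asus_device_list >/dev/null || rc=$?
    if [ "$rc" -gt 128 ]; then
        echo "nvram appears to be hung"
    fi
fi
```

The 10-second limit is an arbitrary choice; a healthy nvram get normally returns almost immediately.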
 
For the WebUI to start working again, I had to remove /var/lock/allwevent.lock and then restart httpd.
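
The manual recovery described so far (remove both lock files, then restart httpd) could be collected into one script, roughly as follows. This is a sketch under assumptions: remove_stale_lock and recover_webui are hypothetical names, and "service restart_httpd" is assumed to be the way httpd is restarted on this firmware. Deleting a lock file only sidesteps the lock; it does not release it in the process that still holds it.

```shell
#!/bin/sh
# remove_stale_lock PATH
# Deletes PATH if it exists and reports what happened.
# Hypothetical helper for this sketch.
remove_stale_lock() {
    if [ -e "$1" ]; then
        rm -f "$1" && echo "removed $1"
    else
        echo "not present: $1"
    fi
}

# recover_webui
# The manual workaround from this thread in one step.
# "service restart_httpd" is an assumption about how httpd is
# restarted on Asuswrt-Merlin; adjust it for your firmware.
recover_webui() {
    remove_stale_lock /var/nvram.lock
    remove_stale_lock /var/lock/allwevent.lock
    service restart_httpd
}

# Nothing runs automatically: call "recover_webui" by hand over ssh
# once the router is stuck.
```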
 
I've had the router for roughly a year. I believe it has been happening from the very beginning; the gap between reboot and deadlock has perhaps just shortened. But I don't have any exact numbers.
I saw other people reporting login problems here on the forums. I guess it's related: https://www.snbforums.com/threads/ac86u-httpd-not-responding.76230/

“I believe it's happening from the very beginning, just the gap between reboot and deadlock perhaps shortened.”

Notwithstanding Colin’s advice and his statement that this is a known problem with your model of router, when you first installed Merlin’s firmware, did you carry out a factory reset, as Merlin insists must be done in his instructions, and are you confident you carried it out correctly?
 
setting pid_max on already "locked" router doesn't help.
No it wouldn't. It would need to be applied as early in the boot process as possible. But it sounds like your problem is something different anyway.

I don't have anything running thousands of nvram commands in mine setup so it seems weird I would run out of pids.
It's not a case of running out of pids. It's the way cloned processes use the same netlink pids that their parent used.
 
Hmm, given the orphaned "locks", it seems some process died midway without properly cleaning up after itself, causing further issues down the road. I guess it would be hard to track down. I will see how long it lasts after my manual recovery before it locks up again.

Notwithstanding Colin’s advice and his statement that this is a known problem with your model of router, when you first installed Merlin’s firmware, did you carry out a factory reset, as Merlin insists must be done in his instructions, and are you confident you carried it out correctly?
Yes I did
 
I have the same issue as well with my GT-AC2900 except for me the login page doesn’t even load.

It happened when I was running Diversion along with Skynet and RAM was just getting chalked despite the swap file.

In the end I removed Diversion and Skynet and the issue went away.
 
If you installed Entware's psmisc package you could use fuser to see if there is currently a process running that has that file open.

Code:
# fuser -v /var/nvram.lock
Specified filename /var/nvram.lock does not exist.

# fuser -v /tmp/syslog.log
                     USER        PID ACCESS COMMAND
/tmp/syslog.log:     admin      1122 F.... syslogd
 
I will give it a try when it hangs.

Just an idea: the strace log shows /var/nvram.lock, but all the other "lock" files are in the /var/lock dir. Couldn't there be some race condition where nvram asks for /var/nvram.lock but some other process asks for /var/lock/nvram.lock?
 
The router was stable for a few days after the manual workaround.
Then I rebooted the router and the issue reappeared. Here is a list of the processes that have the nvram lock file open:

Code:
COMMAND      PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
wlc_nt      1157 admin   10w   REG   0,13        0 1314 /var/nvram.lock
amas_lib    2172 admin   11wW  REG   0,13        0 1314 /var/nvram.lock
amas_lib    2172 admin   13wW  REG   0,13        0 1314 /var/nvram.lock
asusdisco  88212 admin    9w   REG   0,13        0 1314 /var/nvram.lock
httpd     111548 admin   12w   REG   0,13        0 1314 /var/nvram.lock

and /var/lock/allwevent.lock is held open by cfg_server.

So if you have an idea which one is causing the issue, please try to fix it. Otherwise we will always have to work around it manually.
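
The listing above shows which processes have the file open, but not necessarily which one actually holds the lock (in lsof output, an uppercase W after the access mode marks a write lock on the whole file). On Linux kernels that expose /proc/locks, the holder can also be looked up by inode number. The following is a sketch only: flock_holders is a hypothetical helper, and whether /proc/locks is available on this firmware is an assumption.

```shell
#!/bin/sh
# flock_holders INODE < /proc/locks
# Prints the PID of every process that holds (not merely waits for)
# a FLOCK-type lock on the given inode number.
# Hypothetical helper for this sketch.
flock_holders() {
    awk -v ino="$1" '
        # Holder lines look like:
        #   1: FLOCK  ADVISORY  WRITE <pid> <major>:<minor>:<inode> 0 EOF
        # Waiter lines have "->" as field 2, so they do not match.
        $2 == "FLOCK" {
            n = split($6, id, ":")
            if (id[n] == ino) print $5
        }'
}

# Only run where both the lock file and /proc/locks are present.
if [ -e /var/nvram.lock ] && [ -r /proc/locks ]; then
    ino=$(ls -i /var/nvram.lock | awk '{print $1}')
    flock_holders "$ino" < /proc/locks
fi
```

Each printed PID can then be matched against the process list to see which program is sitting on the lock.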
 
One more update:
Killing the amas_lib process unblocks the other processes and everything starts working as expected: nvram, WebUI, etc.
So the culprit is, in my opinion, the amas_lib process. I couldn't find anywhere what it is actually good for.
 
So the culprit is by my opinion amas_lib process - couldn't find anywhere what is it actually good for.

RMerlin said of that process:
This is the AiMesh service. If you ever used AiMesh on this router, best to do a factory default reset to remove any leftover configuration.

That library is closed source, so I have no idea what triggers a firewall restart from it.
Even though I've never used AiMesh on my standalone router the amas_lib process is still running.
 
Thanks for the troubleshooting, had the exact same problem. After killing the amas_lib process, webui came back.
 
The amas_lib process respawns, so I had to add a line like this to cron:
*/15 * * * * killall amas_lib

15 minutes is perhaps a bit aggressive, as the issue usually reappears after a few hours.
Since then my router hasn't gotten stuck, and I haven't had any Wi-Fi reconnection problems either.
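
On Asuswrt-Merlin, cron entries are usually registered with the cru helper from a user script such as /jffs/scripts/services-start, so that they are re-created after every reboot. A minimal sketch, assuming cru is available: cron_line is a hypothetical helper and "kill_amas_lib" is a job ID made up for this example.

```shell
#!/bin/sh
# cron_line MINUTES COMMAND
# Builds a crontab entry that runs COMMAND every MINUTES minutes
# (MINUTES should be between 1 and 59).
# Hypothetical helper for this sketch.
cron_line() {
    printf '*/%s * * * * %s\n' "$1" "$2"
}

# Only register the job where the "cru" helper exists.
# "kill_amas_lib" is a made-up job ID; any unique name works.
if command -v cru >/dev/null 2>&1; then
    cru a kill_amas_lib "$(cron_line 30 'killall amas_lib')"
fi
```

The 30-minute interval here is just a less aggressive example value; since the hang reportedly takes a few hours to reappear, a longer interval may be enough.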
 
