Stuck commands

I don't recall seeing your script, but my impression is that the Oracle/Somewhere script was not advertised to prevent all hangs. And as I mentioned a few posts earlier, I'm not sure what I've done is even necessary for my needs. I'm just curious to see what works, what doesn't, and whether any of these changes make a difference to my AC86U's functionality. Nevertheless, I'm curious to try your script. Can you point me to it?
I knew it was still around somewhere... Give this a go and see if it locks up on you. ;)

 
After a good amount of testing I have found a viable workaround for this problem and verified my results with @Viktor Jaep. This script is a concept script you guys can use to model your scripts after if you wish to implement the workaround.

Notes:
1. The script asks whether you want to check for stuck PIDs. If you select Yes, the script should run until it completes all of its iterations. If you select No, it should eventually get stuck when a PID hits a netlink socket error.
2. The nvram get commands are called in the background, and the nvramcheck function is called in the foreground.
3. The nvramcheck function checks whether the PID of the nvram get command is still alive. If it is, the function kills the PID, because the process is stuck on a netlink socket error.
4. The loop that assigns the variables has built-in error handling: if the nvram command comes back null (because the PID was stuck and killed), the variable is reset to its previous successful value, so a null value is never returned.
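
For anyone who wants to see the shape of the idea in the notes without the full concept script, here is a minimal sketch. It is not the original script: the function names run_with_watchdog and get_or_keep, the 2-second timeout, and the use of mktemp are all assumptions made for illustration. It runs a command in the background, polls its PID, kills it if it is still alive after the timeout, and falls back to the previous value when nothing comes back.

```sh
#!/bin/sh
# Sketch only: run_with_watchdog and get_or_keep are hypothetical names,
# not functions taken from the concept script discussed in this thread.

# run_with_watchdog TIMEOUT_SECS CMD [ARGS...]
# Runs CMD in the background, polls its PID once per second, and kills it
# if it is still alive after TIMEOUT_SECS. Prints CMD's output on success.
# Returns 124 if the command had to be killed.
run_with_watchdog() {
    wd_timeout="$1"; shift
    wd_tmp="$(mktemp)" || return 1
    "$@" > "$wd_tmp" 2>/dev/null &
    wd_pid=$!
    wd_waited=0
    while kill -0 "$wd_pid" 2>/dev/null; do
        if [ "$wd_waited" -ge "$wd_timeout" ]; then
            # Still alive after the timeout: treat it as stuck and kill it.
            kill -9 "$wd_pid" 2>/dev/null
            wait "$wd_pid" 2>/dev/null
            rm -f "$wd_tmp"
            return 124
        fi
        sleep 1
        wd_waited=$((wd_waited + 1))
    done
    wait "$wd_pid"
    wd_status=$?
    cat "$wd_tmp"
    rm -f "$wd_tmp"
    return "$wd_status"
}

# get_or_keep PREVIOUS_VALUE CMD [ARGS...]
# Prints the new value, or PREVIOUS_VALUE if CMD was killed, failed,
# or returned nothing (this mirrors note 4: empty is treated as a failure).
get_or_keep() {
    gk_prev="$1"; shift
    if gk_new="$(run_with_watchdog 2 "$@")" && [ -n "$gk_new" ]; then
        printf '%s\n' "$gk_new"
    else
        printf '%s\n' "$gk_prev"
    fi
}

# Example use on a router (variable name chosen only for illustration):
#   WANIP="$(get_or_keep "$WANIP" nvram get wan0_ipaddr)"
```

Writing the output to a temporary file rather than a pipe is deliberate: a stuck background process then cannot hold the command substitution open after it has been killed.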

 
It's right on the money... able to intercept NVRAM lockups on our pesky AC86U's (and variations thereof) and able to get past them... congrats @Ranger802004... you did it! :)
 
Interestingly, I have not seen a single stuck nvram call with my add-on-free Merlin FW setup. So I suspect that the stuck nvram is related to running scripts or some add-on items.
 
I think they happen even with the factory FW, and they cause some things the router does on the back end to hang without you noticing (IIRC). They're not caused by the scripts themselves but by nvram get commands in general, which most of our scripts use.
 
Does this run continuously? If yes, I'd be interested to see a version for stuck wl commands which I do get with my router. The Oracle/SomewhereOverTheRainbow script is working fine for me, but I'm interested in seeing others.
 
This script is just a concept script for the devs; it lets them build in error handling for the stuck commands. You can run the script with and without the NVRAM checks to see the different behaviors on your router. You can always go in and kill the stuck PID manually, which also lets a script resume operation, but this would be a built-in, automated workaround.
 
I've been monitoring for stuck commands for at least three weeks now. I only see occasional stuck wl. No stuck nvram calls.
 
ok, so not meant for end user.
 
What model router do you have? I think some models are more affected than others. I have a test GT-AC2900, which is essentially the AC86U, and it has stuck nvram commands constantly. I tested this script on it, and it made it all the way to 100,000 loops and completed.
 
Correct, I'm hoping the other devs will be able to incorporate the concepts of this script to help prevent their scripts from being locked up. This would be a huge improvement for everyone using script add-ons.
 
AC86U which, to my knowledge, is the only router exhibiting these issues.
 
The GT-AC5300 router also exhibits the same "stuck process" issues for both nvram & wl commands. See the following post:


Note that under normal daily operations, the "stuck process" events are not very frequent on the GT-AC5300 (on average, about once every 2 weeks). The router is running the OEM stock F/W, so there are no 3rd-party add-ons or services, and no built-in extra services are enabled (e.g. AiProtection, Traffic Analyzer/Traffic Monitor, Parental Controls, QoS, AiCloud, AiDisk, etc.).
 
I had a GT-AC5300 in the past, and it did have random hang-ups that I never looked deeply into; this was probably the cause. My test GT-AC2900 also has the problem, so it's not just the AC86U.
 
Yeah, we (the owner of the GT router & I) occasionally noticed some strange issues, especially when logged into the WebGUI, but the network operations did not seem affected at all (at least it wasn't obvious) so we never really looked into that either. It wasn't until other users started to report "stuck processes" on their routers, that I wrote the script and ran it as a cron job to check and, sure enough, the problem was there as well.
 
Yeah, and I found a native way for our scripts to handle these errors, so I think as a community we have overcome this issue.
 
Earlier this evening I tried to run your script but immediately got 2 errors about "parameter not set" for RED and then NOCOLOR vars, so I ended up deleting those variables from the script. Then I ran it again and the script got stuck in an infinite loop so I had to kill the "nvramcheck.sh" process every time (regardless of whether I answer Y or N to the initial question). I briefly debugged it and found that your script assumes that every single "nvram get ..." call will return a non-NULL/non-empty result, but this is an incorrect assumption.

In practice, there are sometimes calls to NVRAM vars that are not set at all, or to NVRAM vars that are set to an empty string. In each instance, the "nvram get ..." call must be handled appropriately, but your current script ends up in an infinite loop in all such cases.
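
One possible way to avoid that kind of endless loop is to cap the number of retries and to accept an empty string as a valid answer. The sketch below is only an illustration: fetch_with_retries is a hypothetical name, and it assumes that the command's exit status (rather than empty output) is what signals a failed call, which would need to be verified for "nvram get" on a given firmware.

```sh
#!/bin/sh
# Sketch only: fetch_with_retries is a hypothetical helper name.
# Assumption: a non-zero exit status means the call failed, while an
# empty result with a zero exit status is a legitimate value.

# fetch_with_retries MAX_TRIES CMD [ARGS...]
# Prints the command's output (possibly empty) and returns 0 on the first
# successful call. Returns 1 after MAX_TRIES failed attempts.
fetch_with_retries() {
    fr_max="$1"; shift
    fr_try=0
    while [ "$fr_try" -lt "$fr_max" ]; do
        if fr_value="$("$@" 2>/dev/null)"; then
            # An empty value is accepted here instead of being retried.
            printf '%s\n' "$fr_value"
            return 0
        fi
        fr_try=$((fr_try + 1))
    done
    return 1
}

# Example use (variable name chosen only for illustration):
#   if DDNSNAME="$(fetch_with_retries 3 nvram get ddns_hostname_x)"; then
#       echo "value is '${DDNSNAME}' (may be empty)"
#   fi
```

The point of the bound is that the loop always terminates, whatever the variable contains.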

There are also some strange (to me) syntax forms, and I don't understand their purpose. I'm not talking about differences in coding styles, but about some syntactic statements that within their particular context appear to be completely unnecessary, IMO.

For example:
Bash:
case $yn in
  [Yy]* ) CHECKNVRAM="1" && break;;
  [Nn]* ) CHECKNVRAM="0" && break;;
  * ) echo -e "${RED}Invalid Selection!!! ***Enter Y for Yes or N for No***${NOCOLOR}"
esac
I don't see the purpose of the two "&& break" expressions above, especially since the entire case statement is *not* found within the scope of any type of loop statement (e.g. while, until, for, etc.). Yes, those "break" cmds don't "hurt or break anything" but seem superfluous in the above context.
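
For comparison, here is a rough sketch of the context in which those "break" statements do serve a purpose: the case statement sits inside a loop that keeps prompting until the answer is valid. The function name ask_yes_no and the prompt wording are assumptions for illustration. The "${RED:-}" and "${NOCOLOR:-}" forms are one way to avoid "parameter not set" errors when the color variables are undefined and the script runs under "set -u".

```sh
#!/bin/sh
# Sketch only: ask_yes_no and the prompt text are hypothetical.

# Prompts until the user answers Y or N, then sets CHECKNVRAM to 1 or 0.
# Returns 2 if input ends before a valid answer is given.
ask_yes_no() {
    while true; do
        printf 'Check for stuck NVRAM PIDs? (Y/N): '
        read -r yn || return 2
        case "$yn" in
            # Inside a loop, break ends the prompting once the answer is valid.
            [Yy]*) CHECKNVRAM=1; break ;;
            [Nn]*) CHECKNVRAM=0; break ;;
            # ${VAR:-} expands to an empty string when VAR is unset.
            *) printf '%bInvalid selection. Enter Y for Yes or N for No.%b\n' \
                   "${RED:-}" "${NOCOLOR:-}" ;;
        esac
    done
}
```

With this structure an invalid answer simply prompts again, and CHECKNVRAM is always set by the time the function returns successfully.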

There are other examples of "strange" syntactic forms, but I don't want to minimize your efforts or appear to be simply criticizing your work. Perhaps there is indeed a purpose to those strange statements that I'm not aware of.

In any case, the current version of your script just doesn't run successfully on my RT-AC86U router, so your solution is not quite there yet, IMO.

My 2 cents.
 
The script is just a concept, and that piece was borrowed from another script where it was inside a while loop; I think you're splitting hairs on the wrong thing here. The script is not meant to be used in production; it's just a set of concepts for dealing with NVRAM calls within a script. But to make you feel better I cleaned up that statement, although it doesn't hurt anything as long as someone actually selects Yes or No. I've had multiple runs on the AC86U, so maybe there's another factor at play with your router. I tested it with another member here, and in my personal test it made it to 100k iterations and completed as expected while outputting the total number of NVRAM PIDs it had to kill. :) As for null NVRAM values, in an actual script someone can build their if statements around that; this is purely an example implementation and not meant for someone to use directly.
 
I didn't run or use your script for "production." I simply tried to run it on my router to see how it works as a "proof of concept" (which I'm very familiar with as a professional s/w dev. myself). The point was that the current version of the script goes into an infinite loop because some NVRAM vars are set to empty strings or not set at all, which are not uncommon scenarios on ASUS routers. A "proof of concept" demonstration should take care of common scenarios like empty values; it doesn't have to be completely foolproof, but it shouldn't go into an infinite loop either.

Look, I get it. Nobody likes criticism, and some people are more averse to it than others even when it's constructive, as my feedback was meant to be. In one way or another, we're here to learn and if you are, I can offer some advice. If not, I can certainly move on - I got no skin in this game.
 
