Watchdog crash problem

DocUmibozu

Occasional Visitor
Hello,
I've been experiencing a strange problem for the last few days.
Every 2-3 days the router's web interface and SSH shell hang.
By that I mean that logging in to the web interface is impossible. Logging in via SSH is possible, but any command I try to issue hangs.
The only solution is to power the router off manually, using the switch.
I've inspected the log, and everything looks normal until it starts to fill with this error:

Feb 7 15:17:38 RT-AC66U_B1 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Feb 7 15:17:38 RT-AC66U_B1 rc_service: check_watchdog 234:notify_rc restart_watchdog

This is the only log message before I switch the router off, and it is repeated every minute.
Any ideas?
Thanks to all
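
For anyone who wants to check a saved copy of their own syslog for the same symptom, here is a minimal sketch in Python that picks out the "no heartbeat" lines. It assumes the log has already been downloaded to a PC and that lines look like the two quoted above (the hostname field is treated as optional); the names WATCHDOG_RE and watchdog_restarts are made up for this example.

```python
import re

# Matches the check_watchdog message quoted in this thread.
# The hostname between the timestamp and the tag is treated as optional.
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?:(?P<host>\S+)\s+)?"
    r"check_watchdog: \[check_watchdog\] restart watchdog for no heartbeat"
)


def watchdog_restarts(log_text):
    """Return (timestamp, hostname or None) for every 'no heartbeat' line."""
    hits = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("host")))
    return hits


# The two log lines from the post above, used as sample input.
SAMPLE = """\
Feb 7 15:17:38 RT-AC66U_B1 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Feb 7 15:17:38 RT-AC66U_B1 rc_service: check_watchdog 234:notify_rc restart_watchdog
"""

print(len(watchdog_restarts(SAMPLE)), "watchdog restart(s) found")
```

Only the check_watchdog line is counted; the rc_service line that follows it is the same event being passed on, so counting both would double the total.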
 

dave14305

Part of the Furniture
Which firmware version?
 

L&LD

Part of the Furniture
Was this router ever fully reset to factory defaults after flashing the RMerlin firmware? Without using a saved backup config file to configure it?

If not, please see the link in my signature below to get your router back to a good/known configuration.

I have installed a few RT-AC66U_B1 routers for customers and I haven't seen this issue with any of them.
 

DocUmibozu

Occasional Visitor
Update:

The complete log of the error is:

Feb 7 16:13:38 RT-AC66U_B1 custom_script: Running /jffs/scripts/service-event (args: restart watchdog)
Feb 7 16:21:38 RT-AC66U_B1 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Feb 7 16:21:38 RT-AC66U_B1 rc_service: check_watchdog 234:notify_rc restart_watchdog

So it seems that the crash starts from /jffs/scripts/service-event.
In this script there's one call to uiscribe and one call to Skynet.
 

dave14305

Part of the Furniture
No, the service-event is the last step from the previous restart. It runs when any service is restarted/started/stopped.
 

DocUmibozu

Occasional Visitor
Yes, you are right, the sequence seems to be:

  1. something goes wrong;
  2. watchdog comes into play and starts service-event;
  3. the loop continues because the problem in step 1 isn't solved.
I'll do some more investigation.
Thanks to you all, I'll be in touch
 

saccleo

Occasional Visitor
Is there any way to solve this problem? I have run into the same issue:
same router model and same error log, and about every 3 days the web UI and SSH cannot be accessed.
I have upgraded to the newest firmware and started from a blank flash.
 

DocUmibozu

Occasional Visitor
Hi,
Go to Administration/System, then under Network Monitoring select Ping instead of DNS Query.
For me this solved the problem. Before, the uptime was 4-6 days; now my router has been running for 14 days and the glitch hasn't shown up again.
 

ech

Regular Contributor
I had this happen yesterday as well - loads of the watchdog restarts getting logged. I was able to run some commands though, and see that any attempt to access /jffs was hanging... so there were a lot of "cp .../tmp/syslog... /jffs" processes running and hung.

A reboot didn't work, or at least not cleanly, and I had to power-cycle to get the device (RT-AC68U running 384.15) running again. It had been up about 10 days.

Also, I don't have either ping or dns network monitoring enabled.
 

saccleo

Occasional Visitor
DocUmibozu said:
    Hi,
    Go to Administration/System, then under Network Monitoring select Ping instead of DNS Query.
    For me this solved the problem. Before, the uptime was 4-6 days; now my router has been running for 14 days and the glitch hasn't shown up again.

I got some help from official ASUS support, who said to change the wireless mode setting for both 2.4 GHz and 5 GHz: use N mode instead of Auto or Legacy.
 

DocUmibozu

Occasional Visitor
Consider that my router has Wi-Fi disabled... I don't think ASUS support gave you a correct answer.
 

saccleo

Occasional Visitor
Maybe. I have checked my config, and both Ping and DNS Query under Administration/System are unchecked.
Because I could not log in via SSH when it happened, I couldn't gather any more information.
 

saccleo

Occasional Visitor
ech said:
    I had this happen yesterday as well - loads of the watchdog restarts getting logged. I was able to run some commands though, and see that any attempt to access /jffs was hanging... so there were a lot of "cp .../tmp/syslog... /jffs" processes running and hung.

    reboot didn't work - or not cleanly, and I had to power-cycle to get the device (RT-AC68U running 384.15) running again. It had been up about 10 days.

    Also, I don't have either ping or dns network monitoring enabled.

Which command did you use to find the reason for the crash?
ps or top?
 

ech

Regular Contributor
Didn't see a crash - just a hang. And "ps" showed a huge number of those "cp .../tmp/syslog... /jffs" commands running.

I've reformatted /jffs (backed up jffs, selected the reformat on next reboot option, rebooted, then restored /jffs, then rebooted again) and haven't had the problem since... though I had only seen this on this one occasion as well, so will have to see if it happens again or not.
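
To put a number on "a huge number" of stuck copies, a small sketch along these lines could be run on a PC against saved "ps" output. The sample listing below is invented purely for illustration (the PIDs, column layout, and file names are placeholders, not real router output), and count_copies_into is a hypothetical helper name.

```python
def count_copies_into(ps_output, target="/jffs"):
    """Count process lines that run `cp` with an argument under `target`."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # A line qualifies if `cp` appears as its own word and at least
        # one argument starts with the target directory.
        if "cp" in fields and any(f.startswith(target) for f in fields):
            count += 1
    return count


# Invented sample for illustration only; real ps output will differ.
SAMPLE_PS = """\
  PID USER       VSZ STAT COMMAND
  301 admin     1400 S    dropbear
  912 admin     1200 D    cp /tmp/example.log /jffs/example.log
  934 admin     1200 D    cp /tmp/example.log /jffs/example.log
"""

print(count_copies_into(SAMPLE_PS), "cp process(es) writing to /jffs")
```

A count that keeps growing between two snapshots would be consistent with writes to /jffs hanging rather than finishing.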
 

bengalih

Regular Contributor
I believe I'm seeing the same problem here, on an RT-AC68U running 384.15.
This morning I was able to log into SSH, but after getting the MOTD/banner I did not get a command prompt.
I was also unable to get the WebUI login.

I downloaded my syslog via plink (that worked) and apart from a bunch of kernel block messages from skynet and what look like normal dropbear logins I have only the following:

Code:
    Mar  5 07:53:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
    Mar  5 07:53:07 rc_service: check_watchdog 308:notify_rc restart_watchdog
    Mar  5 08:00:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
    Mar  5 08:00:07 rc_service: check_watchdog 308:notify_rc restart_watchdog

My first login attempt this morning was at 7:58, so one of those watchdog checks is prior to my login and one is after. It's worth noting that the syslog only contains data from 7:49 am onward; I'm not sure if this is due to the number of logged entries from skynet/kernel (the total log is about 1100 lines).
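
The spacing between watchdog restarts can be worked out from the timestamps. The sketch below is only an illustration in Python: it assumes each line begins with the syslog timestamp (month, day, time), and because syslog lines carry no year it plugs in a placeholder year, so gaps across a year boundary would be wrong. The function name restart_gaps_seconds is made up for this example.

```python
from datetime import datetime

MARKER = "restart watchdog for no heartbeat"


def restart_gaps_seconds(log_text, year=2000):
    """Return the seconds between consecutive 'no heartbeat' log lines.

    `year` is a placeholder, since syslog timestamps omit the year.
    """
    times = []
    for line in log_text.splitlines():
        if MARKER not in line:
            continue
        # The first three whitespace-separated fields are month, day, time.
        month, day, clock = line.split()[:3]
        times.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# The watchdog lines quoted above, one per event.
SAMPLE = """\
Mar  5 07:53:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Mar  5 07:53:07 rc_service: check_watchdog 308:notify_rc restart_watchdog
Mar  5 08:00:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Mar  5 08:00:07 rc_service: check_watchdog 308:notify_rc restart_watchdog
"""

print(restart_gaps_seconds(SAMPLE))  # [420.0], i.e. 7 minutes apart
```

For the two restarts quoted here, the gap is 7 minutes.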

I issued a service restart_httpd via plink and the web interface login immediately became available. I logged on to the WebUI, although I can't say that I actually got the interface, because while I waited for it to load I toggled away and also issued a service restart_sshd. When I did that I lost my PuTTY connection (the one where I didn't get the command prompt), and it also appeared that I lost my WebUI. After that I could no longer access the WebUI or SSH at all, so the restart seemed not to restart but to kill both services entirely.

My device was fully reset (100% jffs reformatted, nvram cleared, new firmware flashed, etc.) last week when I upgraded to .15, so I can't see how that is related. I have been running for years on prior versions and never experienced this before.

That being said, on my latest build I did allow AMTM to do all my commands regarding usb drive formatting and swapfiles, etc where in the past I had done those myself. I also haven't been running Skynet very long, I ran it for maybe a week prior to my latest upgrade and fresh format.
Apart from that and some other minor customization I can't think of anything intrusive enough that should be causing this.
 

bengalih

Regular Contributor
As an additional follow-up, I rebooted my router today after several days of it being in the above state.
While my WiFi and Internet had continued to work in that state, after the reboot I could see that most other functions had failed to operate.
For instance, no traffic was captured for use by Traffic Analyzer.
My scheduled cru/cron file backup job had not run.

So far this has been a one-time deal. If it continues to happen, I will begin rolling back to some of the configuration settings that I had prior to my last firmware update and full router reset. Specifically, I will reformat using EXT3 instead of EXT4, set my swap to 4GB instead of 2GB, and eliminate Skynet. While most of these should have no bearing on this issue, those are really the only new variables since my rebuild apart from the actual firmware itself.
 
