Multiple Log Copy processes in Router / Causing Major Instability

  • ATTENTION! As of November 1, 2020, you are not able to reply to threads 6 months after the thread is opened if there are more than 500 posts in the thread.
    Threads will not be locked, so posts may still be edited by their authors.
    Just start a new thread on the topic to post if you get an error message when trying to reply to a thread.

Rocketboy235

Occasional Visitor
Noticed that one of my routers was listing these processes when I did the ps command:

Code:
   PID USER  VSZ  STAT      COMMAND
 3520 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 4646 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 5072 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 5481 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 5913 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 6311 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 6742 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 7741 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 8172 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 8623 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 9126 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
 9592 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
10016 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
10417 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
10862 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
11288 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
11757 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
12340 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
12884 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
13372 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
13922 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
14423 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
14846 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
15260 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
15675 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs
16176 Root    1424 D    cp /tmp/syslog.log /tmp/syslog.log-1 /jffs

And there were plenty more. My other merlin router did not have this and I'm not sure. From the look of things, it appears it's trying to copy the current log to syslog-1 but not sure why /jffs is there also unless it's trying to copy to there as well.

I wonder what option/function is responsible for this.
 

ColinTaylor

Part of the Furniture
The router will periodically copy the log file from /tmp to /jffs. That's normal. The question is why it appears to be hanging causing the build up of subsequent runs. Is your /jffs partition full up?
Code:
 df -ah


I wonder what option/function is responsible for this.
watchdog
 

Rocketboy235

Occasional Visitor
The router will periodically copy the log file from /tmp to /jffs. That's normal. The question is why it appears to be hanging causing the build up of subsequent runs. Is your /jffs partition full up?
Code:
 df -ah


watchdog

Thanks.

Had to restart my router since it was giving troubles but currently, it is not full for /dev/mtdblock4
upload_2020-5-7_2-56-36.png


If I see those processes appear again though, I'll have to check the space usage (no longer see the multiple cp processes after restarting)

Currently, my logging is set to the following:
upload_2020-5-7_3-1-35.png


and was also reading in another thread that firewall logging can be intensive so here are my firewall logging settings as well:

upload_2020-5-7_3-2-2.png


Not sure if I should tweak those logging settings though if they play any role in the erratic behavior I'm seeing from my router earlier.

I do have this router logs pulled to another service for remote access/storage and I noticed it generates roughly 10-20 MB each day (have 30 clients and a few people in the house who use the network). That seems excessive to me so I might need to figure out which logging settings I need to cut (and this could also be why its slowing my router's performance).

Also, a somewhat random question, if I currently turned off IPv6 for my router, should I just turn off the IPv6 firewall?
 

Jack Yaz

Part of the Furniture
Thanks.

Had to restart my router since it was giving troubles but currently, it is not full for /dev/mtdblock4
View attachment 23299

If I see those processes appear again though, I'll have to check the space usage (no longer see the multiple cp processes after restarting)

Currently, my logging is set to the following:
View attachment 23300

and was also reading in another thread that firewall logging can be intensive so here are my firewall logging settings as well:

View attachment 23301

Not sure if I should tweak those logging settings though if they play any role in the erratic behavior I'm seeing from my router earlier.

I do have this router logs pulled to another service for remote access/storage and I noticed it generates roughly 10-20 MB each day (have 30 clients and a few people in the house who use the network). That seems excessive to me so I might need to figure out which logging settings I need to cut (and this could also be why its slowing my router's performance).

Also, a somewhat random question, if I currently turned off IPv6 for my router, should I just turn off the IPv6 firewall?
Set logged packets to dropped only. Logging accepted packets will fill your log at a ridiculous rate
 

SomeWhereOverTheRainBow

Very Senior Member
Thanks.

Had to restart my router since it was giving troubles but currently, it is not full for /dev/mtdblock4
View attachment 23299

If I see those processes appear again though, I'll have to check the space usage (no longer see the multiple cp processes after restarting)

Currently, my logging is set to the following:
View attachment 23300

and was also reading in another thread that firewall logging can be intensive so here are my firewall logging settings as well:

View attachment 23301

Not sure if I should tweak those logging settings though if they play any role in the erratic behavior I'm seeing from my router earlier.

I do have this router logs pulled to another service for remote access/storage and I noticed it generates roughly 10-20 MB each day (have 30 clients and a few people in the house who use the network). That seems excessive to me so I might need to figure out which logging settings I need to cut (and this could also be why its slowing my router's performance).

Also, a somewhat random question, if I currently turned off IPv6 for my router, should I just turn off the IPv6 firewall?
upload_2020-5-7_4-41-12.png


per jacks suggestion
 

ColinTaylor

Part of the Furniture
Set logged packets to dropped only. Logging accepted packets will fill your log at a ridiculous rate
Personally I would recommend that you set logged packets to none. I see no value in logging the background noise of the internet. So unless you have a particular reason to do so you're just filling up the log file for no purpose.
 

Rocketboy235

Occasional Visitor
Thanks everyone! Will set it to dropped first and then set it to None (more of just curious how much space is taken up for each option). After looking in my logs, I do see a lot of "DROP" though and hardly any "ACCEPT" when it was set to log both types previously.

Anyway, fortunately when my router was being unresponsive (at least accessing the webui but VPN and PuTTY is still fine (at least until a few days later were those will eventually stop working), it was still transmitting the logs to the cloud.

At the time of the errors shown below that was when PuTTY was being sluggish and eventually the OpenVPN server would no longer work.
Looks like I'm getting a lot of watchdog restarts and errors about no space left on device. The watchdog restarts happen pretty much every minute with the no space left issue also happening almost every minute. I haven't had a chance to do a full factory reset of my router yet (hard to do since I'm currently off site) but I'm at least glad I'm starting to reach the root cause of why my router may becoming unresponsive after a few days or a few weeks (not always predictable on when it starts to fail before having to do a hard reboot).

Here are some snippets from the log (search filtered the log beforehand so not exactly a direct copy/paste so these these errors in both snippets are always constantly happening and are intertwined):

Code:
May 05 14:40:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:41:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:41:22 RT-AC68U watchdog Error locking /var/lock/cfebuf.lock: 28 No space left on device
May 05 14:41:54 RT-AC68U dnsmasq-script mmap of a spare page failed!
May 05 14:41:54 RT-AC68U dnsmasq-script mmap of a spare page failed!
May 05 14:42:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:43:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:43:22 RT-AC68U watchdog Error locking /var/lock/cfebuf.lock: 28 No space left on device
May 05 14:44:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:44:22 RT-AC68U watchdog Error locking /var/lock/cfebuf.lock: 28 No space left on device
May 05 14:45:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:46:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:47:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:48:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:49:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:50:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:51:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:52:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
May 05 14:53:02 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)

Code:
May 05 10:45:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:45:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:46:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:46:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:47:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:47:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:48:23 RT-AC68U rc_service watchdog 12377:notify_rc start_cfgsync
May 05 10:48:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:48:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:49:23 RT-AC68U rc_service watchdog 12381:notify_rc start_cfgsync
May 05 10:49:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:49:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:50:23 RT-AC68U rc_service watchdog 12382:notify_rc start_cfgsync
May 05 10:50:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:50:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:51:23 RT-AC68U rc_service watchdog 12385:notify_rc start_cfgsync
May 05 10:51:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:51:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:52:23 RT-AC68U rc_service watchdog 12389:notify_rc start_cfgsync
May 05 10:52:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:52:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:53:23 RT-AC68U rc_service watchdog 12392:notify_rc start_cfgsync
May 05 10:53:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:53:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:54:23 RT-AC68U rc_service watchdog 12393:notify_rc start_cfgsync
May 05 10:54:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
May 05 10:54:52 RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
May 05 10:55:23 RT-AC68U rc_service watchdog 12396:notify_rc start_cfgsync
May 05 10:55:52 RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
 
Last edited:

Rocketboy235

Occasional Visitor
Also noticed that when I looked around these times when the errors started occuring, I do see this pop up frequently:

(substituted the following numbers for real MACs: 00:00:00:00:00:00 is my AiMesh Node and 11:11:11:11:11:11 is one of the connected devices)

Code:
May 05 16:46:27 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(401): eth1: Disassoc 00:00:00:00:00:00, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8)
May 05 16:46:27 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(420): eth1: Auth 00:00:00:00:00:00, status: 0, reason: d11 RC reserved (0)
May 05 16:46:27 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(449): eth1: Assoc 00:00:00:00:00:00, status: 0, reason: d11 RC reserved (0)
-----
May 05 16:47:35 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(386): eth1: Deauth_ind 00:00:00:00:00:00, status: 0, reason: Deauthenticated because sending station is leaving (or has left) IBSS or ESS (3)
May 05 16:47:37 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(420): eth1: Auth 00:00:00:00:00:00, status: 0, reason: d11 RC reserved (0)
May 05 16:47:37 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(449): eth1: Assoc 00:00:00:00:00:00, status: 0, reason: d11 RC reserved (0)

May 05 16:47:37 RT-AC68U dnsmasq-dhcp failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)

May 05 16:47:38 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(386): eth2: Deauth_ind 11:11:11:11:11:11, status: 0, reason: Deauthenticated because sending station is leaving (or has left) IBSS or ESS (3)
May 05 16:47:39 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(401): eth1: Disassoc 00:00:00:00:00:00, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8)
May 05 16:47:39 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(420): eth1: Auth 00:00:00:00:00:00, status: 0, reason: d11 RC reserved (0)
May 05 16:47:39 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(449): eth1: Assoc 00:00:00:00:00:00, status: 0, reason: d11 RC reserved (0)
May 05 16:47:41 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(420): eth2: Auth 11:11:11:11:11:11, status: 0, reason: d11 RC reserved (0)
May 05 16:47:41 RT-AC68U syslog WLCEVENTD wlceventd_proc_event(449): eth2: Assoc 11:11:11:11:11:11, status: 0, reason: d11 RC reserved (0)

Seeing the disassoc and auth and assoc very frequently for AiMesh node so that is very suspicious.

For some context into why I somehow picked up on this issue in my first post, it was related to this thread: https://www.snbforums.com/threads/t...-access-web-gui-ssh-issues.61887/#post-580716

Edit: provided wrong thread link above so I had to fix that
 
Last edited:

Rocketboy235

Occasional Visitor
So I started paying a bit more attention to the router's logs and it looks like the multiple cp processes came back. JFFS does not appear to be full yet but usage went from 20% to 21% for the /jffs. Didn't see them yesterday but see them today. Also, noticed in the logs that the watchdog notifications came back.

Code:
RT-AC68U check_watchdog [check_watchdog] restart watchdog for no heartbeat
RT-AC68U rc_service check_watchdog 269:notify_rc restart_watchdog
syslog: WLCEVENTD wlceventd_proc_event(401): eth1: Disassoc 00:00:00:00:00:00, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8)
syslog: WLCEVENTD wlceventd_proc_event(401): eth2: Disassoc 00:00:00:00:00:00, status: 0, reason: Disassociated because sending station is leaving (or has left) BSS (8)

----
where 00:00:00:00:00:00 is the AiMesh node MAC

came back a little bit after midnight and started spamming my logs every minute. This may or may not be related to the reason why I am unable to see any traffic data via the statistic webapp since midnight of 5/8:

<see attached>


I noticed in the past, that whenever the router stops responding when I try to access the management features, it also stop recording data until I do a hard reset.

One other strange behavior I see today is my AiMesh node is showing up as offline through the web management UI though I can still SSH into it with no problem. I do see in the primary router logs that it is doing this every minute to my AiMesh node:

Code:
May 08 10:44:38 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPACK(br0) 192.168.1.157 00:00:00:00:00:00 RT-AC68U-AIMESH_NODE
May 08 10:45:08 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPREQUEST(br0) 192.168.1.157 00:00:00:00:00:00
May 08 10:45:08 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPACK(br0) 192.168.1.157 00:00:00:00:00:00 RT-AC68U-AIMESH_NODE
May 08 10:45:38 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPREQUEST(br0) 192.168.1.157 00:00:00:00:00:00
May 08 10:45:38 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPACK(br0) 192.168.1.157 00:00:00:00:00:00 RT-AC68U-AIMESH_NODE
May 08 10:46:18 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPREQUEST(br0) 192.168.1.157 00:00:00:00:00:00
May 08 10:46:18 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPACK(br0) 192.168.1.157 00:00:00:00:00:00 RT-AC68U-AIMESH_NODE
May 08 10:46:38 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPREQUEST(br0) 192.168.1.157 00:00:00:00:00:00
May 08 10:46:38 RT-AC68U-PRIMARY dnsmasq-dhcp DHCPACK(br0) 192.168.1.157 00:00:00:00:00:00 RT-AC68U-AIMESH_NODE

If anyone is curious on what kind of impact this has on my RAM and CPU usage for the router (and also temperatures), here is the data (and it's very frightening to watch...):

<See attached>

I thought it was RAM usage that was slowing down my router and causing it to be unresponsive but it looks like it's actually the CPU usage that is affected.
 

Attachments

  • analyzer-unable-record-data.PNG
    analyzer-unable-record-data.PNG
    239.1 KB · Views: 112
  • cpu-usage.PNG
    cpu-usage.PNG
    22.6 KB · Views: 113
  • horrible-temps-cpu.png
    horrible-temps-cpu.png
    21.7 KB · Views: 119
Last edited:

Similar threads

Latest threads

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!
Top