
High CPU usage and memory leak in the LLDPD process on RT-AC3100 (386.7_2)


unrealdude24

Occasional Visitor
I'm getting high CPU and memory usage suddenly.

Syslogs are filled with:
Code:
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory
Aug 17 01:07:57 lldpd[497]: Cannot allocate memory
Aug 17 01:07:57 lldpd[497]: not enough memory

The syslog shows nothing out of the ordinary prior to the "not enough memory" errors.

CPU usage alternates between the cores: one core spikes to 100%, then the load switches to the other core. I've identified the process taking up most of the CPU cycles:
Code:
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  497   493 nobody   R     187m 37.1   1 49.0 lldpd -L /usr/sbin/lldpcli -I vlan1,eth1,eth2,wds0.*,wds1.* -s RT-AC3100

Memory used is also over 80%.

For some reason there are two lldpd processes running simultaneously under different users, with one being the parent of the other:

admin@RT-AC3100-AC28:/tmp/home/root# ps | grep lldp
493 admin 1484 S lldpd -L /usr/sbin/lldpcli -I vlan1,eth1,eth2,wds0.*,wds1.* -s RT-AC3100
497 nobody 187m R lldpd -L /usr/sbin/lldpcli -I vlan1,eth1,eth2,wds0.*,wds1.* -s RT-AC3100


Not sure if that is expected behavior.

The number of context switches for the lldpd process is very high:
voluntary_ctxt_switches: 1977084
nonvoluntary_ctxt_switches: 266449700


I don't know how to generate a process dump to see the process stack, and strace isn't part of the Asus OS. If there is anything I can do to help root-cause the memory leak, let me know. I'd be happy to provide logs or traces.
 
Well, you've certainly got a very odd situation with your router. The syslog messages are telling us that memory allocation requests are failing due to insufficient memory. However, you said that "Memory used is also over 80%" which is actually not a problem. In fact, it's fairly common for ASUS routers to reach 80% to 90% of RAM utilization at some point under normal operating conditions and still be OK because RAM is the kind of resource that's supposed to be used, not wasted by being constantly underutilized.

If your RAM usage is ~80%, you still have about 20% of your RAM free. With 512MB of RAM in your router, that is roughly 100MB (probably closer to ~90MB because some memory is reserved). That amount of free RAM is still enough under "normal" conditions to satisfy additional memory allocation requests *unless* some program tries to allocate more than the available free RAM. For example, if the "lldpd" process were indeed requesting 100MB of memory at once, the OS might not be able to satisfy the request and the allocation would fail.
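For anyone who wants to check that arithmetic on their own router, here is a minimal sketch. `free_mb` and `meminfo_mb` are hypothetical helper names, not existing tools, and the second one assumes the usual `/proc/meminfo` layout (values in kB). Note that the `MemAvailable` field only exists on kernels 3.14 and newer, so on an older router kernel `MemFree` is the field to look at.

```shell
#!/bin/sh
# free_mb TOTAL_MB USED_PERCENT
# Prints the RAM (in whole MB) left over at the given usage level.
free_mb() {
    echo $(( $1 * (100 - $2) / 100 ))
}

# meminfo_mb FIELD [FILE]
# Prints one /proc/meminfo field (reported in kB) converted to whole MB.
# Prints nothing if the field is not present in the file.
meminfo_mb() {
    awk -v key="$1:" '$1 == key { print int($2 / 1024) }' "${2:-/proc/meminfo}"
}
```

`free_mb 512 80` prints `102`, which is the "roughly 100MB" figure above. `meminfo_mb MemFree` shows what the kernel itself currently reports as free.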

Without further debugging information and more precise details of your router configuration, it's all but impossible to say at this point what the root of the problem may be, IMO.


For some reason there are two lldpd processes running simultaneously under different users, with one being the parent of the other:

admin@RT-AC3100-AC28:/tmp/home/root# ps | grep lldp
493 admin 1484 S lldpd -L /usr/sbin/lldpcli -I vlan1,eth1,eth2,wds0.*,wds1.* -s RT-AC3100
497 nobody 187m R lldpd -L /usr/sbin/lldpcli -I vlan1,eth1,eth2,wds0.*,wds1.* -s RT-AC3100


Not sure if that is expected behavior.
AFAICT, that's normal. Other ASUS routers that I have checked so far show the same behavior. For comparison, here is a screenshot of the "lldpd" process from the RT-AC86U router:

[Attached image: RT-AC86U_LLDP_Info.jpg]

The number of context switches for the lldpd process is very high:
voluntary_ctxt_switches: 1977084
nonvoluntary_ctxt_switches: 266449700
The high number of "voluntary context switches" is most likely a symptom of the process getting frequent memory allocation failures. A voluntary switch is counted whenever the process gives up the CPU on its own because it has nothing to do until whatever it is waiting for completes, so the OS task scheduler hands the CPU to another process.

The high number of "non-voluntary context switches" is most likely another symptom of the process using a lot of CPU to do its work: it keeps exhausting its assigned time slice, so the OS task scheduler has to preempt it to give CPU time to other processes.
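To put numbers on this, the two counters can be pulled from `/proc/<pid>/status` and compared. This is only a sketch, assuming the standard Linux `/proc` layout; `ctxt_switches` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# ctxt_switches
# Reads /proc/<pid>/status-style text on stdin and prints:
#   <voluntary> <nonvoluntary> <nonvoluntary per voluntary, rounded down>
ctxt_switches() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
        END {
            # Guard against division by zero when no voluntary switches exist.
            ratio = (v > 0) ? int(n / v) : 0
            printf "%d %d %d\n", v, n, ratio
        }
    '
}
```

Run it as `ctxt_switches < /proc/497/status` (497 being the PID from the output above). With the counters posted above it prints `1977084 266449700 134`, i.e. roughly 134 non-voluntary switches for every voluntary one, which would fit a process that is mostly being preempted while it is busy.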

IMO, the main question is: Why are the memory allocation failures occurring when there's still sufficient free RAM to satisfy reasonable requests? And the 2nd question is: Why is the "lldpd" process taking so much CPU time?


I don't know how to generate a process dump to see the process stack, and strace isn't part of the Asus OS. If there is anything I can do to help root-cause the memory leak, let me know. I'd be happy to provide logs or traces.
You can install the "strace" utility via Entware, which in turn can be installed via AMTM (included in Asuswrt-Merlin firmware releases for a few years now).

Once you have installed the strace tool, you may be able to capture more debugging info with it.
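As a rough sketch of what that could look like, assuming Entware is already set up: `opkg install strace` installs the tool, and the helper below only prints the two attach commands so they can be reviewed before being run. `trace_cmds` is a hypothetical helper name, and the PID and output path are whatever you pass in. Since `/tmp` is RAM-backed on these routers, the output file is better placed on the USB drive that hosts Entware (somewhere under `/opt`, for example).

```shell
#!/bin/sh
# trace_cmds PID OUTFILE
# Prints (does not run) two strace invocations for a running process.
trace_cmds() {
    pid=$1
    out=$2
    # Full syscall log with timestamps, also following child processes.
    echo "strace -f -tt -p $pid -o $out"
    # Per-syscall counts and time; the summary is written to the file
    # when strace is stopped with Ctrl-C.
    echo "strace -c -p $pid -o $out.summary"
}
```

For example, `trace_cmds 497 /opt/tmp/lldpd.strace` prints the commands for the busy child process shown above. The PID will differ after a reboot, so look it up again with `ps | grep lldp` first.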

Good Luck.
 
That amount of free RAM is still enough under "normal" conditions to satisfy additional memory allocation requests *unless* some program tries to allocate more than the available free RAM. For example, if the "lldpd" process were indeed requesting 100MB of memory at once, the OS might not be able to satisfy the request and the allocation would fail.
I suspect this is exactly what's happening. The process is already consuming 187MB of memory when it normally consumes less than 4MB.

@unrealdude24 While it might be an interesting academic exercise to try to diagnose the problem, if this is a one-off I'd be tempted to just reboot the router (or kill the process) and move on with my life. If it recurs, then I'd start investigating it.
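If it does come back, a small check along these lines could flag it before memory runs out. This is a sketch under assumptions: `vm_size_kb` and `over_limit` are hypothetical helper names, and the 20480 kB limit in the usage comment is an arbitrary value picked to sit well above the usual sub-4MB size.

```shell
#!/bin/sh
# vm_size_kb
# Reads /proc/<pid>/status-style text on stdin and prints VmSize in kB.
vm_size_kb() {
    awk '$1 == "VmSize:" { print $2 }'
}

# over_limit SIZE_KB LIMIT_KB
# Succeeds (exit status 0) only when SIZE_KB is greater than LIMIT_KB.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Example check for one process (PID 497 is taken from the ps output above):
#   size=$(vm_size_kb < /proc/497/status)
#   over_limit "$size" 20480 && logger "lldpd VmSize is ${size} kB"
```

Running the commented check periodically would leave a syslog entry shortly after the process starts growing, which gives a timestamp to compare against other log entries.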
 
I took your advice and decided to just forget about it (mostly because the tools I need to dig deeper required a restart). But this time I have Entware and strace set up, so the next time it happens I'll be able to collect more info. Now that I have swap space, though, the behavior might change. We shall see.
 
