High CPU only when no WAN connection

Discussion in 'Asuswrt-Merlin' started by BSOD2600, Jan 13, 2020.

  1. BSOD2600

    BSOD2600 Occasional Visitor

    Joined: Jan 29, 2015
    Messages: 49
    RT-AC87U 384.13_2
    Diversion 4.1.8
    amtm 3.0


    I've been having this problem across several Merlin and Diversion versions, so I don't think it's specifically related to their latest releases.

    Basically, when the cable modem (WAN) loses its connection, both CPUs on the router are pegged at 100%. This prevents clients on the LAN from even obtaining a DHCP address. Most of the time, the router is so resource-constrained that I'm unable to run top after SSHing into it. Even when the WAN connection is restored, the router stays locked up and can't route traffic for LAN clients until it has been power cycled.
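Since an interactive top session is often impossible once the CPUs are pegged, a cheaper signal is the kernel's load average in /proc/loadavg. On Linux it counts tasks that are running, runnable, or in uninterruptible sleep, so a 1-minute value that stays above the core count (2 on this RT-AC87U, per the iostat output) suggests work is queuing. The sketch below only parses that line; the function names are hypothetical, and it assumes a Python interpreter is available on the router (for example via Entware).

```python
import os


def parse_loadavg(text):
    """Parse one line in /proc/loadavg format.

    Format: "<1-min> <5-min> <15-min> <running>/<total> <last-pid>",
    which is also what BusyBox top prints after "Load average:".
    """
    fields = text.split()
    load1, load5, load15 = (float(value) for value in fields[:3])
    running, total = (int(value) for value in fields[3].split("/"))
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "running": running,
        "total": total,
        "last_pid": int(fields[4]),
    }


def is_saturated(text, ncpu=None):
    """Return True when the 1-minute load average exceeds the CPU count."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return parse_loadavg(text)["load1"] > ncpu


def read_loadavg(path="/proc/loadavg"):
    """Read the kernel's load-average line (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

For example, `is_saturated(read_loadavg(), ncpu=2)` on the router would return True for the "2.75 2.45 1.77 4/114 5437" reading captured below.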

    Today, I was finally able to capture this behavior. top always showed the tdts_rule_agent process, with dnsmasq and mtdblock3 alternating as the other two top offenders. Here are snapshots from before and after disabling Diversion, taken while the WAN was disconnected.

    Code:
    DIVERSION ENABLED
    
    Mem: 194012K used, 61664K free, 3552K shrd, 1888K buff, 26700K cached
    CPU: 47.1% usr 52.6% sys  0.0% nic  0.0% idle  0.0% io  0.0% irq  0.1% sirq
    Load average: 2.75 2.45 1.77 4/114 5437
      PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
      778     1 nobody   R    51068 19.9   0 46.1 dnsmasq --log-async
     5430  5334 admin    R     1732  0.6   1 44.6 tdts_rule_agent -g -r /jffs/signature/rule.trf
      192     1 admin    R     5132  2.0   1  5.0 nt_center
      169     1 admin    S      652  0.2   0  1.4 tftpd
    <SNIP>
    
    Mem: 196496K used, 59180K free, 1708K shrd, 1200K buff, 9152K cached
    CPU: 49.8% usr 46.2% sys  0.0% nic  3.5% idle  0.0% io  0.0% irq  0.3% sirq
    Load average: 2.84 2.39 1.34 3/114 6274
      PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
      794     1 nobody   R    81292 31.7   0 43.0 dnsmasq --log-async
       27     2 admin    RW       0  0.0   1 22.1 [mtdblock3]
      196     1 admin    S     5144  2.0   0  2.5 nt_center
      179     1 admin    S     2604  1.0   0  2.4 protect_srv
    <SNIP>
    
    [email protected]:/tmp/home/root# iostat
    Linux 2.6.36.4brcmarm (RT-AC87U)        01/14/20        _armv7l_        (2 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.09    0.00    2.12    0.06    0.00   96.72
    
    Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    mtdblock3         0.12         4.16         0.00     249594          0
    sda               0.03         4.64         0.11     277944       6344
    
    [email protected]:/tmp/home/root# dstat
    Traceback (most recent call last):
      File "/opt/bin/dstat", line 32, in <module>
        import six
    ImportError: No module named six
    
    
    DIVERSION DISABLED
    Mem: 129604K used, 126072K free, 3512K shrd, 1048K buff, 9620K cached
    CPU:  1.5% usr 52.2% sys  0.0% nic 46.0% idle  0.0% io  0.0% irq  0.0% sirq
    Load average: 2.63 2.48 1.81 2/115 6174
      PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
     6136  5699 admin    R     1732  0.6   1 49.3 tdts_rule_agent -g -r /jffs/signature/rule.trf
      192     1 admin    R     5132  2.0   1  1.4 nt_center
      169     1 admin    S      652  0.2   1  0.9 tftpd
      179     1 admin    R     2604  1.0   1  0.6 protect_srv
    <SNIP>
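To compare captures like the ones above more systematically (for instance, when pasting several snapshots taken over time), a small parser can total %CPU per program. This is a sketch that assumes BusyBox top's column layout as shown in the captures (PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND) and a STAT column without spaces; the function names are hypothetical.

```python
from collections import defaultdict


def parse_top_rows(snapshot):
    """Extract (pid, cpu_percent, command) tuples from BusyBox top output.

    Assumes the column order PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
    and that the STAT column contains no spaces.
    """
    rows = []
    in_table = False
    for line in snapshot.splitlines():
        fields = line.split()
        if fields[:2] == ["PID", "PPID"]:
            in_table = True  # header row: process rows follow
            continue
        if not in_table or len(fields) < 9 or not fields[0].isdigit():
            continue  # skip Mem:/CPU:/Load lines, <SNIP> markers, blanks
        rows.append((int(fields[0]), float(fields[7]), " ".join(fields[8:])))
    return rows


def cpu_by_program(rows):
    """Sum %CPU per program name (first word of COMMAND) across snapshots."""
    totals = defaultdict(float)
    for _pid, cpu, command in rows:
        totals[command.split()[0]] += cpu
    return dict(totals)
```

Fed both "DIVERSION ENABLED" captures, this would attribute roughly 89% CPU to dnsmasq across the two samples, which makes the difference from the "DIVERSION DISABLED" capture easy to see at a glance.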
    
    About two weeks ago, I moved ad blocking from Diversion to a dedicated Pi-hole device (partly on a hunch about this problem). When the symptom occurred today, I disabled Diversion (and pixelserv), after which CPU usage only ranged between 0 and 50%. tdts_rule_agent was still always at the top of the list, but at least clients were able to get DHCP leases.

    I also collect the RT-AC87U syslogs in Splunk. Looking back over the time of the incident, the only thing logged is dnsmasq repeatedly listing the nameservers and host addresses.
    [Screenshot: RT-AC87U - dnsmasq loop.png]
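One way to quantify that repeating dnsmasq output from an exported syslog is to count matching lines per minute; a steady high rate may indicate dnsmasq is repeatedly restarting or re-reading its upstream servers. This sketch assumes a typical syslog line shape such as "Jan 13 17:02:11 dnsmasq[778]: using nameserver 8.8.8.8#53" (optionally with a hostname before "dnsmasq"); the exact message text on this firmware is an assumption, and `count_per_minute` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed line shape: "<Mon> <day> HH:MM:SS [host] dnsmasq[<pid>]: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}):\d{2} .*?dnsmasq\[\d+\]: (.*)$")


def count_per_minute(lines, marker="using nameserver"):
    """Count dnsmasq messages starting with `marker`, bucketed by minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2).startswith(marker):
            counts[match.group(1)] += 1  # key like "Jan 13 17:02"
    return counts
```

Passing `marker="read /etc/hosts"` instead would count the host-file messages, assuming they follow the same prefix.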


    Does anyone have ideas about the root cause of this problem?
     
  2. BSOD2600

    No one has any ideas?