What's new

Router lockup: "can't fork" on command line

  • SNBForums Code of Conduct

    SNBForums is a community for everyone, no matter what their level of experience.

    Please be tolerant and patient of others, especially newcomers. We are all here to share and learn!

    The rules are simple: Be patient, be nice, be helpful or be gone!

mystical

Occasional Visitor
I have been running 380.58 for almost a month on 3 different AC-68Us with excellent results.
However this week my 'primary' router started to always kick me back to the login screen.
This morning I could ssh to the router but only execute busybox commands.
A power cycle fixed it, of course, but I worry that if/when this lockup happens next, I won't be able to recover unless I am at home.

Rob

=============

Robs-MBP:~ rob$ ssh 192.168.1.1
ASUSWRT-Merlin RT-AC68U_3.0.0.4 Sun Mar 20 19:52:23 UTC 2016
-sh: can't fork
rob@RT-AC68U-7658:/tmp/home/root# top
-sh: can't fork
rob@RT-AC68U-7658:/tmp/home/root# tail /var/log
log.nmbd log.samba log.smbd log/
rob@RT-AC68U-7658:/tmp/home/root# reboot
-sh: can't fork
rob@RT-AC68U-7658:/tmp/home/root#
 
Running out of RAM would be the most likely reason.

Syslog is located in /tmp/syslog.log.
 
That message is often seen when the machine has run out of free memory.

Given that /tmp is stored in RAM one might speculate that there is a problem with a log file (syslog.log?) that has grown so large that it has consumed all the memory.

Have you enabled any kind of 'extra' logging, like firewall-accepted packets?
 
Unfortunately there is nothing useful in the syslog, and a very large time gap, which suggests to me that the out of ram condition stopped the process which flushes the RAM log file to NVRAM. If I encounter this condition again, I now know the path to the syslog file and will look there first.
Note that I tried to run 'top' to look at memory but the sh couldn't fork it.

================

Jul 31 17:00:33 ddns update: asusddns_update: 0
Jul 31 17:00:35 ddns: ddns update ok
Jul 31 17:00:35 rc_service: udhcpc 516:notify_rc start_vpnserver1
Jul 31 17:00:36 ntp: start NTP update
Jul 31 17:00:37 openvpn-routing: Refreshing policy rules for client 1
Apr 22 07:54:03 rc_service: ntp 675:notify_rc restart_upnp
Apr 22 07:54:03 rc_service: waitting "start_vpnserver1" via udhcpc ...
Apr 22 07:54:03 openvpn-routing: Allow WAN access to all VPN clients
Apr 22 07:54:03 openvpn-routing: Refreshing policy rules for client 2
Apr 22 07:54:03 openvpn-routing: Allow WAN access to all VPN clients
Apr 22 07:54:03 openvpn-routing: Refreshing policy rules for client 3
 
That message is often seen when the machine has run out of free memory.

Given that /tmp is stored in RAM one might speculate that there is a problem with a log file (syslog.log?) that has grown so large that it has consumed all the memory.

Have you enabled any kind of 'extra' logging, like firewall-accepted packets?

No, I didn't change any logging settings, although I have enabled all of the traffic monitoring and QOS features.
After reboot the log is small and there seems plenty of free space in the file systems:

rob@RT-AC68U-7658:/tmp/home/root# ls -l /tmp/syslog.log
-rw-rw-rw- 1 rob root 37110 Apr 22 08:21 /tmp/syslog.log
rob@RT-AC68U-7658:/tmp/home/root# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 28928 28928 0 100% /
devtmpfs 127752 0 127752 0% /dev
tmpfs 127856 1128 126728 1% /tmp
/dev/mtdblock4 64256 5176 59080 8% /jffs
rob@RT-AC68U-7658:/tmp/home/root#
 
Yes, it's something of a "chicken and egg" situation. You need free memory to run the commands that tell you why you have no free memory.

I think the big gap in the timestamps you mentioned indicate that the router was rebooted. The July 31 date is probably the default power-on date before it syncs with NTP.

Just try and keep an eye on the syslog and try and catch it before it gets too big (assuming it's syslog and not some other file).
 
So if I want to sure I can remotely access my router, it seems I should set 'Enable Reboot Scheduler' to 'yes' and reboot once a week.
Do you do that?
thanks
 
Yes, it's something of a "chicken and egg" situation. You need free memory to run the commands that tell you why you have no free memory.

I think the big gap in the timestamps you mentioned indicate that the router was rebooted. The July 31 date is probably the default power-on date before it syncs with NTP.

Just try and keep an eye on the syslog and try and catch it before it gets too big (assuming it's syslog and not some other file).
I can see that you are correct about the the July dates at the head of the syslog
So the syslog does not persist through a reboot?
If it would help someone's debugging I could setup my ASUS to log to my NAS.
 
I can see that you are correct about the the July dates at the head of the syslog
So the syslog does not persist through a reboot?
If it would help someone's debugging I could setup my ASUS to log to my NAS.

Correct. syslog is written to /tmp, which is a ramfs, so it goes away on reboot

Sending your ASUS syslog to you NAS is an excellent solution
 
Punting this topic to the top again because it's still happening in 2021 with the current builds on an AX88U. Other symptoms include clients suddenly being unable to acquire an IP address, and the router page not loading at all.

There's a memory leak somewhere in the system. Any suggestions on how to monitor it *other* than syslog? Because the syslog alone won't really capture the true extent of what's happening. Need some way to send the output of 'ps -ax' or 'top' periodically.
 
I'm running df -h every 24 hours or so. It appears that the following folders grow steadily and aren't cleaned up automatically: /var/tmpfs /jffs
 
Aha. I think I found our culprit. Check out the file size growth of /var/log/strongswan.charon.log.

Code:
-rw-rw-rw-    1 admin    root      76993745 Jul 21 11:54 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77000969 Jul 21 11:55 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77017643 Jul 21 11:56 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77024811 Jul 21 11:57 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77037008 Jul 21 11:58 strongswan.charon.log

That's a growth rate of nearly 9KB per minute in a folder that is mounted in RAM.
 
If you have a USB stick attached to your router, perhaps consider installing scribe + uiscribe packages through amtm? This will install a "proper" syslog + log rotation. You could configure it to nudge/rotate the swan logs too if it doesn't do it by default? I don't use ipsec so can't say if it'll do it off the bat.
 
Aha. I think I found our culprit. Check out the file size growth of /var/log/strongswan.charon.log.

Code:
-rw-rw-rw-    1 admin    root      76993745 Jul 21 11:54 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77000969 Jul 21 11:55 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77017643 Jul 21 11:56 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77024811 Jul 21 11:57 strongswan.charon.log

-rw-rw-rw-    1 admin    root      77037008 Jul 21 11:58 strongswan.charon.log

That's a growth rate of nearly 9KB per minute in a folder that is mounted in RAM.
Can you post what is flooding this log? Also, please post the content of /etc/strongswan.conf.

Thanks.
 
Sure thing... I think I replied in the beta thread, but I'll post a snippet here for reference.

Contents of /etc/strongswan.conf
Code:
admin@RT-AX88U-27B8:/tmp/home/root# cat /etc/strongswan.conf 
charon {
    user = admin
    threads = 8
    send_vendor_id = yes
    interfaces_ignore = br0
    starter { load_warning = no }
    load_modular = yes
    i_dont_care_about_security_and_use_aggressive_mode_psk = yes
    plugins {
        include strongswan.d/charon/*.conf
    }
    filelog {
        charon {
            path = /var/log/strongswan.charon.log
            time_format = %b %e %T
            default = 1
            append = no
            flush_line = yes
        }
    }
}


Here are some lines from /var/log/strongswan.charon.log, with my IP address redacted to ###.
Code:
Jul 22 10:59:37 06[NET] received packet: from 75.146.134.42[4500] to ###.###.###.###[4500] (108 bytes)
Jul 22 10:59:37 06[ENC] invalid ID_V1 payload length, decryption failed?
Jul 22 10:59:37 06[ENC] could not decrypt payloads
Jul 22 10:59:37 06[IKE] message parsing failed
Jul 22 10:59:37 06[ENC] generating INFORMATIONAL_V1 request 1959529149 [ HASH N(PLD_MAL) ]
Jul 22 10:59:37 06[NET] sending packet: from ###.###.###.###[500] to 75.146.134.42[500] (76 bytes)
Jul 22 10:59:37 06[IKE] ID_PROT request with message ID 0 processing failed
Jul 22 10:59:43 07[JOB] deleting half open IKE_SA with 45.48.96.61 after timeout
Jul 22 10:59:43 08[NET] received packet: from 45.48.96.61[500] to ###.###.###.###[500] (168 bytes)
Jul 22 10:59:43 08[ENC] parsed ID_PROT request 0 [ SA V V V V ]
Jul 22 10:59:43 08[IKE] received NAT-T (RFC 3947) vendor ID
Jul 22 10:59:43 08[IKE] received draft-ietf-ipsec-nat-t-ike-07 vendor ID
Jul 22 10:59:43 08[IKE] received draft-ietf-ipsec-nat-t-ike-03 vendor ID
Jul 22 10:59:43 08[IKE] received draft-ietf-ipsec-nat-t-ike-02\n vendor ID
Jul 22 10:59:43 08[IKE] 45.48.96.61 is initiating a Main Mode IKE_SA
Jul 22 10:59:43 08[CFG] selected proposal: IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
Jul 22 10:59:43 08[ENC] generating ID_PROT response 0 [ SA V V V V ]
Jul 22 10:59:43 08[NET] sending packet: from ###.###.###.###[500] to 45.48.96.61[500] (160 bytes)
Jul 22 10:59:43 06[NET] received packet: from 45.48.96.61[500] to ###.###.###.###[500] (284 bytes)
Jul 22 10:59:43 06[ENC] parsed ID_PROT request 0 [ KE No V V V NAT-D NAT-D ]
Jul 22 10:59:43 06[IKE] received DPD vendor ID
Jul 22 10:59:43 06[ENC] received unknown vendor ID: bb:d3:8e:e7:62:27:5f:8a:a7:4e:59:58:42:eb:1f:53
Jul 22 10:59:43 06[IKE] received XAuth vendor ID
Jul 22 10:59:43 06[IKE] remote host is behind NAT
Jul 22 10:59:43 06[ENC] generating ID_PROT response 0 [ KE No NAT-D NAT-D ]
Jul 22 10:59:43 06[NET] sending packet: from ###.###.###.###[500] to 45.48.96.61[500] (244 bytes)
Jul 22 10:59:43 05[NET] received packet: from 45.48.96.61[4500] to ###.###.###.###[4500] (108 bytes)
Jul 22 10:59:43 05[ENC] invalid ID_V1 payload length, decryption failed?
Jul 22 10:59:43 05[ENC] could not decrypt payloads
Jul 22 10:59:43 05[IKE] message parsing failed
Jul 22 10:59:43 05[ENC] generating INFORMATIONAL_V1 request 1201172663 [ HASH N(PLD_MAL) ]
Jul 22 10:59:43 05[NET] sending packet: from ###.###.###.###[500] to 45.48.96.61[500] (76 bytes)
Jul 22 10:59:43 05[IKE] ID_PROT request with message ID 0 processing failed
Jul 22 10:59:47 07[NET] received packet: from 75.146.134.42[4500] to ###.###.###.###[4500] (108 bytes)
Jul 22 10:59:47 07[ENC] invalid ID_V1 payload length, decryption failed?
Jul 22 10:59:47 07[ENC] could not decrypt payloads
Jul 22 10:59:47 07[IKE] message parsing failed
Jul 22 10:59:47 07[ENC] generating INFORMATIONAL_V1 request 2211980669 [ HASH N(PLD_MAL) ]
Jul 22 10:59:47 07[NET] sending packet: from ###.###.###.###[500] to 75.146.134.42[500] (76 bytes)
Jul 22 10:59:47 07[IKE] ID_PROT request with message ID 0 processing failed
Jul 22 10:59:53 06[NET] received packet: from 45.48.96.61[4500] to ###.###.###.###[4500] (108 bytes)
Jul 22 10:59:53 06[ENC] invalid ID_V1 payload length, decryption failed?
Jul 22 10:59:53 06[ENC] could not decrypt payloads
Jul 22 10:59:53 06[IKE] message parsing failed
Jul 22 10:59:53 06[ENC] generating INFORMATIONAL_V1 request 1882038892 [ HASH N(PLD_MAL) ]
Jul 22 10:59:53 06[NET] sending packet: from ###.###.###.###[500] to 45.48.96.61[500] (76 bytes)
Jul 22 10:59:53 06[IKE] ID_PROT request with message ID 0 processing failed
Jul 22 10:59:57 05[JOB] deleting half open IKE_SA with 75.146.134.42 after timeout
Jul 22 11:00:03 08[NET] received packet: from 45.48.96.61[4500] to ###.###.###.###[4500] (108 bytes)
Jul 22 11:00:03 08[ENC] invalid ID_V1 payload length, decryption failed?
Jul 22 11:00:03 08[ENC] could not decrypt payloads
Jul 22 11:00:03 08[IKE] message parsing failed
Jul 22 11:00:03 08[ENC] generating INFORMATIONAL_V1 request 2442741824 [ HASH N(PLD_MAL) ]
Jul 22 11:00:03 08[NET] sending packet: from ###.###.###.###[500] to 45.48.96.61[500] (76 bytes)
Jul 22 11:00:03 08[IKE] ID_PROT request with message ID 0 processing failed
Jul 22 11:00:05 06[NET] received packet: from 184.183.92.67[500] to ###.###.###.###[500] (168 bytes)
Jul 22 11:00:05 06[ENC] parsed ID_PROT request 0 [ SA V V V V ]
Jul 22 11:00:05 06[IKE] received NAT-T (RFC 3947) vendor ID
Jul 22 11:00:05 06[IKE] received draft-ietf-ipsec-nat-t-ike-07 vendor ID
Jul 22 11:00:05 06[IKE] received draft-ietf-ipsec-nat-t-ike-03 vendor ID
Jul 22 11:00:05 06[IKE] received draft-ietf-ipsec-nat-t-ike-02\n vendor ID
Jul 22 11:00:05 06[IKE] 184.183.92.67 is initiating a Main Mode IKE_SA
Jul 22 11:00:05 06[CFG] selected proposal: IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
Jul 22 11:00:05 06[ENC] generating ID_PROT response 0 [ SA V V V V ]
Jul 22 11:00:05 06[NET] sending packet: from ###.###.###.###[500] to 184.183.92.67[500] (160 bytes)
Jul 22 11:00:05 05[NET] received packet: from 184.183.92.67[500] to ###.###.###.###[500] (284 bytes)
Jul 22 11:00:05 05[ENC] parsed ID_PROT request 0 [ KE No V V V NAT-D NAT-D ]
Jul 22 11:00:05 05[IKE] received DPD vendor ID
Jul 22 11:00:05 05[ENC] received unknown vendor ID: 6c:1f:a1:34:16:de:55:f0:b0:0c:d8:f1:17:ae:f3:ce
Jul 22 11:00:05 05[IKE] received XAuth vendor ID
Jul 22 11:00:05 05[ENC] generating ID_PROT response 0 [ KE No NAT-D NAT-D ]
Jul 22 11:00:05 05[NET] sending packet: from ###.###.###.###[500] to 184.183.92.67[500] (244 bytes)
Jul 22 11:00:05 07[NET] received packet: from 184.183.92.67[500] to ###.###.###.###[500] (108 bytes)
Jul 22 11:00:05 07[ENC] invalid ID_V1 payload length, decryption failed?
Jul 22 11:00:05 07[ENC] could not decrypt payloads
Jul 22 11:00:05 07[IKE] message parsing failed
Jul 22 11:00:05 07[ENC] generating INFORMATIONAL_V1 request 2192012807 [ HASH N(PLD_MAL) ]
Jul 22 11:00:05 07[NET] sending packet: from ###.###.###.###[500] to 184.183.92.67[500] (76 bytes)
Jul 22 11:00:05 07[IKE] ID_PROT request with message ID 0 processing failed
Jul 22 11:00:13 06[NET] received packet: from 161.129.49.58[500] to ###.###.###.###[500] (168 bytes)
Jul 22 11:00:13 06[ENC] parsed ID_PROT request 0 [ SA V V V V ]
Jul 22 11:00:13 06[IKE] received NAT-T (RFC 3947) vendor ID
Jul 22 11:00:13 06[IKE] received draft-ietf-ipsec-nat-t-ike-07 vendor ID
Jul 22 11:00:13 06[IKE] received draft-ietf-ipsec-nat-t-ike-03 vendor ID
Jul 22 11:00:13 06[IKE] received draft-ietf-ipsec-nat-t-ike-02\n vendor ID
Jul 22 11:00:13 06[IKE] 161.129.49.58 is initiating a Main Mode IKE_SA
Jul 22 11:00:13 06[CFG] selected proposal: IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
Jul 22 11:00:13 06[ENC] generating ID_PROT response 0 [ SA V V V V ]
Jul 22 11:00:13 06[NET] sending packet: from ###.###.###.###[500] to 161.129.49.58[500] (160 bytes)
Jul 22 11:00:13 05[NET] received packet: from 161.129.49.58[500] to ###.###.###.###[500] (284 bytes)
Jul 22 11:00:13 05[ENC] parsed ID_PROT request 0 [ KE No V V V NAT-D NAT-D ]
Jul 22 11:00:13 05[IKE] received DPD vendor ID
Jul 22 11:00:13 05[ENC] received unknown vendor ID: 03:69:7c:20:5b:46:af:22:fa:33:bf:3a:fd:4d:2c:78
Jul 22 11:00:13 05[IKE] received XAuth vendor ID
Jul 22 11:00:13 05[IKE] remote host is behind NAT
Jul 22 11:00:13 05[ENC] generating ID_PROT response 0 [ KE No NAT-D NAT-D ]
Jul 22 11:00:13 05[NET] sending packet: from ###.###.###.###[500] to 161.129.49.58[500] (244 bytes)
Jul 22 11:00:13 07[NET] received packet: from 161.129.49.58[4500] to ###.###.###.###[4500] (108 bytes)
Jul 22 11:00:13 07[ENC] invalid ID_V1 payload length, decryption failed?
Jul 22 11:00:13 07[ENC] could not decrypt payloads
Jul 22 11:00:13 07[IKE] message parsing failed
Jul 22 11:00:13 07[ENC] generating INFORMATIONAL_V1 request 1788166884 [ HASH N(PLD_MAL) ]
Jul 22 11:00:13 07[NET] sending packet: from ###.###.###.###[500] to 161.129.49.58[500] (76 bytes)
Jul 22 11:00:13 07[IKE] ID_PROT request with message ID 0 processing failed
Jul 22 11:00:13 08[JOB] deleting half open IKE_SA with 45.48.96.61 after timeout
 
Presumably you have deliberately enabled the IPSec server? So this looks like normal scanning/hacking attempts. So I guess the question becomes can the router automatically truncate the log file?
 
Sure thing... I think I replied in the beta thread, but I'll post a snippet here for reference.
Thanks. I will relay the information upstream to the Asus engineer responsible for IPSEC's implementation so they can have a look at it.
 
Thanks. I will relay the information upstream to the Asus engineer responsible for IPSEC's implementation so they can have a look at it.

Sweet! In case you were wondering, I found the config file reference site for strongswan, and specifically the logging section. It's mainly for your interest, because there doesn't appear to be a log rotation facility inherent to the daemon. That would be up to the OS and whatever script Asus wants to write.


A couple of things to note:

They're using the default log level of "1" which is "General control flow with errors, a good default to see what's going on." That is the lowest useful level. Level 0 shows extremely basic service activity only.

It's also set to use IKEv1 Aggressive Mode, which strongSwan doesn't recommended for security reasons. However, this might be a necessary compromise to use the router's implementation of IPSec.
 
That would be up to the OS and whatever script Asus wants to write.
I suspect they will handle it through the watchdog. I'll leave it up to them to take care of it - I prefer to delegate to them whenever it's something they can fix, and it's also in their best interest to fix :)
 

Latest threads

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!
Top