On Saturday evening, I started taking a much closer look at the 2 major junctions where a router running Scribe transitions from the built-in system loggers (klogd and syslogd) to syslog-ng (when Entware services are started after the USB-attached drive is mounted), and vice versa (when Entware services are stopped after the USB-attached drive is unmounted).
After analyzing the sequence of events around the execution flow of the '/jffs/scripts/post-mount' and '/jffs/scripts/unmount' scripts, and also running some debugging code, I found 2 significant problems:
1) When the call to Scribe found in the '/jffs/scripts/unmount' script is made *after* the Entware services are stopped, it fails to properly reset and put back the system log files so the built-in system loggers can then proceed to update them. This, in turn, is causing hundreds of log messages to be "lost" during a reboot sequence.
2) Upon a system reboot, when Entware services are started, syslog-ng was shutting down the built-in system loggers too early, and with a minimum 2-second delay (perhaps more in some cases). This caused hundreds (possibly close to a thousand) of reboot log messages to be "lost" because of the "deluge" of log entries flooding the system logs within the first ~2 to ~3 minutes of the bootup process. The exact time varies with the number of built-in services and Entware services being started, and with other system initialization activities.
So in order to address the above 2 problems, I've modified the code. For the 1st issue, when unmounting the USB drive, I'm making sure that the system log files are put back after syslog-ng is stopped, and before the built-in loggers (klogd and syslogd) are restarted. Also, I've added code to save a backup of the syslog-ng messages log file, for restoration purposes.
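To illustrate the ordering for the 1st issue, here is a rough sketch and not the actual Scribe code. The file paths, the backup location, and the two stop/start helper functions are all placeholders I made up for the example. It also assumes the system log file may be a symlink pointing at the syslog-ng messages file while syslog-ng is running.

```shell
#!/bin/sh
# Sketch of the unmount-time ordering. Every path and helper name below is
# a placeholder chosen for illustration, not taken from Scribe itself.

NG_LOG="${NG_LOG:-/opt/var/log/messages}"            # assumed syslog-ng messages file
SYS_LOG="${SYS_LOG:-/tmp/syslog.log}"                # assumed built-in system log file
NG_LOG_BACKUP="${NG_LOG_BACKUP:-/tmp/messages.bak}"  # hypothetical backup location

# Placeholder hooks: swap in the real stop/start commands for your setup.
stop_syslog_ng()        { echo "stop syslog-ng"; }
start_builtin_loggers() { echo "start klogd and syslogd"; }

handle_unmount() {
    # 1. Stop syslog-ng first so it no longer writes to its messages file.
    stop_syslog_ng

    # 2. If the system log is a symlink (assumption), remove the link so a
    #    regular file can be put back in its place.
    if [ -L "$SYS_LOG" ]; then
        rm -f "$SYS_LOG"
    fi

    if [ -f "$NG_LOG" ]; then
        # 3. Save a backup of the syslog-ng messages file for restoration.
        cp -p "$NG_LOG" "$NG_LOG_BACKUP"
        # 4. Put the log contents back where the built-in loggers expect them.
        cp -p "$NG_LOG" "$SYS_LOG"
    elif [ ! -e "$SYS_LOG" ]; then
        # Nothing to copy: make sure an empty log file exists.
        : > "$SYS_LOG"
    fi

    # 5. Only now restart the built-in loggers.
    start_builtin_loggers
}
```

In this sketch, the unmount script would call `handle_unmount` before the Entware services are stopped, so the syslog-ng messages file is still reachable when it gets copied.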
For the 2nd problem, I'm creating a background script that delays starting the syslog-ng service until the reboot sequence has completed and system activity has reached a more "normal" state, when the deluge of reboot log messages has also subsided. This usually happens within the first ~3 minutes after a reboot; only then will the background task start syslog-ng, in a more "stable" context/environment.
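As a minimal sketch of the delayed-start idea (again, not the actual Scribe code), the background task could simply poll the system uptime and start syslog-ng once a threshold has passed. Using uptime as the "system has settled" signal is my simplification for the example, and the 180-second threshold, the polling interval, and the `start_syslog_ng` helper are hypothetical.

```shell
#!/bin/sh
# Sketch of a delayed syslog-ng start. The threshold, polling interval and
# start_syslog_ng helper are illustrative assumptions, not Scribe's code.

SETTLE_SECS="${SETTLE_SECS:-180}"           # assumed ~3 minutes after boot
POLL_SECS="${POLL_SECS:-5}"                 # hypothetical polling interval
UPTIME_FILE="${UPTIME_FILE:-/proc/uptime}"  # overridable for testing

# Placeholder hook: swap in the real command that starts syslog-ng.
start_syslog_ng() { echo "start syslog-ng"; }

uptime_seconds() {
    # /proc/uptime holds "<uptime> <idle>" in seconds with decimals;
    # keep only the integer part of the first field.
    read -r up rest < "$UPTIME_FILE"
    echo "${up%%.*}"
}

wait_until_settled() {
    # Sleep in short intervals until the uptime threshold is reached.
    while [ "$(uptime_seconds)" -lt "$SETTLE_SECS" ]; do
        sleep "$POLL_SECS"
    done
}

delayed_start() {
    wait_until_settled
    start_syslog_ng
}
```

The post-mount script would then launch it with `delayed_start &` so the rest of the boot sequence is not blocked, and the built-in loggers keep capturing messages until the handover.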
I plan to have a new BETA version of Scribe by Sunday evening once I've completed, tested, and validated (as much as I can) all the recent changes.
Just FYI.