If I follow, in the function add-on, the hunt for the syslog location within kill_logger only happens on startup, not thereafter, and it happens before syslogd and klogd are killed. That saves a bit of time. I had a few notions that I mean to try out but don't actually have time for in the near future, so I will share them:
1. The hunt for the syslog location is specific to router models. It should be possible to do this on install, and not make it part of kill_logger at all. Set and forget.
The path of the built-in system log file is obtained during initial installation, and it gets stored in a configuration file for future retrieval. The "kill_logger()" function knows about and has access to this configuration file, but it also has code to figure out the path on its own, just in case the configuration file is not found or it doesn't contain the expected path variable. Setting up this redundancy to be able to obtain a critical file path is good coding practice, especially for a function that must also execute independently of the shell script during the reboot sequence when the USB-attached drive containing Entware is mounted, and the S01syslog-ng service is started. If everything is set up and working as it should, the "kill_logger()" function simply reads the file path from the configuration file, and the rest of the redundant code may never execute after the initial installation.
Given the above, I fully disagree with your assessment that the "kill_logger()" function should not include the code to be able to obtain the path of the built-in system log file.
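To make the "stored path first, detection as fallback" idea concrete, here is a minimal sketch. It is not the actual code from the script: the SYSLOG_LOC key, the function names, and the candidate paths passed in by the caller are all hypothetical and only illustrate the pattern.

```shell
#!/bin/sh
# Sketch only. SYSLOG_LOC, the function names, and the candidate paths
# are hypothetical; the real script may use different ones.

# Print the stored path from the config file.
# Fails if the file is missing or the variable is absent/empty.
read_syslog_loc() {
    conf_file="$1"
    [ -f "$conf_file" ] || return 1
    loc="$(sed -n 's/^SYSLOG_LOC=//p' "$conf_file" | head -n 1)"
    [ -n "$loc" ] || return 1
    printf '%s\n' "$loc"
}

# Fallback: print the first candidate path that exists as a regular file.
detect_syslog_loc() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the stored value; run the detection only when it is unavailable.
get_syslog_loc() {
    conf_file="$1"
    shift
    read_syslog_loc "$conf_file" || detect_syslog_loc "$@"
}
```

A call would look like `get_syslog_loc /path/to/config /candidate/one /candidate/two`. When the configuration file is healthy, the detection branch never runs, which is the behavior described above.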
2. There are a fair number of commands after the processes are killed before syslog-ng starts. Would it not be possible to move some of those up in the sequence? It really doesn't matter how long it takes syslog-ng to get going (it could be S99 instead of S01, and maybe should be); I think what matters is the delay between killing one and starting the other. This only really matters on startup, when the message rate is fastest.
The commands found after the code that kills the built-in system loggers are really very simple and take only microseconds to execute. IMO, any significant delay(s) may be due to the "sleep 2" call made within the loop that kills the built-in system loggers. As explained in the comment, the sleep call is made to double-check afterward that the logging services have not been restarted by the system; but it's very likely that within these 2 seconds (or more if the sleep command is executed multiple times) some log messages are getting "lost" if nobody is "listening."
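For illustration, the kill-then-recheck pattern looks roughly like the sketch below. This is not the script's actual loop: the function names, the retry limit, and the use of pidof and killall are my assumptions. The point is that every pass through the loop adds one full delay during which neither logger is running.

```shell
#!/bin/sh
# Sketch only. Function names, the retry limit, and the pidof/killall
# calls are assumptions, not the script's real implementation.

# Succeeds if at least one of the named processes is currently running.
loggers_running() {
    for name in "$@"; do
        pidof "$name" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Kill the named loggers, wait, then re-check in case the system
# restarted them. Gives up after max_tries passes.
kill_loggers() {
    max_tries="$1"
    delay="$2"
    shift 2
    tries=0
    while loggers_running "$@"; do
        [ "$tries" -ge "$max_tries" ] && return 1
        killall -q "$@" 2>/dev/null
        sleep "$delay"      # nobody is "listening" during this window
        tries=$((tries + 1))
    done
    return 0
}
```

With `kill_loggers 3 2 syslogd klogd`, the unlogged window is at least 2 seconds whenever a logger was running, and up to 6 seconds if the system keeps restarting the loggers.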
3. The log_msg_size of 10K is kind of large. For the kind of messages the routers kick off, I think 512 is enough. The product of log_msg_size and log_fifo_size establishes the buffer size for each destination, so 1024 and 512 would give you 512k (this is what I use), while 1024 and 10240 would give you 10M for each destination. Even so, htop gives me 368M as the virtual memory used by syslog-ng.
The original size of the "log_msg_size" option was set to 16K (which is the current setting in the production release), and I have found no explanation or reason in any of the code (or old posts) for making it that large. I can only assume that @cmkelley set that value based on user feedback or some empirical evidence showing that there are some log messages that are nearly that long. If so, those log entries are likely outliers and occur only in some extremely rare scenarios, which is why I decided to decrease the value to 10K while running some debugging code to experiment.
Ultimately, I think it may not be worth setting the "log_msg_size" to those large values just to capture some very rare event(s), and a more conservative setting would be more useful in the long run. For now, I've settled for 2K as the maximum message size while I run the development branch version myself, and other people have volunteered to experiment with it as well.
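As a quick sanity check on the sizes discussed here, the rule of thumb from your post (log_fifo_size times log_msg_size per destination) can be worked out with shell arithmetic. The helper name is made up, and I'm assuming a log_fifo_size of 1024 as in your example; treat the results as a rough upper bound rather than measured memory use.

```shell
#!/bin/sh
# Rough per-destination buffer estimate in KiB, using the rule of thumb
# log_fifo_size * log_msg_size. buffer_kib is a hypothetical helper and
# the fifo size of 1024 is an assumption taken from the example above.
buffer_kib() {
    fifo_size="$1"
    msg_size="$2"
    echo $(( fifo_size * msg_size / 1024 ))
}

buffer_kib 1024 512      # prints 512   (512K)
buffer_kib 1024 2048     # prints 2048  (2M, the 2K setting)
buffer_kib 1024 10240    # prints 10240 (10M, the 10K setting)
buffer_kib 1024 16384    # prints 16384 (16M, the 16K setting)
```

So moving from 16K down to 2K cuts that estimate by a factor of eight per destination.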
4. While it doesn't really matter for the rate of logging in the routers, the filter/destinations are processed alphabetically. That is to say, if the noisiest programs are last (unbound?), syslog-ng processes all the other possibilities first before it gets to the relevant filter. If you call the same filter/destination file "1unbound", syslog-ng will process it first, match, and stop further processing. So I've numbered my files by number of log statements. But as I say, syslog-ng can handle logging rates a hundred times faster than the router spins off.
I have no comment here as I didn't quite understand the main point of what you were trying to explain.
Just my 2 cents.