Playing around with this:
Code:
wildcard_file(base_dir("/opt/var/log")
filename_pattern("syslogd.ScribeInitReboot.LOG")
recursive(no) max-files(1) follow_freq(1)
log_iw_size(1200) log_fetch_limit(1000)
keep-timestamp(no)
flags(syslog-protocol));
You will see I have added keep-timestamp(no). The effect is that when the Reboot.LOG is read back in, messages get a timestamp based on the time they are processed, rather than the dawn-of-time timestamp the router uses until ntp syncs. For me, that results in a gap of approximately three minutes between the last shutdown message and the busybox start message. That looks neater to me, but when scrolling up, the time of reboot doesn't stand out so clearly in the 1,447 log messages that make up my startup sequence.
If I follow, with this configuration the Reboot.LOG is checked every second (also the default), and syslog-ng looks for any messages after the last position recorded in the /opt/var/syslog-ng.persist file (which is not human-readable). Only new messages are processed. My sense is that this also means the boot sequence is processed only on a reboot and not on a restart, which would be desirable behavior if true. It strikes me that it might be more efficient to set follow_freq() to something like a million, since the Reboot.LOG won't change. In 4.9 the preferred behavior is to rely on inotify if it is available on our routers. I wonder if it is? The documentation suggests that rather than following a file, it is preferable to use a program() source to cat the file to messages, but that means the .persist behavior isn't in play, and the startup messages aren't processed by syslog-ng, so the timestamp and filtering behavior isn't in play either.
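A minimal sketch of that change, untested: only follow_freq() differs from the configuration above. The source name s_reboot is a placeholder of mine, and I am assuming the file is still read once at startup before the long polling interval applies, which I have not verified.

```
source s_reboot {
    # s_reboot is a hypothetical name used only for this sketch
    wildcard_file(base_dir("/opt/var/log")
        filename_pattern("syslogd.ScribeInitReboot.LOG")
        recursive(no) max-files(1)
        # check for new lines about every 11.5 days instead of every second
        follow_freq(1000000)
        log_iw_size(1200) log_fetch_limit(1000)
        keep-timestamp(no)
        flags(syslog-protocol));
};
```

If the first read turns out to be delayed by the interval as well, this would be worse than the default, so it is worth checking after one reboot before leaving it in place.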
Now if I follow, log_iw_size() is ignored unless flags(flow-control) is specified on a log path that uses the source. log_fetch_limit() will read up to 1000 messages from the file at a time in one thread, and then process those in multiple threads, writing them out to the destinations in one thread. So in my case, the first 1000 messages will be read and processed, and then syslog-ng will go back for the other 447. It is a form of flow control, but during that time, other messages may be processed from other sources. The output buffer is 2048, so we will never lose messages; we don't get more than 20 or so a second from any other source.
If flags(flow-control) is enabled, then the second bite at Reboot.LOG will be limited to 200 messages, because the first 1000 have already used up that much of the 1200-message log_iw_size() window, unless some of those messages have been processed by then. So it might be worth playing with reducing log_fetch_limit() to 100 and seeing what happens, or increasing log_iw_size() and implementing flow control. When I was fooling around with this a few years ago, I added flow control but not any other options (these were introduced in 4.7, I think), so my fetch limit was 10.
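A sketch of the flow-control variant I would try, untested. As far as I know, flags(flow-control) goes in the log statement rather than inside the source itself. The names s_reboot and d_messages and the destination path are placeholders of mine, not scribe's actual names.

```
source s_reboot {
    wildcard_file(base_dir("/opt/var/log")
        filename_pattern("syslogd.ScribeInitReboot.LOG")
        recursive(no) max-files(1) follow_freq(1)
        # window stays at 1200; fetch limit reduced to 100 as an experiment
        log_iw_size(1200) log_fetch_limit(100)
        keep-timestamp(no)
        flags(syslog-protocol));
};

# hypothetical destination; substitute wherever your messages file lives
destination d_messages { file("/opt/var/log/messages"); };

log {
    source(s_reboot);
    destination(d_messages);
    # flow control is switched on per log path
    flags(flow-control);
};
```

With these numbers each read takes at most 100 messages, so the 1,447-line startup sequence would need 15 reads, and the 1200-message window only becomes the limit if the destinations fall well behind.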
In any case, if I clear the messages log and reboot, my messages file records about 50 shutdown messages and four or so minutes later picks up with the reboot sequence, minus those that are filtered into other destinations. All have the same timestamp, so I can't tell which of those messages (if any) are the product of sources other than the Reboot.LOG while it is being processed. But still, a great improvement in scribe, I think.