
Scribe syslog-ng not starting on boot if IPv6 enabled.

unsynaps

Senior Member
Anyone else having issues with syslog-ng starting at boot after a fresh factory default? Only Entware and Scribe are installed. All other scripts removed.

The syslog-ng daemon does not seem to start when the router boots with IPv6 enabled.
Going to do some more testing.

Factory default.
Bare basic setup (wifi name and password. admin name and password. enable ssh.)
Turn on ipv6 to native mode. All default settings.
Boot amtm.
Install entware.
Install scribe.
Reboot.
scribe status
Code:
                            _
                         _ ( )
       ___    ___  _ __ (_)| |_      __
     /',__) /'___)( '__)| || '_`\  /'__`\
     \__, \( (___ | |   | || |_) )(  ___/
     (____/`\____)(_)   (_)(_,__/'`\____)
     syslog-ng and logrotate installation
     v3.1_0 (master)    Coded by cmkelley


      checking syslog-ng daemon ... dead.
    the system logger (syslogd) ... is running.

    Type scribe restart at shell prompt or select rs
    from scribe main menu to start syslog-ng.
    syslog.log default location ... /tmp/syslog.log
  ... & agrees with config file ... okay!

 checking system for necessary scribe hooks ...

          checking S01syslog-ng ... present.
            checking post-mount ... present.
               checking unmount ... present.

 checking syslog-ng configuration ...

   syslog-ng.conf version check ... in sync. (3.38)
    syslog-ng.conf syntax check ... okay!

          scribe installed version: v3.1_0 (master)
             scribe GitHub version: v3.1_0 (master)
                    scribe is up to date!


 syslog-ng  not running, command "status" not valid!
Disable ipv6.
Reboot.
syslog-ng started fine and all is working.
 
Thought:

The startup of IPv6 is slowing the boot enough to cause issues with syslog-ng starting properly. Blocking scripts and all that possibly.
 
More prodding.

/opt/etc/init.d/rc.func.syslog-ng

Code:
    # kill any/all running klogd and/or syslogd
    [ -n "$( /bin/pidof klogd )" ] && killall klogd
    [ -n "$( /bin/pidof syslogd )" ] && killall syslogd
    count=120
    klgk=false
    sldk=false
    touch /tmp/home/root/loopCount1Pre-[$count]
    while [ $count -gt 0 ]
    do
        sleep 1 # give them a moment to shut down
        [ -z "$( /bin/pidof klogd )" ] && klgk=true
        [ -z "$( /bin/pidof syslogd )" ] && sldk=true
        if $klgk && $sldk; then count=-1; fi
        count=$(( count - 1 ))
        touch /tmp/home/root/loopCount2While-[$count]
    done
    touch /tmp/home/root/loopCount3Post-[$count]
    [ $count -eq 0 ] && exit 1

Code:
admin@mpu:/tmp/home/root# ls -v
loopCount1Pre-[120]    loopCount2While-[17]   loopCount2While-[35]   loopCount2While-[53]   loopCount2While-[71]   loopCount2While-[89]   loopCount2While-[107]
loopCount2While-[0]    loopCount2While-[18]   loopCount2While-[36]   loopCount2While-[54]   loopCount2While-[72]   loopCount2While-[90]   loopCount2While-[108]
loopCount2While-[1]    loopCount2While-[19]   loopCount2While-[37]   loopCount2While-[55]   loopCount2While-[73]   loopCount2While-[91]   loopCount2While-[109]
loopCount2While-[2]    loopCount2While-[20]   loopCount2While-[38]   loopCount2While-[56]   loopCount2While-[74]   loopCount2While-[92]   loopCount2While-[110]
loopCount2While-[3]    loopCount2While-[21]   loopCount2While-[39]   loopCount2While-[57]   loopCount2While-[75]   loopCount2While-[93]   loopCount2While-[111]
loopCount2While-[4]    loopCount2While-[22]   loopCount2While-[40]   loopCount2While-[58]   loopCount2While-[76]   loopCount2While-[94]   loopCount2While-[112]
loopCount2While-[5]    loopCount2While-[23]   loopCount2While-[41]   loopCount2While-[59]   loopCount2While-[77]   loopCount2While-[95]   loopCount2While-[113]
loopCount2While-[6]    loopCount2While-[24]   loopCount2While-[42]   loopCount2While-[60]   loopCount2While-[78]   loopCount2While-[96]   loopCount2While-[114]
loopCount2While-[7]    loopCount2While-[25]   loopCount2While-[43]   loopCount2While-[61]   loopCount2While-[79]   loopCount2While-[97]   loopCount2While-[115]
loopCount2While-[8]    loopCount2While-[26]   loopCount2While-[44]   loopCount2While-[62]   loopCount2While-[80]   loopCount2While-[98]   loopCount2While-[116]
loopCount2While-[9]    loopCount2While-[27]   loopCount2While-[45]   loopCount2While-[63]   loopCount2While-[81]   loopCount2While-[99]   loopCount2While-[117]
loopCount2While-[10]   loopCount2While-[28]   loopCount2While-[46]   loopCount2While-[64]   loopCount2While-[82]   loopCount2While-[100]  loopCount2While-[118]
loopCount2While-[11]   loopCount2While-[29]   loopCount2While-[47]   loopCount2While-[65]   loopCount2While-[83]   loopCount2While-[101]  loopCount2While-[119]
loopCount2While-[12]   loopCount2While-[30]   loopCount2While-[48]   loopCount2While-[66]   loopCount2While-[84]   loopCount2While-[102]  loopCount3Post-[0]
loopCount2While-[13]   loopCount2While-[31]   loopCount2While-[49]   loopCount2While-[67]   loopCount2While-[85]   loopCount2While-[103]  service-event
loopCount2While-[14]   loopCount2While-[32]   loopCount2While-[50]   loopCount2While-[68]   loopCount2While-[86]   loopCount2While-[104]
loopCount2While-[15]   loopCount2While-[33]   loopCount2While-[51]   loopCount2While-[69]   loopCount2While-[87]   loopCount2While-[105]
loopCount2While-[16]   loopCount2While-[34]   loopCount2While-[52]   loopCount2While-[70]   loopCount2While-[88]   loopCount2While-[106]
Seems it's having issues killing klogd and syslogd for some reason: the counter reaches 0 and the script exits, so syslog-ng never starts.
SOMETIMES it works. But most times it fails, and seemingly only when I have IPv6 enabled.
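
The touch-a-file markers above work because nothing is logging yet at that point in boot. As a rough alternative sketch, assuming /tmp is writable, a single append-only trace file keeps the events in order. The names `trace` and `TRACE_FILE` are made up for illustration and are not part of scribe:

```shell
# Hypothetical tracing helper; not part of scribe or rc.func.syslog-ng.
TRACE_FILE="${TRACE_FILE:-/tmp/syslog-ng-boot-trace.log}"

trace() {
    # Seconds since the epoch, then the message, appended to one file.
    printf '%s %s\n' "$( date +%s )" "$*" >> "$TRACE_FILE"
}
```

Inside the loop it could be called as `trace "count=$count klogd=$( /bin/pidof klogd )"`, which gives one line per pass instead of one file per pass.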
 
Code:
    # kill any/all running klogd and/or syslogd
    touch /tmp/home/root/_PID1pre-klogd-[$( /bin/pidof klogd )]
    [ -n "$( /bin/pidof klogd )" ] && killall klogd
    touch /tmp/home/root/_PID1pre_syslogd-[$( /bin/pidof syslogd )]
    [ -n "$( /bin/pidof syslogd )" ] && killall syslogd
    count=30
    klgk=false
    sldk=false
    touch /tmp/home/root/loopCount1Pre-[$count]
    while [ $count -gt 0 ]
    do
        sleep 1 # give them a moment to shut down
        touch /tmp/home/root/_PID2while-klogd-[$( /bin/pidof klogd )]
        [ -z "$( /bin/pidof klogd )" ] && klgk=true
        touch /tmp/home/root/_PID2while-syslogd-[$( /bin/pidof syslogd )]
        [ -z "$( /bin/pidof syslogd )" ] && sldk=true
        if $klgk && $sldk; then count=-1; fi
        count=$(( count - 1 ))
        touch /tmp/home/root/loopCount2While-[$count]
    done
    touch /tmp/home/root/loopCount3Post-[$count]
    [ $count -eq 0 ] && exit 1
Code:
admin@mpu:/tmp/home/root# ls -v
_PID1pre-klogd-[1145]      loopCount2While-[2]        loopCount2While-[9]        loopCount2While-[16]       loopCount2While-[23]       loopCount3Post-[0]
_PID1pre_syslogd-[1143]    loopCount2While-[3]        loopCount2While-[10]       loopCount2While-[17]       loopCount2While-[24]       service-event
_PID2while-klogd-[2621]    loopCount2While-[4]        loopCount2While-[11]       loopCount2While-[18]       loopCount2While-[25]
_PID2while-syslogd-[2617]  loopCount2While-[5]        loopCount2While-[12]       loopCount2While-[19]       loopCount2While-[26]
loopCount1Pre-[30]         loopCount2While-[6]        loopCount2While-[13]       loopCount2While-[20]       loopCount2While-[27]
loopCount2While-[0]        loopCount2While-[7]        loopCount2While-[14]       loopCount2While-[21]       loopCount2While-[28]
loopCount2While-[1]        loopCount2While-[8]        loopCount2While-[15]       loopCount2While-[22]       loopCount2While-[29]

HAH!
For some reason, after klogd and syslogd are killed they start right back up again.
The _PID2while files should have empty brackets, since the processes should be dead, but they aren't.
The PIDs changed, so the processes were killed and then started again by... something.
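
The reasoning above (a different PID after the kill means a new instance, not a survivor) can be sketched as a small check. This is only an illustration: `respawned` and `check_respawn` are hypothetical names, and `check_respawn` assumes `pidof` and `killall` behave as they do in the snippets above.

```shell
# respawned OLD NEW: succeeds when a process is present again under a different PID.
respawned() {
    old="$1"
    new="$2"
    [ -n "$new" ] && [ "$old" != "$new" ]
}

# check_respawn NAME: kill NAME, wait a second, report whether it came back.
# Defined here for illustration only; it is not called by this snippet.
check_respawn() {
    before="$( pidof "$1" )"
    killall "$1"
    sleep 1
    after="$( pidof "$1" )"
    if respawned "$before" "$after"; then
        echo "$1 was restarted: PID $before -> $after"
    fi
}
```

With the PIDs from the listing above, `respawned 1143 2617` succeeds, which matches the conclusion that syslogd was restarted rather than left running.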

Another note. This only seems to be happening on router boot.
If I run 'scribe restart' after a failed syslog-ng start at boot syslog-ng starts properly.

I have to break off testing for now. Think my roommate is up and doesn't take kindly to the internet going out
when I reboot the router.
 
Having this issue on an RT-AC86U as well, running IPv6. Hope someone can shed more light on this and/or find a solution.
 
Thought:

The startup of IPv6 is slowing the boot enough to cause issues with syslog-ng starting properly. Blocking scripts and all that possibly.
Which is odd, I've used IPv6 for years without a problem. This weekend is my factory reset router and start from scratch on 386.11 weekend, I've been running 386.09 up until now, haven't got to installing scribe yet, just setting up my environment. I'll see if I have this issue later today when I get scribe started. Did you check if your IPv6 is actually working? Can you reach for instance www.v6.facebook.com?

@unknownz, what firmware version?
 
Made a post on Entware github issue tracker as I think this is more of an Entware issue than a Scribe one.

Not to mention may have found a solution.

Okay, installed scribe and rebooted a bunch of times (RT-AC86U / 386.11) and was unable to reproduce your issue.

At the risk of "passing the buck" it sounds like it's neither an Entware nor Scribe issue, my guess is it's a problem with your ISP. IPv6 shouldn't take any longer to come up than IPv4. The fact that the firmware is restarting klogd & syslogd means something in the firmware is hiccupping, possibly your ISP isn't responding correctly to the IPv6 request (shooting in the dark here - IPv6 is a black box to me).

Probably my fault. I screwed up setting TZ in rc.func.syslog-ng. For some reason I can't fathom, for some people, some things won't work correctly if TZ isn't set. I have no idea why this affects some people and not others. Since I'm one of the lucky ones it doesn't affect, I didn't catch the problem.

The below is likely not necessary if scribe is updated to 3.1_1.

Try moving the lines below to be inside the do loop in rc.func.syslog-ng
Code:
[ -n "$( /bin/pidof klogd )" ] && killall klogd
[ -n "$( /bin/pidof syslogd )" ] && killall syslogd
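
With the kill inside the loop, a klogd or syslogd that gets respawned is killed again on the next pass instead of the loop just waiting for it to go away. A rough sketch of that idea follows. It is not the actual scribe code: `stop_loggers`, `PIDOF`, and `KILLALL` are names made up here so the commands can be swapped out, and the check/kill/sleep ordering is one possible arrangement.

```shell
# Hypothetical sketch; the defaults are the commands used in the thread.
PIDOF="${PIDOF:-/bin/pidof}"
KILLALL="${KILLALL:-killall}"

# stop_loggers [TRIES]: return 0 once neither logger is running,
# or 1 if they are still (or again) running after TRIES passes.
stop_loggers() {
    count="${1:-30}"
    while [ "$count" -gt 0 ]; do
        klogd_pid="$( $PIDOF klogd )"
        syslogd_pid="$( $PIDOF syslogd )"
        # Both gone: done.
        [ -z "$klogd_pid" ] && [ -z "$syslogd_pid" ] && return 0
        # Kill whatever is running now, including respawned instances.
        [ -n "$klogd_pid" ] && $KILLALL klogd
        [ -n "$syslogd_pid" ] && $KILLALL syslogd
        sleep 1 # give them a moment to shut down
        count=$(( count - 1 ))
    done
    return 1
}
```

If something keeps restarting the loggers faster than they are killed, this still gives up after the given number of passes, so the caller has to handle the failure case.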
 
That was the next thing I was going to try but with my roommate home all extended holiday weekend not sure when I can test.
Already have a sed script to alter the script as a 'fix' if it turns out to be one.

Found a post where Merlin was saying a TON of stuff causes klogd and syslogd to kick if they aren't running.

service-event-end is also not an answer, because then scribe restart and so forth don't work: it tries to start klogd and syslogd and checks to make sure they're running.
You just end up in a start/kill loop until scribe gives up.

FiOS is the ISP just for reference.
 
Ahhhh, I think I know what the problem is (maybe, lol).

Last run through, I took a suggestion from shellcheck I shouldn't have. 'doh!

At the end of rc.func.syslog-ng the lines
Code:
TZ="$( /bin/cat "/etc/TZ" )"
[ -z "$TZ" ] && export TZ
are obviously stupid. I separated the lines without thinking, it should be:
Code:
[ -z "$TZ" ] && export TZ="$( /bin/cat "/etc/TZ" )"
Some things don't work correctly if TZ isn't set correctly. Shellcheck flagged the "should be" and I didn't think it through. That's what happens when you spend a long time away from your code.
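
To see why the two-line form misbehaves: TZ is assigned before the test, so `-z` is false whenever the file has content, `export` never runs, and child processes only see TZ if it was already in the environment. A small sketch comparing the two forms is below. The function names and the `tz_file` variable are hypothetical, added so a scratch file can stand in for /etc/TZ.

```shell
# Hypothetical demo; tz_file defaults to the file used in the thread.
tz_file="${TZ_FILE:-/etc/TZ}"

# Two-line form: TZ is set first, so the -z test fails and TZ is never exported.
broken_set_tz() {
    TZ="$( cat "$tz_file" )"
    [ -z "$TZ" ] && export TZ
    return 0
}

# One-line form: only when TZ is empty, read the file and export the value.
fixed_set_tz() {
    [ -z "$TZ" ] && export TZ="$( cat "$tz_file" )"
    return 0
}

# Print the TZ value a child process sees, or "unset" if it sees none.
child_tz() {
    sh -c 'printf "%s" "${TZ-unset}"'
}
```

Running `( unset TZ; broken_set_tz; child_tz )` and then the same with `fixed_set_tz` shows the difference from the child's point of view.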
 
@cmkelley running the latest Merlin firmware, 386.11 and updated to the latest version of Scribe, 3.1_1 with the following fix
Code:
[ -z "$TZ" ] && export TZ="$( /bin/cat "/etc/TZ" )"

Issue does not seem to be fixed during a reboot and your suggestion of putting the following lines in the do loop seems to be the answer
Code:
[ -n "$( /bin/pidof klogd )" ] && killall klogd
[ -n "$( /bin/pidof syslogd )" ] && killall syslogd
 
Well, that sucks. Do you have IPv6 enabled? Trying to see if that's a common denominator. If you do, what happens if you disable IPv6 and leave those lines outside of the do loop?

Ugh.
 
Yes, I have IPv6 enabled

Scribe starts up fine after a reboot if IPv6 disabled and having those 2 lines outside of the do loop i.e. as they were originally at; same behaviour as what @unsynaps reported in the first post
 
Still at a loss as to why enabling IPv6 is causing the issue. Heh.
I can only guess that some ISPs are doing something wrong in the IPv6 "DHCP" routine causing slowness. Or there's a bug in ASUS' implementation that only manifests in some situations. In either case it wouldn't be an issue if we weren't running custom scripts that are trying to do things at boot-up.

Anyways, I think I'm just going to give up and move the kill bit to inside the do loop. That's really about all I can do at this point.
 
Yeah at this point I would call this solved. Done a bunch of restarts since the update installing other software and syslog-ng is starting as it should.

Thanks for the help. 👍👍👍
 
Solved in the sense that it is starting for you. But are you capturing the full boot sequence? I'm still concerned this burns down the house.
 
Have to agree with this. Though it seems the starting of Scribe looks fine, but the actual logging seems to have some "issue".

None of my filtered logs are populating properly.
 
Just a bit of a clarification - scribe doesn't start when the computer boots, syslog-ng does. Fundamentally, all scribe does is install syslog-ng and logrotate, modify the S01syslog-ng file that gets installed in /opt/etc/init.d to terminate syslogd and klogd, and provide a couple utilities like checking that the installation is still good (status), generate a debug file, etc. when scribe is run from the command line or amtm.

I really can't do much of anything about syslogd and klogd getting restarted immediately after sending them the kill signal during start-up. Whether it happens seems to depend on if the router has IPv6 (or some other undetermined service) running, the phase of the moon and how they hold their tongue during boot-up. It doesn't even seem to be linked to certain routers or firmware versions: I have IPv6 running on an RT-AC86U and never had any issues, but I know there are people with RT-AC86Us that ARE having the start-up issue.

Someone (forgive me for not scrolling up to give credit, it's been a long day) found that it is new instances of syslogd and klogd (different PIDs = new instances) that are being started, not the first instances refusing to be killed. Yes, that's going to have an effect on what messages actually get captured during boot and what gets lost. Like RMerlin says "Out of my control." It's difficult enough to troubleshoot issues when I have them on my system, it's 100x harder when it's someone remote. Figuring out how to fix the issue with some versions of firmware putting syslog.log in /jffs instead of /tmp took days instead of hours because I couldn't test my attempts at fixing it, so it was a bit of a black box.

Also, it bears repeating that programming is not my day job, it's strictly a hobby. I explicitly refuse donations for scribe because I don't ever want to feel like I have an obligation beyond trying to fix problems when I can get to it. If someone doesn't want to trust the installation of syslog-ng to a hobbyist, the code is all there, it's GPL (again, all the good bits I stole from other people anyways), they're welcome to fork it or pay someone else to fork it. If someone is running a mission-critical system that depends on my script writing abilities, they have gravely over-estimated my skill.
 
