Dnsmasq crashes, watchdog fails to restart it

alan6854321

Senior Member
Hi there,

I have an issue where dnsmasq crashes when my WAN connection goes down (I think this is somehow connected to YazDHCP), but it's not really a problem as the watchdog restarts it after a short delay (about 30 seconds in the log below).

However, I've had a few instances recently when the watchdog has failed to restart the process.
When that happens, everything that is already working keeps working, but new clients can't connect and get an IP address allocated.
A reboot resolves this, and so does restarting dnsmasq from scMerlin.

When it's in this state, 'ps' shows only one dnsmasq process running; there are usually two.
There are also still syslog entries from "dnsmasq-dhcp", which I assume come from the remaining process.

Any ideas what to look at or do?

Code:
#
# Crash following WAN disconnected
#
<30>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C lldpd[1736]: removal request for address of 99.99.99.99%13, but no knowledge of it
<13>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C custom_script: Running /jffs/scripts/wan-event (args: 0 disconnected)
<30>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C dnsmasq[3019]: read /etc/hosts - 24 names
<30>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C dnsmasq[3019]: read /jffs/addons/YazDHCP.d/.hostnames - 32 names
<14>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: potentially unexpected fatal signal 11.
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: CPU: 0 PID: 3019 Comm: dnsmasq Tainted: P           O    4.1.52 #2
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: Hardware name: Broadcom-v8A (DT)
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: task: ffffffc010971580 ti: ffffffc009480000 task.ti: ffffffc009480000
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: PC is at 0xf732d654
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: LR is at 0x209ac
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: pc : [<00000000f732d654>] lr : [<00000000000209ac>] pstate: 20010010
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: sp : 00000000ffddaff8
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: x12: 0000000000000072
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: x11: 0000000000005528 x10: 000000000009a70c
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: x9 : 0000000000005528 x8 : 00000000ffffffff
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: x7 : 000000000009a70c x6 : 000000000036d318
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: x5 : 00000000f73da5c0 x4 : 00000000ffddaea0
<12>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C kernel: x3 : 0000000000000000 x2 : 0000000000000120
<13>Feb 11 12:31:44 RT-AX86S-F750-225CCF0-C custom_script: Running /jffs/scripts/wan-event (args: 0 stopped)
#
# Usually followed by a restart (this is the bit that gets missed)
#
<13>Feb 11 12:32:13 RT-AX86S-F750-225CCF0-C rc_service: watchdog 1610:notify_rc start_dnsmasq
<13>Feb 11 12:32:13 RT-AX86S-F750-225CCF0-C custom_script: Running /jffs/scripts/service-event (args: start dnsmasq)
<13>Feb 11 12:32:13 RT-AX86S-F750-225CCF0-C custom_config: Appending content of /jffs/configs/dnsmasq.conf.add.
<13>Feb 11 12:32:13 RT-AX86S-F750-225CCF0-C custom_script: Running /jffs/scripts/dnsmasq.postconf (args: /etc/dnsmasq.conf)
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: started, version 2.89 cachesize 1500
<31>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset no-auth cryptohash DNSSEC no-ID loop-detect no-inotify no-dumpfile
<31>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: warning: interface br2 does not currently exist
<31>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: warning: interface br1 does not currently exist
<31>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: warning: interface pptp* does not currently exist
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: asynchronous logging enabled, queue limit is 5 messages
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq-dhcp[3636]: DHCP, IP range 192.168.102.2 -- 192.168.102.254, lease time 1d
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq-dhcp[3636]: DHCP, IP range 192.168.101.2 -- 192.168.101.254, lease time 1d
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq-dhcp[3636]: DHCP, IP range 192.168.1.100 -- 192.168.1.254, lease time 1d
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: using only locally-known addresses for local
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: read /etc/hosts - 24 names
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: read /jffs/addons/YazDHCP.d/.hostnames - 32 names
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq-dhcp[3636]: read /jffs/addons/YazDHCP.d/.staticlist
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq-dhcp[3636]: read /jffs/addons/YazDHCP.d/.optionslist
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: using nameserver 208.67.222.222#53
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: using nameserver 208.67.220.220#53
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: using only locally-known addresses for local
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: using nameserver 208.67.222.222#53
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: using nameserver 208.67.220.220#53
<30>Feb 11 12:32:14 RT-AX86S-F750-225CCF0-C dnsmasq[3636]: using only locally-known addresses for local
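Since the tell-tale sign of the stuck state is a single dnsmasq process where there are usually two, that condition can be checked from the shell. A minimal sketch, assuming BusyBox-style `ps` output with the columns PID USER VSZ STAT COMMAND; `count_dnsmasq` and `check_dnsmasq` are hypothetical helper names, not part of the firmware, scMerlin or YazDHCP.

```sh
#!/bin/sh
# Count dnsmasq processes in `ps` output read from stdin.
# Assumes BusyBox-style columns: PID USER VSZ STAT COMMAND, so the command
# name is field 5. A "grep dnsmasq" line has "grep" there and is not counted.
count_dnsmasq() {
    awk '{
        cmd = $5
        sub(/.*\//, "", cmd)        # "/usr/sbin/dnsmasq" -> "dnsmasq"
        if (cmd == "dnsmasq") n++
    } END { print n + 0 }'
}

# Report the live count and fail when fewer than the usual two are running.
check_dnsmasq() {
    n=$(ps | count_dnsmasq)
    if [ "$n" -lt 2 ]; then
        echo "only $n dnsmasq process(es) running" >&2
        return 1
    fi
    echo "$n dnsmasq processes running"
}

# Example (on the router):
#   check_dnsmasq || echo "watchdog may have missed a restart"
```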
 
I think you have to start by investigating why YazDHCP is causing dnsmasq to crash.
Yeah, but I'm a bit out of my depth there.
I know I'm not alone - it's been mentioned before.
And it only happens when the WAN goes down, which seems odd!
 
Hmm ... just tried an experiment.

The router was in a state where dnsmasq had crashed after a WAN reconnect and NOT been restarted by the watchdog. I used scMerlin to restart it and all seemed fine.

I then killed the dnsmasq process with "kill -9 nnnn" and waited.
But after five minutes or so, the watchdog still hadn't restarted it.

Can I check if the watchdog is still there? (Who watches the watchers?)

Processes running...
Code:
ps | grep atchdog
   11 RT-AC86U     0 SW   [watchdog/0]
   12 RT-AC86U     0 SW   [watchdog/1]
  767 RT-AC86U     0 SW   [dhd_watchdog_th]
 1614 RT-AC86U 13212 S    watchdog
 1615 RT-AC86U 13212 S    check_watchdog
 1616 RT-AC86U 13212 S    alt_watchdog
 8503 RT-AC86U  4768 S    grep atchdog
 
Are there any other dnsmasq processes still running?
Yes, just the one that belongs to 'nobody'
Code:
ps | grep dns
 5745 RT-AC86U  4768 S    grep dns
10190 nobody    2776 S    dnsmasq --log-async
 
Ok - I tried killing the process that belongs to 'nobody', and the watchdog restarted them both immediately.
 
For your test you should have killed both processes using killall dnsmasq rather than just one.
Just tried that, immediate restart by the watchdog.
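Because the key question is how long the watchdog takes to respawn dnsmasq (29 seconds in the log in the first post, never in the failed case), a timed wait makes each kill test easier to compare. A minimal sketch that polls `pidof` once a second; it assumes `pidof` is available on the router, and `wait_for_proc` is a hypothetical helper name.

```sh
#!/bin/sh
# Poll once a second until a process named $1 exists, giving up after $2
# seconds (default 60). On success prints the number of seconds waited.
wait_for_proc() {
    name=$1
    limit=${2:-60}
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        if pidof "$name" >/dev/null 2>&1; then
            echo "$waited"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example (on the router): kill every dnsmasq process, then time the respawn.
#   killall dnsmasq
#   sleep 2    # give the old processes time to exit before polling
#   if secs=$(wait_for_proc dnsmasq 300); then
#       echo "watchdog restarted dnsmasq after about $((secs + 2))s"
#   else
#       echo "dnsmasq not restarted within 5 minutes"
#   fi
```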
 
I also get the same crash (fatal signal 11, SIGSEGV) when running YazDHCP with any dnsmasq newer than version 2.85. I believe Asuswrt-Merlin is now running dnsmasq 2.90.

As @ColinTaylor said, we should probably figure out what in YazDHCP triggers this crash.
I suspect it is something in one of our YazDHCP hostnames…
 
Even though the .hostnames file is shown in the log, that line is logged after the file has been read in. I would look closely at whichever file dnsmasq.conf loads next, which I think would be .optionslist.

Someone just needs to be willing to share all the files to debug.
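The order in which dnsmasq reads these files can be checked directly rather than guessed: the restart log in the first post shows `.hostnames`, then `.staticlist`, then `.optionslist`. A minimal sketch that lists the file-loading directives of the generated config in file order, with line numbers; it assumes the generated config is /etc/dnsmasq.conf (the path passed to dnsmasq.postconf in that log), and `list_conf_files` is a hypothetical name. The option names it matches are standard dnsmasq options.

```sh
#!/bin/sh
# Print the file-loading directives of a dnsmasq config in the order they
# appear, prefixed with their line numbers. Commented-out lines are skipped.
list_conf_files() {
    conf=${1:-/etc/dnsmasq.conf}
    grep -nE '^[[:space:]]*(addn-hosts|hostsdir|dhcp-hostsfile|dhcp-hostsdir|dhcp-optsfile|dhcp-optsdir|conf-file|conf-dir|servers-file|resolv-file)=' "$conf"
}

# Example (on the router):
#   list_conf_files
```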
 
.optionslist is empty on my router.

.staticlist has all my manually assigned addresses in it.

Code:
more .staticlist
B8:27:EB:DE:53:40,set:B8:27:EB:DE:53:40,192.168.1.10
DC:A6:32:6D:62:7A,set:DC:A6:32:6D:62:7A,192.168.1.11
00:19:86:80:0B:D8,set:00:19:86:80:0B:D8,192.168.1.12
DC:A6:32:6D:62:7B,set:DC:A6:32:6D:62:7B,192.168.1.13
B8:27:EB:9F:EA:9B,set:B8:27:EB:9F:EA:9B,192.168.1.14
E4:5F:01:20:83:76,set:E4:5F:01:20:83:76,192.168.1.15
E4:5F:01:20:83:77,set:E4:5F:01:20:83:77,192.168.1.16
1C:BF:CE:9A:67:EE,set:1C:BF:CE:9A:67:EE,192.168.1.17
D8:3A:DD:99:D0:E1,set:D8:3A:DD:99:D0:E1,192.168.1.18
28:C6:8E:34:44:55,set:28:C6:8E:34:44:55,192.168.1.20
90:09:D0:16:3D:DC,set:90:09:D0:16:3D:DC,192.168.1.22
C4:3D:C7:59:EF:9D,set:C4:3D:C7:59:EF:9D,192.168.1.23
24:4B:FE:E6:32:80,set:24:4B:FE:E6:32:80,192.168.1.36
88:78:73:6E:4E:2A,set:88:78:73:6E:4E:2A,192.168.1.40
00:1C:2B:31:E1:FF,set:00:1C:2B:31:E1:FF,192.168.1.50
64:95:6C:79:80:37,set:64:95:6C:79:80:37,192.168.1.70
00:19:FB:C3:04:CE,set:00:19:FB:C3:04:CE,192.168.1.72
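Each line of a dhcp-hostsfile uses the same syntax as a `dhcp-host=` option without the `dhcp-host=` prefix, and the YazDHCP lines above all have the shape `MAC,set:TAG,IP`. A minimal sketch that checks every line against that three-field shape and reports duplicate MAC or IP addresses. It only knows this one shape, so other forms dnsmasq accepts (hostnames, lease times, several MACs) would be flagged too, and a clean result does not clear the file; `check_staticlist` is a hypothetical name.

```sh
#!/bin/sh
# Check a YazDHCP-style .staticlist ("MAC,set:TAG,IPv4" per line).
# Prints one message per problem; returns non-zero if any were found.
check_staticlist() {
    awk -F, '
    function bad(msg) { printf "line %d: %s\n", NR, msg; err = 1 }
    BEGIN {
        h = "[0-9A-Fa-f][0-9A-Fa-f]"
        mac_re = "^" h ":" h ":" h ":" h ":" h ":" h "$"
    }
    /\r$/ { bad("DOS (CRLF) line ending"); sub(/\r$/, "") }
    /^[ \t]*$/ { next }
    {
        if (NF != 3) { bad("expected 3 comma-separated fields, got " NF); next }
        if ($1 !~ mac_re) bad("bad MAC address: " $1)
        if (substr($2, 1, 4) != "set:") bad("second field is not set:<tag>: " $2)
        n = split($3, o, ".")
        ok = (n == 4)
        for (i = 1; i <= n; i++)
            if (o[i] !~ /^[0-9]+$/ || o[i] + 0 > 255) ok = 0
        if (!ok) bad("bad IPv4 address: " $3)
        mac = toupper($1)
        if (mac in seen_mac) bad("duplicate MAC " $1 " (also on line " seen_mac[mac] ")"); else seen_mac[mac] = NR
        if ($3 in seen_ip) bad("duplicate IP " $3 " (also on line " seen_ip[$3] ")"); else seen_ip[$3] = NR
    }
    END { exit (err ? 1 : 0) }
    ' "${1:-/jffs/addons/YazDHCP.d/.staticlist}"
}
```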
 
What appears in dnsmasq.conf.add?
 
It copies my own host names (the ones that aren't manually assigned) into dnsmasq.conf.

Can't post a copy at the moment as I've just popped down the pub for a couple of beers. Will do so later.
 
OK, back from the pub. Sorry, I was thinking of 'dnsmasq.postconf'.

But this is dnsmasq.conf.add:
Code:
cat /jffs/configs/dnsmasq.conf.add

addn-hosts=/jffs/addons/YazDHCP.d/.hostnames # YazDHCP_hostnames
dhcp-hostsfile=/jffs/addons/YazDHCP.d/.staticlist # YazDHCP_staticlist
dhcp-optsfile=/jffs/addons/YazDHCP.d/.optionslist # YazDHCP_optionslist
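Since dnsmasq.conf.add only adds three directives pointing at the YazDHCP files, a quick check is that each referenced path exists and is readable once the trailing `# YazDHCP_...` comment is removed. A minimal sketch; it checks readability for the user running the script (normally root on the router), not for the 'nobody' user that one of the dnsmasq processes runs as, and `check_conf_paths` is a hypothetical name.

```sh
#!/bin/sh
# For each YazDHCP "option=path # comment" line in a dnsmasq config fragment,
# report whether the referenced file exists and is readable.
check_conf_paths() {
    rc=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            addn-hosts=*|dhcp-hostsfile=*|dhcp-optsfile=*) ;;
            *) continue ;;
        esac
        opt=${line%%=*}        # option name before "="
        path=${line#*=}        # everything after "="
        path=${path%%[ #]*}    # drop trailing " # comment"
        if [ -r "$path" ]; then
            printf '%-8s%s -> %s\n' "ok" "$opt" "$path"
        else
            printf '%-8s%s -> %s\n' "MISSING" "$opt" "$path"
            rc=1
        fi
    done < "${1:-/jffs/configs/dnsmasq.conf.add}"
    return $rc
}
```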
 
And this is .hostnames:

Code:
cat .hostnames
192.168.1.10 RaspiMon
192.168.1.11 Raspi-4B
192.168.1.12 Raspi-CUPS-Wifi
192.168.1.13 Raspi-4B-Wifi
192.168.1.14 Raspi-CUPS
192.168.1.15 RaspiHole
192.168.1.16 RaspiHole-Wifi
192.168.1.17 RaspiMon-wifi
192.168.1.18 Raspi-Zero2W
192.168.1.20 AlansNAS1
192.168.1.22 NasDS218
192.168.1.23 NasOldDuo
192.168.1.36 RT-AC86U-LAN
192.168.1.40 AlansAsusROG-WiFi
192.168.1.50 HiveHub
192.168.1.70 LG-LED-TV
192.168.1.72 SkyPlusBox
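If the suspicion is a bad hostname, each line of the hosts file can be checked mechanically. A minimal sketch that flags DOS line endings, malformed IPv4 addresses and names that are not dot-separated RFC 1123 labels (letters, digits and inner hyphens, 1 to 63 characters). Treating these rules as the ones worth checking is an assumption; all of the names above pass them, so a clean result does not rule the file out. `check_hostnames` is a hypothetical name.

```sh
#!/bin/sh
# Check an addn-hosts style file: "IPv4-address name [name...]" per line.
# Prints one message per problem; returns non-zero if any were found.
check_hostnames() {
    awk '
    function bad(msg) { printf "line %d: %s\n", NR, msg; err = 1 }
    /\r$/ { bad("DOS (CRLF) line ending"); sub(/\r$/, "") }
    /^[ \t]*$/ { next }
    /^[ \t]*#/ { next }
    {
        n = split($1, o, ".")
        ok = (n == 4)
        for (i = 1; i <= n; i++)
            if (o[i] !~ /^[0-9]+$/ || o[i] + 0 > 255) ok = 0
        if (!ok) bad("bad IPv4 address: " $1)
        if (NF < 2) bad("no hostname after " $1)
        for (f = 2; f <= NF; f++) {
            if ($f ~ /^#/) break          # rest of the line is a comment
            m = split($f, lab, ".")
            badname = (m == 0)
            for (j = 1; j <= m; j++)
                if (lab[j] !~ /^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$/ || length(lab[j]) > 63)
                    badname = 1
            if (badname) bad("bad hostname: " $f)
        }
    }
    END { exit (err ? 1 : 0) }
    ' "${1:-/jffs/addons/YazDHCP.d/.hostnames}"
}
```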
 
Can any of you reproduce this at will by running:
Code:
killall -SIGHUP dnsmasq
 
That crashes dnsmasq with "potentially unexpected fatal signal 11" and all the rest.
But it gets restarted immediately by the watchdog
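According to the dnsmasq man page, SIGHUP makes dnsmasq clear its cache and re-read /etc/hosts and any files given with `--addn-hosts`, `--dhcp-hostsfile` or `--dhcp-optsfile`. A crash on SIGHUP therefore points at that reload path, which is consistent with the `read /etc/hosts` and `read .../.hostnames` lines logged just before the crash in the first post. To tell a clean reload from a crash-and-respawn without reading the syslog, compare the PIDs before and after. A minimal sketch; it assumes `pidof` and `killall` are available, the 60 second default wait is an arbitrary choice (the watchdog took about 29 seconds in the first post's log), and the function names are hypothetical.

```sh
#!/bin/sh
# Sorted, space-separated PIDs of all running dnsmasq processes ("" if none).
dnsmasq_pids() {
    echo $(pidof dnsmasq 2>/dev/null | tr ' ' '\n' | sort -n)
}

# Compare PID lists taken before ($1) and after ($2) sending SIGHUP.
# Unchanged PIDs mean the reload was handled; any change means at least one
# process died and was started again (for example by the watchdog).
classify_reload() {
    if [ -z "$2" ]; then
        echo "not running"
    elif [ "$1" = "$2" ]; then
        echo "reloaded"
    else
        echo "restarted"
    fi
}

# On the router: send SIGHUP, wait $1 seconds (default 60), then classify.
hup_test() {
    before=$(dnsmasq_pids)
    killall -HUP dnsmasq
    sleep "${1:-60}"
    classify_reload "$before" "$(dnsmasq_pids)"
}
```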
 
Interesting. Progress, I suppose. I can’t reproduce it myself using your data.

What are your file sizes/ownership?
Code:
ls -la /jffs/addons/YazDHCP.d/
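Because the same data does not reproduce the crash elsewhere, differences that `cat` does not show are worth ruling out alongside sizes and ownership: carriage returns from a Windows editor, non-ASCII or control bytes, or a missing final newline. None of these is known to be the cause here; this is a minimal sketch for eliminating them, and `check_bytes` is a hypothetical name.

```sh
#!/bin/sh
# Report byte-level oddities in file $1 that do not show up in `cat` output:
# carriage returns, bytes outside printable ASCII (other than tab/newline),
# and a missing final newline. Prints "<file>: clean" when none are found.
check_bytes() {
    f=$1
    found=0
    cr=$(LC_ALL=C tr -cd '\r' < "$f" | wc -c | tr -d ' ')
    odd=$(LC_ALL=C tr -d '\r\n\t\040-\176' < "$f" | wc -c | tr -d ' ')
    if [ "$cr" -gt 0 ]; then
        echo "$f: $cr carriage return(s) (DOS line endings?)"; found=1
    fi
    if [ "$odd" -gt 0 ]; then
        echo "$f: $odd non-printable or non-ASCII byte(s)"; found=1
    fi
    # wc -l counts newlines, so 0 here means the last byte is not "\n".
    if [ -s "$f" ] && [ "$(tail -c 1 "$f" | wc -l | tr -d ' ')" -eq 0 ]; then
        echo "$f: no newline at end of file"; found=1
    fi
    [ "$found" -eq 0 ] && echo "$f: clean"
    return "$found"
}

# Example (on the router):
#   for f in /jffs/addons/YazDHCP.d/.hostnames /jffs/addons/YazDHCP.d/.*list; do
#       check_bytes "$f"
#   done
```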
 
Try installing strace through Entware, and monitor the dnsmasq process with it.
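A minimal sketch of attaching strace to every running dnsmasq process at once, assuming strace has been installed from Entware (`opkg install strace`). `-f` follows child processes, `-tt` adds timestamps with microseconds, `-s 256` shows longer strings and `-o` writes the trace to a file; `trace_dnsmasq` is a hypothetical name, and the default output path under /tmp (which lives in RAM) is an arbitrary choice. With the trace running, sending SIGHUP from a second SSH session should reproduce the crash; the SIGSEGV line near the end of the file, and the last file opened and read before it, would be the interesting part.

```sh
#!/bin/sh
# Attach strace to every running dnsmasq process and write one combined
# trace file. Runs in the foreground until interrupted with Ctrl-C.
trace_dnsmasq() {
    out=${1:-/tmp/dnsmasq.strace}
    if ! command -v strace >/dev/null 2>&1; then
        echo "strace not found (install it from Entware first)" >&2
        return 1
    fi
    pids=$(pidof dnsmasq 2>/dev/null)
    if [ -z "$pids" ]; then
        echo "dnsmasq is not running" >&2
        return 1
    fi
    set --                        # build one "-p PID" pair per process
    for p in $pids; do
        set -- "$@" -p "$p"
    done
    # -f: follow forks  -tt: timestamps with microseconds
    # -s 256: show up to 256 bytes of each string  -o: write trace to $out
    strace -f -tt -s 256 -o "$out" "$@"
}

# Example (on the router, as root):
#   session 1:  trace_dnsmasq
#   session 2:  killall -HUP dnsmasq
#   afterwards: grep -n SIGSEGV /tmp/dnsmasq.strace
```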
 