Beta Asuswrt-Merlin 3004.388.6_x test builds (dnsmasq 2.90)

Status
Not open for further replies.
might also play around with the strict vs opportunistic.
The only real point of that setting is how to handle DNS queries when your clock isn't properly set yet, meaning crypto validation cannot be done. It's unrelated to what is being investigated.
 
The SERVFAIL error is actually DNSSEC doing its job.

DNSSEC uses reply signing to prevent a reply from being replaced by a different answer. When using Cloudflare 1.1.1.2, Cloudflare returns an answer that differs from the real one: 0.0.0.0 instead of the correct IP. Dnsmasq receives this response, the response fails DNSSEC validation, and dnsmasq turns it into a SERVFAIL error.

Then, dnsmasq incorrectly logs it as a resource limit being exceeded, which is the log entry people are seeing.

I explored the dnsmasq code and couldn't come up with a simple fix, so I will just remove the log entry, or move it to a DEBUG priority to hide it from normal users.
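
As a rough illustration of what a client sees in this situation (not part of the firmware, and not how dnsmasq is implemented), here is a minimal Python sketch that builds a plain A-record query and reads the response code from the reply header, where RCODE 2 is SERVFAIL. The function names are made up for this example, and any server address you pass in is your own assumption.

```python
import socket
import struct

# A few response codes from the DNS header (RFC 1035); 2 is SERVFAIL.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in wire format."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def rcode_of(reply: bytes) -> str:
    """Return the response code name found in a DNS reply header."""
    if len(reply) < 12:
        raise ValueError("reply is shorter than a DNS header")
    flags = struct.unpack("!H", reply[2:4])[0]
    rcode = flags & 0x000F  # RCODE is the low 4 bits of the flags field
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def ask(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one UDP query to `server` and return the reply's RCODE name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        reply, _ = sock.recvfrom(4096)
    return rcode_of(reply)
```

Calling `ask("192.168.50.1", "example.com")` (the address is a placeholder, substitute your router's LAN IP) would be expected to return "SERVFAIL" for a name whose reply failed validation and "NOERROR" for one that resolved normally.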
 
The only real point of that setting is how to handle DNS queries when your clock isn't properly set yet, meaning crypto validation cannot be done. It's unrelated to what is being investigated.
I tried this:
1708236009941.png
since the log entries mention validation, but it didn't help.
The only thing that did help was turning off DNSSEC. I'm leaving it disabled until I mess with it again in the AM.

I have to look at the logs in detail, because trying this again caused a whole lot of services to restart, fail, and restart again and again until they eventually succeeded: VPNs, IPv6, DDNS, Skynet, SPDMerlin, YazDHCP and so on. Just to clean things up, I'll get a reboot in and try again in the AM.

Here's where it started, and I got a crash

Crash
Feb 18 00:49:38 Router kernel: Hardware name: Broadcom-v8A (DT)
Feb 18 00:49:38 Router kernel: task: ffffffc02ff3aa40 ti: ffffffc021b38000 task.ti: ffffffc021b38000
Feb 18 00:49:38 Router kernel: PC is at 0xf6f55654
Feb 18 00:49:38 Router kernel: LR is at 0x20eec

Log
Feb 18 00:49:38 Router kernel: potentially unexpected fatal signal 11.
Feb 18 00:49:38 Router kernel: CPU: 1 PID: 22734 Comm: dnsmasq Tainted: P O 4.1.51 #2
Feb 18 00:49:39 Router odhcp6c[2310]: Failed to send DHCPV6 message to ff02::1:2 (Network is unreachable)
Feb 18 00:49:39 Router kernel: [ERROR pktrunner] runnerUcast_inet6addr_event,187: Could not rdpa_system_ipv6_host_address_table_find ret=-5


If you want the full log with all the stuff that follows I'll DM it to you so I don't have to clean it up...
 
I have to look at the logs in detail because trying this again caused a whole lot of services to restart
Changes made to the WAN page will cause the WAN connection to be restarted, which in turn will restart a whole bunch of services.

If you use addons, such as one that modifies the dnsmasq config or uses a very large list of host entries, they may cause dnsmasq to crash. This isn't something new, however; it's been randomly happening to some users for quite some time now. The watchdog will take care of restarting dnsmasq after a minute or two.
 
That’s what I observe, and I don’t usually pay too much attention to it. But this time just about everything restarted, and I hadn’t seen that before.

A quick reboot out of an abundance of caution never hurt anyone, unless I interrupt my wife’s TV viewing…
 
I dirty installed it over 3004.388.6 on my RT-AX68U.
Casual usage no scripts. No dnsmasq log errors, no problems to report.
Rock solid.
 
One thing I've been noticing, though I haven't isolated it yet: on occasion a site (it has happened with several, even snbforums.com) will only partially load, and after a refresh the rest of the page loads. I have seen it more often on the faster wired PC; the slower WiFi devices have occasionally experienced this as well, but not as often. When it happens, it affects every site you visit for a few minutes, and then it clears up and everything loads on the first try. There is nothing in the log that I've seen to point to anything specific, though.

Fortunately, I have Wireshark on the wired PC, so I can try to see what's going on from the PC's perspective, or at least get a hint to narrow it down.
I can't say it's something with this release of dnsmasq; I'm just raising awareness so that anyone who experiences this behavior can keep track of it and when it occurs. It could be anything, but it only started happening with this beta. I initially ignored it, thinking the issue was related to the ISP or something external across the network, but it has happened too often, across too many devices, to be a coincidence.
I just have to pin it down to provide some tangible metrics to work with.

Curious if any others have had web pages partially load.
 
The SERVFAIL error is actually DNSSEC doing its job.

DNSSEC uses reply signing to prevent a reply from being replaced by a different answer. When using Cloudflare 1.1.1.2, Cloudflare returns an answer that differs from the real one: 0.0.0.0 instead of the correct IP. Dnsmasq receives this response, the response fails DNSSEC validation, and dnsmasq turns it into a SERVFAIL error.

Then, dnsmasq incorrectly logs it as a resource limit being exceeded, which is the log entry people are seeing.

I explored the dnsmasq code and couldn't come up with a simple fix, so I will just remove the log entry, or move it to a DEBUG priority to hide it from normal users.
Interesting findings by the PiHole team.

 
Interesting. Seems for those running unbound as their upstream DNS this happens. I have OpenDNS as my upstream and don't see this in my logs after the Pi-hole update that included the dnsmasq fix. Will keep my eyes on that...
 
Interesting. I wouldn’t bother enabling FTL/dnsmasq DNSSEC if I was forwarding to Unbound on the same server. Let Unbound do its thing. Maybe I need to setup a Pi-Hole again to see how it’s going.
 
Interesting findings by the PiHole team.
This is the same thing that I found. If a request results in a SERVFAIL (for instance if DNSSEC validation fails), then it generates that incorrect log entry.
 
Updated builds were uploaded. They contain a fix for the incorrect logging of resource limits when the upstream server failed to resolve.

Please give these new builds a try. Again, please report any name resolution failures.
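
For testers who want a repeatable way to spot resolution failures, the following is a minimal sketch and not an official test procedure. It runs a list of hostnames through the system resolver and records which ones fail. It assumes the machine running it uses the router as its DNS server; the function name and the injectable `resolve` parameter are hypothetical and exist only to make the example testable.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_names(
    names: Iterable[str],
    resolve: Optional[Callable[[str], object]] = None,
) -> Dict[str, str]:
    """Try to resolve each name and map it to "ok" or a failure message."""
    if resolve is None:
        # Default: ask the system resolver, which is assumed to use the router.
        def resolve(host: str) -> object:
            return socket.getaddrinfo(host, None)

    results: Dict[str, str] = {}
    for name in names:
        try:
            resolve(name)
            results[name] = "ok"
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[name] = f"failed: {exc}"
    return results
```

Running `check_names([...])` with the domains that previously triggered the log entry, once with DNSSEC enabled and once with it disabled, gives a short list of failures that can be pasted into a report.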
 
Second Beta applied, turned DNSSEC back on, tried the domains that were logging the failures, did not get anything in the logs like before...

1708372108436.png


I'm checking whether I still randomly see partial page loads that require a refresh to load the requested page fully.
I haven't seen that happen anymore. I'm still aggressively testing, but so far so very good!

Update: DNSSEC back on
1708374065923.png


After enabling DNSSEC and configuring Firefox, most of the domains that were failing and causing the problem before now fail with this Yahoo! banner; the others just fail. In either case, nothing shows up in the logs.
1708374282848.png


Thank You @RMerlin
 
With the second beta I can no longer reproduce the "resource limit exceeded" message by toggling flight mode off and on on my Samsung tablet, and I can see that the same queries are still being made.
 
I am seeing an awkward error on the 1st alpha that does not occur on the previous standard merlin release. I don't see any relevant changes in the log that might be causing this.

My TP-Link managed switch keeps disconnecting randomly on its copper SFP. I won't get into it too much until I test some more but wanted to see if there were any others experiencing this.

Log
Code:
Feb 19 15:26:18 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link DOWN.
Feb 19 15:26:19 kernel: Warning: Serdes at 7 link does not go up following external copper PHY at 19.
Feb 19 15:26:20 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link Up at 10000 mbps full duplex
Feb 19 15:26:22 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link DOWN.
Feb 19 15:26:25 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link Up at 10000 mbps full duplex
Feb 19 15:26:28 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link DOWN.
Feb 19 15:26:34 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link Up at 10000 mbps full duplex
Feb 19 15:26:37 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link DOWN.
Feb 19 15:26:39 kernel: eth5 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link Up at 5000 mbps full duplex
Feb 19 15:27:46 rc_service: cfg_server 3142:notify_rc update_sta_binding

Reverting to the previous build seems to resolve this, but I will test the newest alpha today as well.

I have noticed no other issues besides this.

Any ideas?
 
Unrelated to the dnsmasq change. This seems to indicate your Ethernet port is struggling to stay up at 10 Gbps, and eventually it had to settle on 5 Gbps. To me that sounds like a bad connection. I would try reconnecting both ends of that Ethernet cable, or if it still flaps, replace the cable.
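
To put numbers on a flapping link like the one in the log above, here is a minimal Python sketch that counts Link DOWN events and collects the negotiated speeds per interface. The regular expression is written against the log lines quoted in this thread only, so treat the pattern as an assumption that other models or firmware versions may not match; the function name is made up for this example.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Matches lines such as:
#   "... kernel: eth5 (Int switch port: 6) ... Link Up at 10000 mbps full duplex"
#   "... kernel: eth5 (Int switch port: 6) ... Link DOWN."
LINK_RE = re.compile(
    r"kernel: (?P<iface>\S+) \(Int switch port: \d+\).*?"
    r"Link (?P<state>DOWN|Up)(?: at (?P<mbps>\d+) mbps)?"
)


def summarize_flaps(lines: Iterable[str]) -> Tuple[Counter, Dict[str, List[int]]]:
    """Count link-down events and list negotiated speeds for each interface."""
    downs: Counter = Counter()
    speeds: Dict[str, List[int]] = {}
    for line in lines:
        match = LINK_RE.search(line)
        if match is None:
            continue  # not a link state line (e.g. the Serdes warning)
        iface = match["iface"]
        if match["state"] == "DOWN":
            downs[iface] += 1
        elif match["mbps"] is not None:
            speeds.setdefault(iface, []).append(int(match["mbps"]))
    return downs, speeds
```

Fed the lines quoted above, this would report four down events on eth5 and a speed sequence that ends at 5000, which matches the description of the port falling back from 10 Gbps to 5 Gbps.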
 
Updated to latest.
Limit exceeded error is gone. For good.
Maybe it's placebo, but apps and sites here now load faster on my Samsung smartphone and tablet.

Just first impression.
Will keep an eye.
First impression does look good.
1000113015.jpg
 
Unrelated to the dnsmasq change. This seems to indicate your Ethernet port is struggling to stay up at 10 Gbps, and eventually it had to settle on 5 Gbps. To me that sounds like a bad connection. I would try reconnecting both ends of that Ethernet cable, or if it still flaps, replace the cable.
Thanks for the swift reply and all of your hard work!

Do you mean the ASUS port is struggling, or the TP-Link? Despite the 5000 rate change, it actually loses ALL connectivity to the switch.

I tested multiple cables with the same results. Is there any reason you're aware of that would stop this from occurring when I downgraded firmware to the last release?

I am doing some more testing again tonight, but thanks for all you do!
 
Do you mean the ASUS port is struggling, or the TP-Link?
It can be either of them. Both ports will negotiate an appropriate link rate based on the connection quality, so it could be a problem with either port, or with the cable itself.

Try power cycling the switch or using a different port to see if it helps.

In any case, it's definitely not related to the dnsmasq change in this test build.
 