
DNS issues - overloading server


Frostwolf

New Around Here
Last Saturday I upgraded from 386.1_0 to 386.2_0. By Monday night devices were reporting no internet or no response from DNS. The internet connection itself was up, and a router reboot restored service. Tuesday morning there was no internet/DNS again, so I updated to 386.2_2, but the issue returned that evening. I swapped DNS from AdGuard to NextDNS and internet was restored.

The next morning, same thing, but with NextDNS I can see how many hits it's getting: something is overloading the DNS, about 6k requests in about 30 minutes. Rebooting the router fixed it, and my DNS requests went back to normal.

This morning around 5am, with no one awake, the DNS started running crazy again: about 40k requests in 3 hours. This time I could troubleshoot. I started by eliminating devices on the network, shutting down PCs and phones one by one, with no change. I then SSH'd into the router and restarted DNSMASQ (service restart_dnsmasq), and the issue stopped, with DNS back to normal. I pulled logs from NextDNS and found nothing suspicious, but each entry seems to be repeated about 6 to 8 times. 3-4 hours later it was back at it, almost 40k requests between 12 and 4pm. No one was downloading or doing anything over the internet during that time (screenshot below).

The router logs, set to debug since this morning, show this entry at 12:37pm, in the same ten-minute period it started going bonkers: "Apr 17 12:37:51 kernel: nvram: consolidating space!". Going back earlier, I found another entry from this morning: "Apr 17 04:46:17 kernel: nvram: consolidating space!". Both are just minutes before it went bananas. 2 fries short of a happy meal, wacko!

I'm considering wiping the settings and setting it up again, but I have a lot of settings, VPN and reservations. If I export the settings and import them, I suppose I could reintroduce the issue. So I guess I'm doing that from scratch, or I need to roll back to 386.1.

But before I do anything else or reset the settings: does anyone have any troubleshooting ideas, or tests I could perform beforehand? I also noticed another forum where they seem to have the same issue I'm having: https://help.nextdns.io/t/83hl722/nextdns-issues-with-dot-on-asus-merlin
In case the http link is blocked: help . nextdns . io / t / 83hl722 / nextdns-issues-with-dot-on-asus-merlin

I have DoT set up, and DNS is now NextDNS, which I like very much so far. The router is an RT-AC68W.

Any help is appreciated. Hope what I said above made sense, pretty exhausted at the moment.

DNS overload.png

3-4 second snapshot of log file:
DNS snapshot.png
 
You seem to be confused about the firmware versions. 386.2_2 is current Merlin. Rolling back may help but there are security vulnerabilities in the older firmware. The Asus Beta for the AC68U is also stable but lacks features of Merlin.
It's always best to choose a DNS server that is geographically close to you if you can. With DoT, use at least two upstream resolvers with IPv4 and four with IPv6 (two IPv4 and two IPv6, alternated). I've not used the NextDNS add-on for Merlin, but I feel the security of Stubby DoT plus DNSMASQ DNSSEC is more than sufficient for my needs. Try other DNS servers to find which works for you. Cloudflare security 1.1.1.2 and 1.0.0.2 work well for me.
 
You're right, I missed editing the versions. I am on the current build ("Current Version : 386.2_2"); my previous versions were 386.2_0 and 386.1_0, and the problem appeared on 386.2_0.

I've disabled DNSSEC for now. I had issues with it and the VPN active at the same time on build 386.1_0.

I've already changed out DNS, and I've disabled IPv6 for now. No add-on, just using the DNS IPv4 servers with DoT. NextDNS reports quickly if that's working. I may need to go look for the add-on; maybe it'll show what device made each request.
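
For finding which device is behind the repeated requests, a rough sketch that works from dnsmasq's own query log instead of an add-on. It assumes query logging is enabled in dnsmasq (the `log-queries` option) and that the log lines look like the sample in the comment; `top_queries` is just a name made up for this example.

```sh
#!/bin/sh
# top_queries [N]: read dnsmasq query-log lines on stdin and print the N most
# frequent (name, client) pairs as "count name client". N defaults to 10.
# ASSUMPTION: lines look like
#   Apr 17 12:37:51 dnsmasq[1234]: query[A] example.com from 192.168.1.10
top_queries() {
    # Keep only "query[...] <name> from <client>" lines, then count duplicates.
    awk 'NF >= 4 && $(NF-1) == "from" && $(NF-3) ~ /^query\[/ { print $(NF-2), $NF }' |
        sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

Something like `top_queries 20 < /opt/var/log/dnsmasq.log` (adjust the path to wherever your dnsmasq log actually lives) should then show whether one client or one name dominates during a flood.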
 
In order to minimize the number of queries going upstream:
  • All of your devices should use the router's dnsmasq. I don't want to configure every device so I intercept DNS requests. I disable DNS over HTTPS in browsers.
  • By default, dnsmasq does not cache queries for non-existent entries. I have a script which overrides this, which substantially reduces the number of upstream queries, especially from Windows PCs.
  • When I first started using DNS over TLS, some of the providers did not have the capacity yet and many queries would fail. I have been using Quad9 for a while and have never noticed this happening again.
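
The override script itself isn't shown in the post, so the following is only a guess at what such a script could look like, not the poster's actual one. It is written as the body of a /jffs/scripts/dnsmasq.postconf script (Merlin passes the generated config path as the first argument) and assumes the generated config contains a `no-negcache` line. `enable_negcache` and the 3600 second TTL are made up for the example.

```sh
#!/bin/sh
# enable_negcache FILE: allow dnsmasq to cache "no such name" answers.
# ASSUMPTION: FILE is the generated dnsmasq config and negative caching is
# turned off there by a line reading exactly "no-negcache".
enable_negcache() {
    conf="$1"
    tmp="$conf.tmp.$$"
    # Copy every line except the "no-negcache" directive.
    grep -v '^no-negcache$' "$conf" > "$tmp" || true
    # For negative replies that carry no SOA TTL, cache them for one hour
    # (example value, pick your own).
    echo 'neg-ttl=3600' >> "$tmp"
    mv "$tmp" "$conf"
}

# In a real dnsmasq.postconf script you would finish with:
#   enable_negcache "$1"
```

Both `no-negcache` and `neg-ttl` are standard dnsmasq options; whether your firmware build actually sets `no-negcache` is worth checking in the generated config before relying on this.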
 
  • You don't want the router doing DNSSEC, you want the DNS provider to do that for you and set a DNSSEC bit in the responses. If I take a DNSSEC test, it succeeds even though my router is not doing it.
  • I have IPv6 enabled. I get double the DoT servers round robin'd as a result
  • I do not want to use a vendor written plugin on my router, I want the standard Merlin firmware to handle this
  • I can't imagine a valid reason to send all of your router's traffic over a VPN
 
  • I prefer the Internet test to be a ping rather than a DNS lookup
 
DNSSEC is off on the router; NextDNS is a validating DNSSEC resolver.
Only one device uses OpenVPN and the VPN DNS. If that tunnel goes down, that device is blocked until it's restored.
I'm not using any vendor add-on/plugin at all, only features built into Merlin.
A few devices are set up to use DNSFilter, such as Roku and Google Home, which are set to use 8.8.8.8. Hulu will error out if it can't get to the ads.

When this happened yesterday, there was nearly nothing turned on or even hooked up. We were rearranging everything, the only thing hooked up was a Fortinet 60e I use for VPN to work (and isolated from my network), Synology NAS, and a couple of android phones that weren't being used. All the computers were off or unplugged. I didn't even notice it had gone bonkers for 4.5 hours due to moving everything.

NVRAM "consolidating space" messages: I've seen those in the logs a few more times, but with no DNS failures matching the times of those log entries.

I've disabled DoS protection in Merlin, so far DNSMASQ is behaving after 12 hours. I'll report back if this seems to fix it.
 
85 hours and 386.2_2 has been stable once I disabled DoS under Firewall.
142 hours in, still no flooding issues since turning off DoS under firewall settings. I did have some other trouble with DNS that had me restart DNSMASQ to troubleshoot, but that didn't help with the current issue. So the issue I opened this thread with appears solved.

My new issue appears to be related to the DNS service itself, so I added Adguard back to my list of DoT servers to fix the new issue.
 
I started having weird DNS issues after I upgraded to 386.2_2. I have an ASUS RT-AC5300 behind a gigabit AT&T router. The Netflix app on FireTV can't connect to 2 of 3 Netflix servers, so it won't start, but the Netflix app on AppleTV is fine. The only solution I've found is to disable "Connect to DNS Server Automatically" under WAN and change "DNS Server2" to 8.8.8.8.
I left DNS Server1 pointing to the AT&T DNS that I saw the upstream router using.

Also, after a reboot the ASUS router sometimes wasn't getting a working DNS, so my entire home could not connect to the internet. Setting one of the DNS servers to 8.8.8.8 seems to be helping. Disabling DoS under firewall didn't help. Changing DHCP query frequency from "Aggressive Mode" to "Normal Mode" didn't help.

I don't see any DNS fixes in 386.2_4.

I had another RT-AC68U set up in AP mode, and the 2 routers could not see each other, although I could SSH from my laptop to both. I had to factory reset the RT-AC68U and configure everything manually to get it to work properly with 386.2_2; restoring saved settings from before 386.2_2 brought the flakiness back.

I'll have to bite the bullet and factory reset the RT-AC5300. This is the 1st firmware I've had to do that for.
 
Yeah, this firmware 386.2_2 has something going on with DNS for sure; it's been less reliable. I still have my DNS split between NextDNS (primary) and AdGuard (secondary). I think I will have to wipe it to factory and set it back up from scratch as well. When I get time, I'm going to try one thing first and set this up:
https://github.com/nextdns/nextdns/wiki/AsusWRT-Merlin

As far as my original issue goes, I've not flooded the DNS server anymore. Wiping the settings might resolve the DoS issue as well.
 
I factory reset my main ASUS RT-AC5300 and another RT-AC68U after upgrading to 386.2_4, and waited. The factory reset did fix a couple of my local LAN devices that were not able to talk to each other. After a little while I got the DNS issue out of the blue again. I'm wondering if the previous firmware had better DNS caching, and now some of my local devices are making too many DNS calls that hit the AT&T router, with AT&T maybe blocking after too many requests. I've had my Pi-hole turned off for 4 months because it blocked some legit services too. I'm going to try AdGuard Home to see if it can filter DNS floods to the AT&T router and solve my DNS issue without blocking legit streaming services.
 
Thanks for the feedback, I'll hold off on factory resetting mine since it didn't help yours. With two different DNS services set up under DoT settings, it's been much more stable. I think it's averaging about 80% of hits on the primary, NextDNS, with the rest failing over to AdGuard DNS. With DoS off under firewall, it's not going bonkers with the DNS flood, which you could see in the NextDNS charts.

I turned off AiProtection for now to test as well.

I may try the script NextDNS provides soon, but I need to confirm DNSFilter will still work. If not, I can't use it at this time.
 
Disabling DoS protection in Merlin was the first thing I tried, and it worked! Please note: other users have reported similar issues with DoS enabled in these versions. Still others have reported no issues with DoS on. I've turned DoS off for now. I've been watching things related to NextDNS not working for 5 months now.

WRT DNSMASQ, I have struggled with my AX86U/AC86U Merlin setups + NextDNS since around January 2021. Everything had worked perfectly fine with NextDNS for nearly a year, as I was an early beta tester in these forums. I ended up switching to QUAD9 for several months b/c of instabilities. Right now I'm back on NextDNS + Merlin, though I still have my suspicions.

Today, my gut says there is an issue with DNSMASQ running with stubby, which is called out in the NextDNS setup. I've seen and reported DNSMASQ "going crazy" myself when it's using NextDNS + DoT / stubby. Sometimes the only way to recover is to restart DNSMASQ and stubby.

See this thread. -> https://help.nextdns.io/t/60hzjrv?r=y4hfgqm

I can report today that after NOT setting "round_robin_upstreams: 0" and using the Merlin default of "round_robin_upstreams: 1" for stubby, my DNS setup has run for about 1 week without a DNS issue or family screaming that the internet is down again... (knocking on wood). Oliver's monitoring the above thread. IDK, you may have hit the same or a similar issue? It reads eerily similar, with DNSMASQ going wild. I'm watching all of this 24/7 in an open SSH session to the router. I'm tailing the logs and running the stubby monitor, as in:

-> tail -n50 -f /opt/var/log/dnsmasq.log
-> stubby -l
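
If you'd rather not watch the tail by eye, here is a small sketch that turns the same dnsmasq log into a queries-per-minute count, so a flood shows up as a jump in the numbers. It assumes syslog-style lines with the query tag in the fifth field (see the comment) and that the log is in time order; `queries_per_minute` is a name invented for this example.

```sh
#!/bin/sh
# queries_per_minute: read a dnsmasq query log on stdin and print
# "count Mon DD HH:MM" for each minute that saw client queries.
# ASSUMPTION: lines look like
#   Apr 17 12:37:51 dnsmasq[1234]: query[A] example.com from 192.168.1.10
# and are already in chronological order (uniq only merges adjacent lines).
queries_per_minute() {
    awk 'NF >= 5 && $5 ~ /^query\[/ { print $1, $2, substr($3, 1, 5) }' | uniq -c
}
```

Usage would be along the lines of `queries_per_minute < /opt/var/log/dnsmasq.log | tail -n 30` to see the last half hour of activity.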

THANKS! Stay safe, stay alive. Peace.
 
I tried disabling DoS protection in Merlin one more time, but reverted my WAN DNS 1 and 2 back to my ISP's (AT&T) DNS and rebooted the RT-AC5300.

Using AT&T DNS I lost internet: none of my devices had internet access,
but I could SSH into the RT-AC5300, and I could still resolve names from inside the router.

Both of the following resolved to an IP. I have no idea why this can happen: DNS works inside the router, but not for the LAN devices.

ping www.google.com
ping www.yahoo.com

Switching to public DNS (1.1.1.1, 8.8.8.8), everything works.

I did notice that the two AT&T DNS servers show 18% and 30% packet loss, while pinging these public DNS servers showed 0% packet loss:

8.8.8.8, 1.1.1.1, 1.0.0.1

On one of the AT&T DNS servers, the response time spiked to 36ms just before a lost packet;
most DNS ping response times are around 9-11ms.

The fastest DNS was Google's 8.8.8.8.

I'm going to conclude this may not be completely the firmware's fault; it may just not handle DNS packet loss very gracefully. It's quite shocking that a 1 Gigabit AT&T high-speed internet plan comes with a DNS service that loses packets.

Here are my ping response times to the 2 AT&T DNS IPs.
You can see the seq numbers skipping a couple of integers.



/jffs/scripts# ping 68.94.157.9
PING 68.94.157.9 (68.94.157.9): 56 data bytes
64 bytes from 68.94.157.9: seq=0 ttl=249 time=11.356 ms
64 bytes from 68.94.157.9: seq=1 ttl=249 time=36.171 ms
64 bytes from 68.94.157.9: seq=3 ttl=249 time=11.772 ms
64 bytes from 68.94.157.9: seq=4 ttl=249 time=11.444 ms
64 bytes from 68.94.157.9: seq=5 ttl=249 time=11.361 ms
64 bytes from 68.94.157.9: seq=6 ttl=249 time=11.247 ms
64 bytes from 68.94.157.9: seq=7 ttl=249 time=12.201 ms
64 bytes from 68.94.157.9: seq=9 ttl=249 time=11.109 ms
64 bytes from 68.94.157.9: seq=10 ttl=249 time=10.872 ms
^C
--- 68.94.157.9 ping statistics ---
11 packets transmitted, 9 packets received, 18% packet loss
round-trip min/avg/max = 10.872/14.170/36.171 ms


/jffs/scripts# ping 68.94.156.9
PING 68.94.156.9 (68.94.156.9): 56 data bytes
64 bytes from 68.94.156.9: seq=1 ttl=250 time=9.201 ms
64 bytes from 68.94.156.9: seq=2 ttl=250 time=8.981 ms
64 bytes from 68.94.156.9: seq=4 ttl=250 time=9.147 ms
64 bytes from 68.94.156.9: seq=5 ttl=250 time=10.104 ms
64 bytes from 68.94.156.9: seq=7 ttl=250 time=9.153 ms
64 bytes from 68.94.156.9: seq=8 ttl=250 time=9.184 ms
64 bytes from 68.94.156.9: seq=9 ttl=250 time=9.152 ms
^C
--- 68.94.156.9 ping statistics ---
10 packets transmitted, 7 packets received, 30% packet loss
round-trip min/avg/max = 8.981/9.274/10.104 ms
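
To pick the skipped sequence numbers out of a long ping run without reading it line by line, something like the sketch below could be used. It assumes the `seq=N` reply format shown in the output above (other ping builds print `icmp_seq=N`, which the same pattern also matches because it ends in `seq=N`); `missing_seqs` is a made-up name.

```sh
#!/bin/sh
# missing_seqs: read ping output on stdin and print every sequence number
# that is skipped between replies, one per line, counting from seq 0.
# Packets lost after the last reply cannot be seen this way.
missing_seqs() {
    awk '
        BEGIN { prev = -1 }
        match($0, /seq=[0-9]+/) {
            # "seq=" is 4 characters long; the rest of the match is the number.
            n = substr($0, RSTART + 4, RLENGTH - 4) + 0
            for (i = prev + 1; i < n; i++) print i
            prev = n
        }
    '
}
```

Fed the first run above (68.94.157.9), this should list 2 and 8; for the second run (68.94.156.9) it should list 0, 3 and 6, which matches the 2 and 3 lost packets in the statistics lines. For example: `ping -c 100 68.94.157.9 | missing_seqs`.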
 
  • You don't want the router doing DNSSEC, you want the DNS provider to do that for you and set a DNSSEC bit in the responses. If I take a DNSSEC test, it succeeds even though my router is not doing it.
  • I have IPv6 enabled. I get double the DoT servers round robin'd as a result
  • I do not want to use a vendor written plugin on my router, I want the standard Merlin firmware to handle this
  • I can't imagine a valid reason to send all of your router's traffic over a VPN
Re DNSSEC,
Post in thread '[Release] Asuswrt-Merlin 384.11 is available'
http://www.snbforums.com/threads/release-asuswrt-merlin-384-11-is-available.56501/post-488647
 
^^^ Hi, I leverage QUAD9 for the two WAN DNS entries which are only used while the router is booting to get the time, etc.. It's been years since I used my ISP's DNS. I have no experience with ATT high-speed internet services b/c they are not available to my location.
Some people also mix QUAD9, Cloudflare, and others into the WAN DNS1/DNS2 so if one is down during a boot, the alternative might be ok.

I'd use what works consistently and reliably for your service / area. Consistently high packet loss is a performance killer.

This screen cap is part of my manual NextDNS setup (no client). The rest is in the other sections and control files....
This is a handy QUAD9 reference page for their various IPs and what they do ->
https://support.quad9.net/hc/en-us/articles/360041193212-Quad9-IPs-and-other-settings

1620650650049.png

Stay safe, stay alive. Peace.
 
Even after I changed both DNS entries to non-ISP DNS, a router reboot still left me without internet for 10 min.
I overhauled my wan-start script in case my WAN scripts were the culprit. Merlin now has a new wan-event script,
and the commands inside wan-start could have been failing and making the WAN unstable. Entware was not mounted yet when wan-start was called during the connected state for interface 0; maybe the new firmware doesn't handle such errors as cleanly.

I kind of configured QUAD9 mixed with AdGuard instead of Pi-hole, with instructions from this link


I've rebooted 3x without issues, and the router seems to be up and ready quicker. At the quickest, by the time I can SSH in, the router shows an uptime of 1 min; other times it's been 5 min.

I also freed up NVRAM by removing the manual IP-to-MAC assignments from the LAN DHCP settings
and placing them in /jffs/configs/dnsmasq.conf.add instead.

Found a nice dhcpstaticlist utility here to help with that.


I'm going to cross my fingers, watch and wait now, hoping I won't lose internet or DNS.
 
It's been 4 days with still no internet/DNS issues. I have a hunch the latest firmware 386.2_x needs more free NVRAM than the previous firmware. I had a script that easily recreated the LAN DHCP host/IP static list; even after the factory reset I ran it, which left too little free NVRAM once more. Now my scripts don't populate NVRAM, but use the file /jffs/configs/dnsmasq.conf.add to achieve the same. When I say too little, I mean I couldn't use the web UI LAN DHCP server page to add any more hosts.
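
For anyone wanting to do the same by hand, here is a minimal sketch of generating the entries. It assumes you keep your reservations in a plain text file with one "MAC IP [hostname]" per line (that file format and the name `to_dhcp_hosts` are made up for this example) and writes them out as dnsmasq `dhcp-host=` lines.

```sh
#!/bin/sh
# to_dhcp_hosts: read "MAC IP [hostname]" lines on stdin and print one
# dnsmasq "dhcp-host=MAC,IP[,hostname]" line per entry.
# Blank lines and lines starting with "#" are skipped.
to_dhcp_hosts() {
    awk 'NF >= 2 && $1 !~ /^#/ {
        line = "dhcp-host=" $1 "," $2
        if (NF >= 3) line = line "," $3
        print line
    }'
}
```

You could then append the result with `to_dhcp_hosts < hosts.txt >> /jffs/configs/dnsmasq.conf.add` (hosts.txt being the hypothetical list) and run `service restart_dnsmasq` so dnsmasq picks it up.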

Now I'm going to focus on the low-hanging fruit: Alexa devices disconnecting from WiFi while the internet for the rest of the house still works.
 
