Intermittent DNS failure; Reply code: Server failure (2)

rahzel2

Hi all. I've been using and loving asuswrt-merlin for a few years, and I have my first issue that I cannot resolve on my own.

Setup: RT-AC68U initially running 384.18; I've since upgraded to 384.19 and the issue persists. I normally use Cloudflare 1.1.1.1 over DoT, with DNSFilter set to Router, plus Diversion and pixelserv-tls.

I'm getting intermittent DNS failures. They do not seem to follow any specific pattern or affect any specific site. When I run Wireshark I get the following:

Code:
Frame 407279: 80 bytes on wire (640 bits), 80 bytes captured (640 bits) on interface \Device\NPF_{AD551495-D38C-4B9C-8B94-A54095DDFE81}, id 0
Ethernet II, Src: ASUSTekC_a2:91:50 (40:16:7e:a2:91:50), Dst: LiteonTe_5e:62:9f (3c:91:80:5e:62:9f)
Internet Protocol Version 4, Src: 192.168.1.1, Dst: 192.168.1.161
User Datagram Protocol, Src Port: 53, Dst Port: 53319
Domain Name System (response)
    Transaction ID: 0xceb3
    Flags: 0x8182 Standard query response, Server failure
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 1... .... = Recursion available: Server can do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
    [Request In: 407266]
    [Time: 0.129668000 seconds]
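
For reference, the flags word in the capture above (0x8182) can be decoded by hand. This is a minimal Python sketch of the standard DNS header bit layout; the function name `decode_dns_flags` and the dictionary keys are made up for illustration and are not part of Wireshark or any library.

```python
def decode_dns_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into its individual fields."""
    rcodes = {0: "No error", 1: "Format error", 2: "Server failure",
              3: "No such name", 4: "Not implemented", 5: "Refused"}
    rcode = flags & 0xF  # lowest 4 bits hold the reply code
    return {
        "response": bool((flags >> 15) & 1),
        "opcode": (flags >> 11) & 0xF,
        "authoritative": bool((flags >> 10) & 1),
        "truncated": bool((flags >> 9) & 1),
        "recursion_desired": bool((flags >> 8) & 1),
        "recursion_available": bool((flags >> 7) & 1),
        "authenticated_data": bool((flags >> 5) & 1),
        "checking_disabled": bool((flags >> 4) & 1),
        "rcode": rcode,
        "rcode_name": rcodes.get(rcode, "Other"),
    }


fields = decode_dns_flags(0x8182)
print(fields["rcode"], fields["rcode_name"])  # prints: 2 Server failure
```

The decoded fields match the capture: a response with recursion desired and available, not authoritative, not truncated, and reply code 2.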

Troubleshooting done:
  • Disabled Diversions
  • Disabled DNSFilter and DoT
  • Set auto-DNS
  • Set alternate DNS server
  • Rebooted router countless times
  • Power cycled for a few minutes
I upgraded to 384.19 and:
  • Reset the settings
  • Formatted the JFFS partition
  • Retried all of the above
I cannot find anything in any of the logs, including in the /var/log/dnsmasq.log. Here's the corresponding section of the dnsmasq.log for the above failed DNS lookup (same timestamp):
Code:
Aug 16 07:24:30 dnsmasq[12425]: query[A] translate.google.com from 192.168.1.161
Aug 16 07:24:30 dnsmasq[12425]: cached translate.google.com is <CNAME>
Aug 16 07:24:30 dnsmasq[12425]: forwarded translate.google.com to 208.67.222.220
Aug 16 07:24:30 dnsmasq[12425]: forwarded translate.google.com to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: query[A] translate.google.com from 192.168.1.161
Aug 16 07:24:30 dnsmasq[12425]: forwarded translate.google.com to 208.67.222.220
Aug 16 07:24:30 dnsmasq[12425]: forwarded translate.google.com to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: dnssec-query[DS] google.com to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: reply google.com is no DS
Aug 16 07:24:30 dnsmasq[12425]: validation result is INSECURE
Aug 16 07:24:30 dnsmasq[12425]: reply translate.google.com is <CNAME>
Aug 16 07:24:30 dnsmasq[12425]: reply www3.l.google.com is 172.217.23.206
Aug 16 07:24:30 dnsmasq[12425]: query[A] www3.l.google.com from 192.168.1.161
Aug 16 07:24:30 dnsmasq[12425]: cached www3.l.google.com is 172.217.23.206
Aug 16 07:24:30 dnsmasq[12425]: query[A] ocsp.pki.goog from 192.168.1.161
Aug 16 07:24:30 dnsmasq[12425]: forwarded ocsp.pki.goog to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: dnssec-query[DS] pki.goog to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: dnssec-query[DNSKEY] goog to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: query[A] ocsp.pki.goog from 192.168.1.161
Aug 16 07:24:30 dnsmasq[12425]: dnssec retry to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: reply goog is DNSKEY keytag 56158, algo 8
Aug 16 07:24:30 dnsmasq[12425]: reply goog is DNSKEY keytag 8029, algo 8
Aug 16 07:24:30 dnsmasq[12425]: reply pki.goog is DS keytag 19801, algo 8, digest 2
Aug 16 07:24:30 dnsmasq[12425]: dnssec-query[DNSKEY] pki.goog to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: reply pki.goog is DNSKEY keytag 53646, algo 8
Aug 16 07:24:30 dnsmasq[12425]: reply pki.goog is DNSKEY keytag 19801, algo 8
Aug 16 07:24:30 dnsmasq[12425]: validation result is INSECURE
Aug 16 07:24:30 dnsmasq[12425]: reply ocsp.pki.goog is <CNAME>
Aug 16 07:24:30 dnsmasq[12425]: reply pki-goog.l.google.com is 216.58.201.67
Aug 16 07:24:30 dnsmasq[12425]: query[A] pki-goog.l.google.com from 192.168.1.161
Aug 16 07:24:30 dnsmasq[12425]: cached pki-goog.l.google.com is 216.58.201.67
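
One way to scan a longer dnsmasq log for lookups that never got an answer is sketched below in Python. It is a rough heuristic, not a dnsmasq feature: it assumes the line format shown above and treats a name as answered only if a `reply` or `cached` line for it appears somewhere in the text. The function name `unanswered_queries` is hypothetical.

```python
import re
from collections import defaultdict

# Matches the part after the timestamp, e.g. "dnsmasq[12425]: query[A] name ..."
LINE = re.compile(r"dnsmasq\[\d+\]: (\S+) (\S+)")


def unanswered_queries(log_text: str) -> list:
    """Return names that were queried but have no 'reply' or 'cached' line."""
    asked = defaultdict(int)
    answered = set()
    for line in log_text.splitlines():
        match = LINE.search(line)
        if not match:
            continue
        action, name = match.groups()
        if action.startswith("query["):      # client queries only, not dnssec-query
            asked[name] += 1
        elif action in ("reply", "cached"):  # an answer was logged for this name
            answered.add(name)
    return [name for name in asked if name not in answered]


sample = """\
Aug 16 07:24:30 dnsmasq[12425]: query[A] ocsp.pki.goog from 192.168.1.161
Aug 16 07:24:30 dnsmasq[12425]: forwarded ocsp.pki.goog to 208.67.222.222
Aug 16 07:24:30 dnsmasq[12425]: reply ocsp.pki.goog is <CNAME>
"""
print(unanswered_queries(sample))  # prints: []
```

Run against the excerpt above, every queried name has a matching `reply` or `cached` line, which is consistent with nothing unusual showing up in the log.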

This particular failure is with a non-1.1.1.1 server, but it looks the same with 1.1.1.1.

The only thing that seems to alleviate the issue is leaving the router off for a little while, but it does not seem to solve it.

Anyone got any idea where to go from here?

Thank you!
 
Cloudflare was hit by a DDoS attack last week. That could be happening again. Try using Google DNS or OpenDNS and see if it goes away.

I also have a technical issue with Cloudflare using IP 1.1.1.1, which was long treated as a test address, which means some ISPs and equipment vendors do whatever they like with that IP.

Good luck,

Morris
 
This is not specific to Cloudflare DNS. It seems to happen equally regardless of the DNS server used.
 
This makes me think your internet link is dropping packets. It could also be a problem with your router. Two long pings with a large payload will tell the story. On Windows, open a shell and run:

ping 8.8.8.8 -t -l 1400
Let that run for about a minute. I suspect you will see dropped packets. If you do, try the same command, replacing 8.8.8.8 with your router's WAN IP. If you drop packets to your WAN IP, you have a router issue or a bad jumper cable to the router. If you don't drop packets to the WAN IP but did drop packets to 8.8.8.8 (Google DNS), contact your ISP, as it's a line issue or possibly a bad cable modem or equivalent.
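
The decision logic of the two ping tests can be written down as a small Python sketch. It assumes the summary line printed by the English-language Windows `ping` (for example `Lost = 1 (1% loss)`); the function names `loss_percent` and `diagnose` are hypothetical, and the returned strings only restate the reasoning above.

```python
import re


def loss_percent(ping_output: str) -> int:
    """Extract the loss percentage from a Windows ping summary."""
    match = re.search(r"\((\d+)% loss\)", ping_output)
    if match is None:
        raise ValueError("no loss summary found in ping output")
    return int(match.group(1))


def diagnose(loss_to_internet: int, loss_to_wan: int) -> str:
    """Apply the two-step test: 8.8.8.8 first, then the router's WAN IP."""
    if loss_to_internet == 0:
        return "no loss to 8.8.8.8; the link is probably not dropping packets"
    if loss_to_wan > 0:
        return "router issue or bad jumper cable to the router"
    return "line issue or bad modem; contact the ISP"


summary = "    Packets: Sent = 60, Received = 57, Lost = 3 (5% loss),"
print(diagnose(loss_percent(summary), 0))
```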

Good luck,

Morris
 
I ran the ping to 8.8.8.8 for about 5 minutes. No dropped packets. However, while the ping was running I saw at least one DNS failure in Wireshark, so whatever went wrong happened during the test but had no impact on the ping.

I also then ran it against my WAN IP and in 10 minutes of it running on my second monitor I saw only one dropped packet, which I think can be considered an outlier.
 
This is turning out to be interesting. A bunch of questions:

1) Was the test from a wired or wireless host?

2) Is the rate you just observed what you are complaining about, or is it much more frequent at times?

3) Does it stop working for a while and then start again, and if the failures last a while, how long?

4) During failure periods do all DNS requests fail?

5) Does DNS begin to work again if you take no action?

5) How many nodes are on your network?

6) Any gaining or torrent at the time of the failures?

7) Are the failures only on wireless nodes?

8) Are you using any QoS?

That's enough for now. We should be able to find a cause.

Morris
 
I used to see strange behavior when I had AiProtection enabled. I would suddenly hang and get disconnected from SSH to the router, and the router wouldn't ping from the same workstation for a short time afterward. Eventually it would come back in a minute. Wireless wasn't dropping. This is a work laptop, so at first I assumed it was related to all the extra security crapware loaded on it, but after disabling AiProtection it's been fine.
 
Were you running Cake on that router at the same time as AiProtection? There are clearly bad interactions between the two, at least on the RT-AC86U.

Morris
 
No, it’s an AC68U. Can’t run Cake. There’s another QoS script I use.
 
Flex does not have the conflict with AiProtection :-}

Morris
 
1) From 5GHz wireless. Good signal strength, about 3-4 meters from the router, line of sight between the router and device

2) I would estimate it is pretty stable at a couple of percent of all the requests I can directly observe (on the PC where I am running Wireshark). It is something you could probably ignore if you had to. About 3 - 5 times an hour as you browse, you get hit with a resolution failure in the browser and have to reload.

3) No. Individual requests fail while concurrent ones resolve. I have a set of bookmarks to news sites, about 15 of them, that I can open in one hit, and 1 or 2 might fail (not every time, though) while the rest resolve with no issues.

4) Yes. Other requests continue to be resolved, I can see this in Wireshark.

5) I have 2 wired and 7 wireless devices normally. I've not observed it on wired devices, but those are Raspberry Pis serving as servers (NAS server and little Ubuntu server for messing around) so I probably don't directly use them enough via WAN to see the issue.

6) I do have torrents on the network. I had considered that the router might be getting overwhelmed by the number of connections, so I limited the max number of connections down to 5 and even tried turning off torrents; it seems to make no difference.
I'm not familiar with the term gaining in a networking context. If this is a typo and you meant gaming, I do game online some, and my ping is usually very low and does not fluctuate.

7) As said above, the wired nodes are Raspberry Pis, which I do not use enough to notice. Any suggestions on how to run a bunch of DNS lookups from an Ubuntu server? Some sort of DNS stress test script?

8) No QoS. I have 300 down and 25 up and in the household we can't seem to max that out so there's never been much call for QoS.
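
On the DNS stress test asked about in answer 7, a minimal Python sketch is shown here, using only the standard library. It sends plain UDP A queries straight to a resolver and tallies the reply codes, so a SERVFAIL (reply code 2) is counted separately from timeouts. The function names (`build_query`, `rcode_of`, `ask`, `probe`) are hypothetical, and the server address and host names in the usage comment are just the ones from this thread.

```python
import socket
import struct


def build_query(name: str, txid: int) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def rcode_of(response: bytes) -> int:
    """Return the 4-bit reply code from a DNS response header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0xF


def ask(sock, server: str, name: str, txid: int):
    """Send one query; return its reply code, or None on timeout."""
    query = build_query(name, txid)
    sock.sendto(query, (server, 53))
    try:
        while True:
            data, _addr = sock.recvfrom(512)
            if len(data) >= 4 and data[:2] == query[:2]:  # skip late replies
                return rcode_of(data)
    except socket.timeout:
        return None


def probe(server: str, names, rounds: int = 10, timeout: float = 2.0) -> dict:
    """Query every name `rounds` times and count the outcomes."""
    counts = {"ok": 0, "servfail": 0, "other": 0, "timeout": 0}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        txid = 0
        for _round in range(rounds):
            for name in names:
                txid = (txid + 1) & 0xFFFF
                rcode = ask(sock, server, name, txid)
                key = {None: "timeout", 0: "ok", 2: "servfail"}.get(rcode, "other")
                counts[key] += 1
    return counts


# Example (needs network access):
# print(probe("192.168.1.1", ["translate.google.com", "ocsp.pki.goog"], rounds=50))
```

Because the router caches answers, repeated lookups of the same few names will mostly be served from cache; a longer and more varied name list is likely to exercise the upstream path more.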

I do not use AiProtect at all.

One thing that just occurred to me is that since I've been having this issue (the last week or so) we've had a bit of a heat wave where I live. Room temperature is about 30C, and the router CPU temperature is about 78C, which seems to be reaching into concerning territory. Since leaving the device off for a while seems to marginally improve things (it cools down some), maybe I'm seeing some overheating? Can anyone offer insight on whether 78C is a concerning number?
This device is a few years old and has been rock solid in previous heat waves, but maybe the router could use a can-of-air treatment?
 
The CPU on my RT-AC86U is at 78C; for silicon this is warm.

The drops you described during the test are not unusual on a wireless network. What you have described could be environmental. Have you tried different channels? Are there other WiFi devices near your home? Have you looked at the channels using a WiFi monitor on your phone?

Are you near an airport? I have an RT-AC68U that is collecting dust because it was terrible on the 5-GHz band. I live on the approach to a major US airport, and when aircraft were coming in I'd experience loss of communication similar to what you described. The 2.4 GHz band was fine. If I recall, the upper channels worked better, yet I still had problems. It was a couple of years ago, so the lower channels may have been an issue. Eventually I turned off the 5-GHz band on that router and all was good until I needed more bandwidth and upgraded to the RT-AC86U, which works great. I had thought that it was my unit, yet it could be a widespread problem. If you can, try turning off 2.4-GHz and see if your problems clear. If you must run 5-GHz, try different channels. I never experimented with cutting down on the channel width (try moving from the default of 20/40/80 MHz to 40 or even 20). Also, I never proved it was aircraft; that was an assumption based on observation.

Let us know how it goes.

Good luck!

Morris
 
I do not live near an airport or any sort of RF facility that I am aware of. I do, however, live in an apartment block, and there is a decent number of APs around me. I count 12 x 2.4 GHz APs and 4 x 5 GHz APs with decentish signal. I actually normally have 2.4 GHz completely disabled because I had noticed that at 2.4 GHz I could reach about 80 Mbps down at most, and sometimes much, much less, presumably due to WiFi congestion.
So I turned it off because all my devices are 5 GHz capable, my downstream is 300 so using only 80 seems like a bit of a waste, and it is surely not helping to keep 2.4 enabled when there is so much WiFi congestion as it is.
I did a survey just now and the 4 x 5 GHz APs are on channels 38, 40 and 44, and mine was on 44 as well. I moved it to 48 (if I go to channels in the 100s my Windows PC can't see it, which is strange).

However, I do not think it is wireless. I'm fairly sensitive to ping when gaming and mine has always been super solid. I initially had plans to do an ethernet run because I did not want any WiFi shenanigans while online, but when I tested it the ping was super solid. For example, in Overwatch my ping is between 40 - 43 ms, and I don't think I ever saw it fluctuate outside of that range, even when I'm torrenting and other people are watching HD video on other devices. I thought that in theory I would see some degradation, as the 5 GHz radio would struggle to keep up with 7 clients, several of them using quite a bit of bandwidth, but I have not noticed this. It may also be useful to note that I've been at this location with this setup for 2+ years and my experience had been super solid until a week ago.
 
The upper band should work fine. That's a hint. I believe my 68U worked fine for a year and then I started to have the problems. It is possible that one of the other 5-GHz networks is saturating the air, as there is channel overlap. If you are not familiar, take a look at the section called channel planning here:
You described that the 4 x 5 GHz APs are on channels 38, 40 and 44, that yours was on 44 as well, and that you moved it to 48. All 5 networks are sharing the same 80 MHz UNII-1 channel group and are interfering with each other.
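
To illustrate the overlap, here is a minimal Python sketch that maps a 5 GHz channel number to the 80 MHz block it falls in. It assumes the usual channel numbering (center frequency in MHz = 5000 + 5 x channel) and the commonly listed 80 MHz center channels 42, 58, 106, 122, 138 and 155; the function names are made up for this example.

```python
# Center channels of the commonly listed 80 MHz blocks in the 5 GHz band.
EIGHTY_MHZ_CENTERS = (42, 58, 106, 122, 138, 155)


def center_mhz(channel: int) -> int:
    """Center frequency of a 5 GHz channel number, in MHz."""
    return 5000 + 5 * channel


def eighty_mhz_block(channel: int):
    """Return the 80 MHz center channel whose span contains `channel`, or None."""
    freq = center_mhz(channel)
    for center in EIGHTY_MHZ_CENTERS:
        if abs(freq - center_mhz(center)) < 40:  # within +/- 40 MHz of the center
            return center
    return None


print({ch: eighty_mhz_block(ch) for ch in (38, 40, 44, 48)})
# prints: {38: 42, 40: 42, 44: 42, 48: 42}
```

All four channels from the survey land in the block centered on channel 42, so any of those networks running at 80 MHz width occupies the same spectrum.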

If you can connect via a Cat 5e jumper and the issues are wireless, then the wired station will work great.

How about router placement? Is it in the middle of your apartment? If it's on a wall, an exterior wall would be better than one adjacent to another apartment, as someone could have placed their router on the other side of that wall, and that can lead to lots of issues, particularly with frequency overlap as you have.

Another simple thing to try is the upper channels again, if they're empty. If everyone is winding up on the lower channels, there could be something affecting the upper ones in your area.

Try looking at these things and let us know how it goes.

Morris
 
It’s no smoking gun, but in your OP you’re showing 208.67.222.220 as one of your DNS servers, and 222.220 isn’t one of the official OpenDNS IPs as best I can tell. The main ones are 208.67.222.222 and 208.67.220.220. 222.220 still seems to work, but may not be as reliable. Just throwing out ideas based on the data you posted. OpenDNS support for DNSSEC is also relatively new.

Have you captured DNS traffic on the router using tcpdump to analyze in Wireshark? You can install tcpdump via Entware.

EDIT: I now see that 222.220 is their recommended “fourth” DNS server entry, if required.
 
Ok, this does not seem to be asuswrt-merlin related at all. Mods, please lock, or preferably delete, this thread.

I went back to Asus stock software, and the issue persisted. It changed character somewhat, with seemingly general packets being dropped, not just DNS, but it is clear that the issue was ongoing.
I also went back temporarily to my ISP-provided branded modem/router (which had been in modem-only mode until now), and while I'm still testing, the issue seems to be gone. So the troubleshooting points firmly to a hardware failure in the AC68U. I guess that's all the reason I need to upgrade to AX. The local store has the ASUS RT-AX56U in stock; I might just run and grab it.

Thank you all for troubleshooting help.
 
