Solved: VPN using Exclusive DNS still queries default ISP servers (RT-AC86U + 386.4)

csj97229

Occasional Visitor
I'm scratching my head as to why my VPN-accessible DNS server is not being queried as I'd expect. I've tried using both Exclusive and Strict for the VPN DNS configuration. I'd swear this was working at one point, but perhaps it was just coincidental and my setup isn't really valid. All of the routing is working fine, but name resolution isn't. I searched and found some similar questions, but most people seem to be using "redirect internet traffic through tunnel" or policy rules, neither of which I am using.

Configuration:
  • Two satellite offices for xyz.org, each running an RT-AC86U to handle some arbitrary number of DHCP clients with each router acting as an independent DHCP server
  • One OpenVPN server that also runs dnsmasq to serve DNS queries for a few internal server names with fixed IPs within the VPN
  • Another web server (www.xyz.org) whose DNS records are stored on a 3rd-party DNS server, since it needs to be accessible from the internet
  • The satellite offices are "loosely connected" to the VPN... general internet traffic is not routed through the VPN server, only intranet traffic is. One satellite office is on another continent, so the latency penalty would be much too high to route all traffic through the VPN.
  • Satellite offices need to be able to function if the VPN server goes offline. They won't be able to access the internal servers (obviously), but should be able to find/access www.xyz.org.
  • Clients are all "trusted" so I don't need any special filtering or strict limitations on what they can do.
In short, I just want the internal DNS server on the VPN queried first by clients whenever the VPN client connection is up. If the VPN connection is down, then using the default ISP servers or hard-coded servers is fine.

[Screenshot]


Problem: The DNS server running on the VPN server side is not being queried for server.xyz.org queries from the clients in the satellite offices, so those clients can't access the internal servers by name. They can access the internal servers by IP address just fine, so all of the routing is working. Also, I can manually force a client to use the VPN DNS server to resolve an internal server IP address successfully, so I know the VPN DNS server is reachable from the clients and can respond to lookups.

I can see that the VPN is pushing the DNS server information to the routers when they connect and I see the internal DNS server IP address in each router's client.resolv file.
% cat /etc/openvpn/client1/client.resolv
server=10.8.0.1

Each client computer on the satellite subnet is correctly configured by DHCP to query the local RT-AC86U box, which is running vanilla Merlin dnsmasq. The /tmp/resolv.dnsmasq file on the router lists the ISP DNS servers first, followed by the one that I want it to use first (10.8.0.1).
% cat /tmp/resolv.dnsmasq
server=71.10.216.1
server=71.10.216.2
server=10.8.0.1

It feels like either I'm doing something stupid (likely) or I futzed a config setting somewhere along the way. Any pointers on what else to try are appreciated. I can envision a few work-arounds, so this isn't critical, but it's bugging me as to why I can't get it working.

Routers are configured as follows:

WAN

[Screenshot: WAN settings]


LAN (addresses removed)

[Screenshot: LAN settings]


VPN settings (address removed):

[Screenshot: VPN client settings]
 

eibgrad

Part of the Furniture
Here's the problem as I see it.

When you use Exclusive, the router bypasses DNSMasq. It simply sees the push'd DNS server from the OpenVPN server and DNAT's (redirects) any DNS query from the client to that DNS server. It's intended for those clients that *must* use the VPN, no other option is sanctioned. But if that server goes down or is otherwise unavailable for any reason, there is no backup. It *has* to work 24/7 or else the client loses DNS completely.

When you use Strict, the router *does* use DNSMasq. It adds the push'd DNS server to the servers DNSMasq already uses, i.e., those obtained over the WAN (whether from the ISP or your custom servers). But there's a bug. Strict is *supposed* to prepend that DNS server, NOT append it. And by design, Strict (which maps down to the strict-order directive in DNSMasq) accesses those servers in order. And the fact that a given DNS server is accessible but doesn't know of a given domain (and returns NXDOMAIN) does NOT mean DNSMasq will try another server. NXDOMAIN is a *valid* response! The server would have to FAIL before DNSMasq moved on to the next server in the list.

Even if Strict worked as intended, it has other flaws. For example, it may not always access the servers in order, despite claims by the authors of DNSMasq that it should. When monitoring DNS activity on the router, I can *see* multiple DNS servers being accessed, even when the one w/ highest priority (i.e., the first in the list) is known to be working. It's why I don't generally recommend it. It doesn't work the way most ppl think or assume.

Something else to consider as well. In the world of DNSMasq, *all* DNS servers are assumed to be equally qualified to answer a DNS query, *UNLESS* you specifically tell it otherwise. If you add a directive like the following to DNSMasq, now it knows that only the specified server can resolve the specified domain.

Code:
server=/<domain-name>/<server-ip>

Seems to me that's the missing item here. If you decided NOT to push 10.8.0.1, left Accept DNS Configuration on the OpenVPN clients as Disabled, and instead configured DNSMasq on the OpenVPN clients' routers w/ the above directive, then they'd know to only query the OpenVPN server's DNS for xyz.org name resolution. For everything else, they'd use the other servers as usual.
 

RMerlin

Asuswrt-Merlin dev
But there's a bug. Strict is *supposed* to prepend that DNS server, NOT append it.
Unless that was changed, dnsmasq reads them in the reverse order, which is why a few years ago I changed it to append rather than prepend.
 

eibgrad

Part of the Furniture
Unless that was changed, dnsmasq reads them in the reverse order, which is why a few years ago I changed it to append rather than prepend.

I heard you (or someone else, I can't remember now) mention that not too long ago. However, when I monitor connection tracking, I can clearly see this is NOT the case. You can see the first server in the list gets used immediately. In fact, if I manually rearrange the servers in the file and force the push'd DNS server to the top, and restart DNSMasq, it finally starts using the OpenVPN server's DNS server.
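To repeat that experiment by hand, a sketch like the following moves one `server=` line to the top of /tmp/resolv.dnsmasq, after which dnsmasq can be restarted. It is a diagnostic aid only and assumes the file layout shown earlier in the thread. The firmware rewrites this file when the WAN or tunnel reconnects, so the change will not stick, and the helper name `move_server_first` is hypothetical.

```shell
#!/bin/sh
# Move one "server=" entry to the top of a resolv.dnsmasq-style file.
# Usage: move_server_first <server-ip> [file]
move_server_first() {
    entry="server=$1"
    file="${2:-/tmp/resolv.dnsmasq}"
    # Refuse to touch the file if the entry is not in it.
    grep -qxF "$entry" "$file" || return 1
    {
        echo "$entry"
        grep -vxF "$entry" "$file" || true
    } > "$file.tmp"
    mv "$file.tmp" "$file"
}

# On the router (diagnostic only; the firmware regenerates this file):
# move_server_first 10.8.0.1 && service restart_dnsmasq
```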

As I said, strict-order doesn't work as intended anyway. It does NOT guarantee against DNS leaks, not by any stretch. Monitor connection tracking long enough and you'll eventually find it using ALL the servers.

BTW, I'm working on a small utility to make monitoring this sort of DNS activity much easier. I plan to post a tutorial announcing it over the next few days. That's why I'm very familiar w/ this behavior.
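Without a dedicated tool, one rough way to see which upstream servers are really being queried is to count port-53 flows in the kernel's connection-tracking table. This sketch assumes /proc/net/nf_conntrack exists and uses the usual `key=value` layout, both of which depend on the kernel and firmware build. It is not eibgrad's utility, and the helper name `dns_destinations` is hypothetical. Only the first `dst=`/`dport=` pair on each line (the original direction of the flow) is counted.

```shell
#!/bin/sh
# Count DNS flows (dport=53) per destination server in a conntrack dump.
# Only the first dst=/dport= pair on each line (the original direction) is used.
# Usage: dns_destinations [conntrack-file]
dns_destinations() {
    table="${1:-/proc/net/nf_conntrack}"
    awk '{
        dst = ""; dport = ""
        for (i = 1; i <= NF; i++) {
            if (dst == "" && $i ~ /^dst=/) dst = substr($i, 5)
            if (dport == "" && $i ~ /^dport=/) dport = substr($i, 7)
        }
        if (dport == "53") count[dst]++
    }
    END { for (d in count) print d, count[d] }' "$table" | sort
}

# Poll every few seconds on the router, e.g.:
# while true; do dns_destinations; echo ---; sleep 5; done
```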
 

RMerlin

Asuswrt-Merlin dev
However, when I monitor connection tracking, I can clearly see this is NOT the case.
It's possible then that the dnsmasq author "fixed" that behaviour at some point. I'd have to review the dnsmasq code to confirm it.

As I said, strict-order doesn't work as intended anyway. It does NOT guarantee against DNS leaks, not by any stretch.
The dnsmasq author did mention too at some point that in general, people shouldn't be relying on strict-order, as it might not necessarily perform the way people expect it to.
 

eibgrad

Part of the Furniture
As I said, I don't recommend strict-order for any reason. It's just goofy. I've never found anyone over all these years that can truly explain its behavior w/ 100% accuracy. Having Strict for Accept DNS Configuration rely on it is really no better than specifying Relaxed at this point. When using Strict, the better option would be to NOT use strict-order and *only* install the push'd DNS server(s) from the OpenVPN server into /tmp/resolv.dnsmasq. That would ensure that option was leakproof (at least to the extent we know it's bound to the VPN). For anyone wanting them merged w/ the ISP's DNS servers (or custom servers), let them use Relaxed.
 

john9527

Part of the Furniture
Unless that was changed, dnsmasq reads them in the reverse order, which is why a few years ago I changed it to append rather than prepend.
That's my understanding as well.....
When I tested 'strict' by dumping the dnsmasq stats it appeared to work as advertised (although that was a while ago).
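dnsmasq writes its statistics to the log when it receives SIGUSR1, including per-upstream-server query counts, which is one way to run that kind of check. A small sketch for pulling those counts out of the log, assuming the log is /tmp/syslog.log (typical on Asuswrt-Merlin; adjust if yours differs) and that each per-server line contains `server <ip>#<port>: queries sent <n>`. The exact wording differs between dnsmasq versions, and the helper name `dnsmasq_server_counts` is hypothetical.

```shell
#!/bin/sh
# Print "<server> <queries sent>" for each upstream server in dnsmasq's
# statistics lines, which look roughly like:
#   dnsmasq[123]: server 10.8.0.1#53: queries sent 12, retried or failed 0
# Usage: dnsmasq_server_counts [logfile]
dnsmasq_server_counts() {
    log="${1:-/tmp/syslog.log}"
    grep 'queries sent' "$log" |
        sed -n 's/.*server \([^#]*\)#[0-9]*: queries sent \([0-9]*\).*/\1 \2/p'
}

# On the router: make dnsmasq dump its stats, wait briefly, then parse.
# killall -USR1 dnsmasq; sleep 1; dnsmasq_server_counts
```

Each dump appends new lines to the log, so if the signal has been sent more than once, the last lines printed are the current counts.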

There's also an nvram only setting on my fork (vpn_reverse_strict) which does what the name implies....
 

eibgrad

Part of the Furniture
P.S. And since we're on the topic, here's another thing I've discovered during development of my utility when it comes to some commercial OpenVPN providers (specifically KeepSolid (aka VPNUnlimited) and FastestVPN, but probably others as well).

These idiots are pushing DNS servers that are NOT within the scope of the tunnel! Here's a dump of the syslog push-reply and ifconfig when using FastestVPN.

Code:
Jan 27 20:29:56 ovpn-client2[10287]: PUSH: Received control message: 'PUSH_REPLY,sndbuf 393216,rcvbuf 393216,redirect-gateway def1,dhcp-option DNS 10.8.8.8,block-outside-dns,route-gateway 10.16.0.1,topology subnet,ping 10,ping-restart 60,socket-flags TCP_NODELAY,ifconfig 10.16.0.31 255.255.0.0,peer-id 25,cipher AES-256-GCM'

Code:
tun12     Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.16.0.31  P-t-P:10.16.0.31  Mask:255.255.0.0
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

IOW, the tunnel is 10.16.0.0/16 (which is bad enough, 64k hosts!) and the DNS server is 10.8.8.8! LOL It's NOT going to matter w/ "Route internet traffic through tunnel" set to Yes, but w/ the VPN Director active and the router itself off the VPN, it's now going to route 10.8.8.8 through the WAN! Which of course won't work. And if you're using Strict, you now have a DNS leak, since it has no choice but to fall back to the ISP/custom DNS servers.

They don't even have the courtesy to push a corresponding route directive to bind 10.8.8.8 to the tunnel. According to them (see below), that's *my* job! As if I know beforehand what DNS server(s), if any, are going to be pushed.

I got so pissed about it, I complained to their tech support and got the following response.

"Thank you for reaching out to us and for the suggestion.
As we are running a DNS server which is commonly used for all of our protocols, therefor it is local to the IP pool which we have assigned on our protocols. As you mentioned that you need to perform split tunneling, you can perform it by adding static routes and if you want to use our DNS as primary you can just put one route to that out gateway."


IOW, as expected, they ain't gonna do a thing about it. But it illustrates that understanding how DNS works and preventing DNS leaks is a full time job. And why I'm working on this utility to make it more obvious to users when things are NOT working, and WHY (that's how I finally noticed this issue w/ KeepSolid and FastestVPN). It's proving mighty handy.

Now granted, these are cheap, budget VPN providers. But nonetheless, users are using them, then complaining about DNS leaks (assuming they're even aware of it).
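On the missing route for a pushed DNS server: OpenVPN hands options it does not apply itself, such as `dhcp-option DNS`, to its scripts as the environment variables `foreign_option_1`, `foreign_option_2`, and so on, together with the tunnel device in `dev`. So a route-up hook can discover at connect time which DNS servers were pushed and pin each one to the tunnel with a /32 host route. This is a minimal, untested sketch under those assumptions. It only touches the main routing table (the one the router itself uses), so VPN Director clients may need more than this, and it assumes Merlin passes that environment through to user scripts. The function name `pushed_dns_servers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical route-up helper: bind each DNS server pushed by the OpenVPN
# server to the tunnel with a /32 host route.

# OpenVPN passes options it does not handle itself (such as
# "dhcp-option DNS 10.8.8.8") to scripts as foreign_option_1, foreign_option_2, ...
pushed_dns_servers() {
    i=1
    while :; do
        eval "opt=\${foreign_option_$i:-}"
        [ -z "$opt" ] && break
        case "$opt" in
            "dhcp-option DNS "*) echo "${opt#dhcp-option DNS }" ;;
        esac
        i=$((i + 1))
    done
}

# Only act when OpenVPN runs this as a route-up script ($dev is e.g. tun12).
if [ "${script_type:-}" = "route-up" ] && [ -n "${dev:-}" ]; then
    for dns in $(pushed_dns_servers); do
        ip route replace "$dns/32" dev "$dev"
    done
fi
```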
 

eibgrad

Part of the Furniture
When using dd-wrt, I normally use the following directive to ignore any pushed DNS servers.

Code:
pull-filter ignore "dhcp-option DNS"

So I don't normally notice how dd-wrt handles it. I just configure my own DNS servers (e.g., Cloudflare) and bind them to the VPN.

However, given what @egc just said, that dd-wrt no longer uses strict-order anyway (something I wouldn't have noticed), I just now configured one of my dd-wrt routers in the lab, and I can see dd-wrt does exactly as I recommended for Merlin; it replaces whatever is in /tmp/resolv.dnsmasq w/ the push'd servers. It does NOT merge them w/ the ISP/custom DNS servers.
 

csj97229

Occasional Visitor
Really appreciate all of the insight provided here. This gives me some things to try out later this evening when the network is less busy.

When you use Exclusive, the router bypasses DNSMasq. It simply sees the push'd DNS server from the OpenVPN server and DNAT's (redirects) any DNS query from the client to that DNS server. It's intended for those clients that *must* use the VPN, no other option is sanctioned. But if that server goes down or is otherwise unavailable for any reason, there is no backup. It *has* to work 24/7 or else the client loses DNS completely.

Does this imply that Exclusive will not take effect unless clients have all internet traffic redirected through the VPN? I see some OUTPUT_DNS entries for udp and tcp in the output of iptables, but I haven't deciphered them yet.

Thanks for the background on Strict.

Something else to consider as well. In the world of DNSMasq, *all* DNS servers are assumed to be equally qualified to answer a DNS query, *UNLESS* you specifically tell it otherwise. If you add a directive like the following to DNSMasq, now it knows that only the specified server can resolve the specified domain.

Code:
server=/<domain-name>/<server-ip>

Seems to me that's the missing item here. If you decided NOT to push 10.8.0.1, left Accept DNS Configuration on the OpenVPN clients as Disabled, and instead configured DNSMasq on the OpenVPN clients' routers w/ the above directive, then they'd know to only query the OpenVPN server's DNS for xyz.org name resolution. For everything else, they'd use the other servers as usual.

This sounds promising, and I had started down that path before but got hung up on only wanting it active when VPN is enabled. Will try to experiment with this again.
 

eibgrad

Part of the Furniture
Does this imply that Exclusive will take effect unless clients have all internet traffic redirected through the VPN? I see some OUTPUT_DNS entries for udp and tcp in the output of iptables, but I haven't deciphered them yet.

Exclusive is implemented differently depending on whether you have Yes or VPN Director for "Route internet traffic through tunnel".

As I said, for the VPN Director, it creates DNAT (redirect) rules for anything in policy routing. It has to do this because the idea behind Exclusive is to ONLY affect those bound to the VPN, NOT any client still bound to the WAN. But when you specify Yes, all your LAN clients are bound to the VPN, by definition. So Exclusive doesn't need the DNATs (redirects); it just reconfigures DNSMasq to *only* use the VPN provider's DNS servers.
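To make the redirect idea concrete, the sketch below prints DNAT rules of the kind described: any UDP or TCP query to port 53 from a listed client is rewritten to go to the VPN-side resolver, whatever server the client asked for. It illustrates the technique under assumptions and is not the chain names or exact rules the firmware installs. The helper name `print_dns_dnat_rules` and the client addresses are hypothetical, and printing rather than applying lets you review the rules before piping them to `sh`.

```shell
#!/bin/sh
# Hypothetical helper: print iptables DNAT rules that send every DNS query
# (UDP and TCP port 53) from the listed LAN clients to one VPN-side resolver.
# Illustrates the redirect technique only; it is not the firmware's rule set.
# Usage: print_dns_dnat_rules <vpn-dns-ip> <client-ip>...
print_dns_dnat_rules() {
    vpn_dns="$1"
    shift
    for client in "$@"; do
        for proto in udp tcp; do
            echo "iptables -t nat -A PREROUTING -s $client -p $proto --dport 53 -j DNAT --to-destination $vpn_dns"
        done
    done
}

# Review first, then apply by piping to sh, e.g.:
# print_dns_dnat_rules 10.8.0.1 192.168.1.50 192.168.1.51 | sh
```

Because the rules match on the source address, clients that are not listed (those still bound to the WAN) keep using DNSMasq, which is the per-client scoping described above.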

IOW, when you specify Yes, you gain back access to DNSMasq. And it's the one sure way to prevent DNS leaks. There is simply no other route but the VPN for the LAN clients, the router, DNS, etc. But once you split tunnel, NOW you need these DNATs (redirects) to enforce the VPN servers for only those bound to the VPN. But they lose access to DNSMasq in the process.

Frankly, the way Exclusive works when "Route internet traffic through tunnel" is set to Yes, is the same way I believe Strict should work, which ultimately raises the question why Strict should exist at all! To my mind, Exclusive and Strict are synonymous! The only distinction between the two was the idea that w/ Strict, the ISP/custom DNS servers acted as a fallback/failover. But that's been a total failure in terms of results. So I say, get rid of Strict entirely.

I know that's difficult for ppl to accept given how long these options have been around. But honestly, Exclusive, Relaxed, and Disabled are sufficient. Or else call it Strict, Relaxed, and Disabled. Strict is so commonly used elsewhere in the router, ppl probably find it more intuitive.

BTW, I know a lot of this is confusing. ALL this DNS stuff is overly complex. Heck, even I have to constantly verify and reverify the way things work just to keep things straight. I can't imagine what it's like for someone who's just looking at the GUI and trying to "guess" how it all works.
 

csj97229

Occasional Visitor
OK, I think this is "solved" now. I'd certainly appreciate any feedback on whether or not there is a Better Way(tm) to accomplish the same task.

For starters, I abandoned the idea of pushing the intranet DNS server to the client routers for now. So I set "Accept DNS Configuration" to "Disabled".

I left "WAN : Forward local domain queries to upstream DNS" set to true since I still want to be able to access the internet-accessible web server when the VPN is down, and I think that breaks if I set this to false. I'm sure I could hard-code its address into each router's settings to get around that, but I don't want even more things to track down and fix if/when its address changes.

I only want the server=/xyz.org/10.8.0.1 to be in the dnsmasq.conf while the VPN is active, so I created a /jffs/scripts/vpnclient1-route-up script that generates a /jffs/configs/dnsmasq.conf.add file with that line and then restarts dnsmasq.

Likewise, to disable that setting when the VPN goes down, I created a /jffs/scripts/vpnclient1-route-pre-down script that removes the "add" file and restarts dnsmasq.
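For reference, a minimal sketch of what the two hooks described above could share: one helper that is called with `up` or `down`. The file name vpn-dns-toggle.sh and the `DNSMASQ_ADD`/`RESTART_CMD` overrides are hypothetical; /jffs/configs/dnsmasq.conf.add and `service restart_dnsmasq` are the usual Asuswrt-Merlin mechanisms. Unlike generating and deleting the whole file, this version only adds and removes its own line, so any other custom dnsmasq entries in dnsmasq.conf.add survive. The route-up and route-pre-down hooks would then just call it with `up` or `down`.

```shell
#!/bin/sh
# vpn-dns-toggle.sh (hypothetical name): add or remove the conditional
# forwarding line for xyz.org in dnsmasq.conf.add, then restart dnsmasq.
# Usage: vpn-dns-toggle.sh up|down
DNSMASQ_ADD="${DNSMASQ_ADD:-/jffs/configs/dnsmasq.conf.add}"
RESTART_CMD="${RESTART_CMD:-service restart_dnsmasq}"
LINE='server=/xyz.org/10.8.0.1'

vpn_dns_up() {
    # Append the line only if it is not already there.
    grep -qxF "$LINE" "$DNSMASQ_ADD" 2>/dev/null || echo "$LINE" >> "$DNSMASQ_ADD"
    $RESTART_CMD
}

vpn_dns_down() {
    # Drop only our line; any other custom entries in the file are kept.
    if [ -f "$DNSMASQ_ADD" ]; then
        grep -vxF "$LINE" "$DNSMASQ_ADD" > "$DNSMASQ_ADD.tmp" || true
        mv "$DNSMASQ_ADD.tmp" "$DNSMASQ_ADD"
    fi
    $RESTART_CMD
}

case "${1:-}" in
    up) vpn_dns_up ;;
    down) vpn_dns_down ;;
esac
```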

It feels a bit kludgy, but it all seems to work, although I haven't simulated a real VPN failure yet... currently just toggling the client on/off through the router GUI. It's also a lot more "distributed knowledge" than I was hoping for, since I need to keep both client router configs in sync and can't control as much from the VPN server side as I'd hoped. C'est la vie.

I was a little surprised that more vpn client-related events aren't being generated when the connection goes up/down. The "route-up" and "route-pre-down" are the only two I seem to get. I found a couple of earlier posts that made me think that might be normal for my particular config (split tunneling, DNS config disabled, etc).

Once again, thanks for the pointers. Any further suggestions for improvement are welcome.
 

eibgrad

Part of the Furniture
I hope you didn't replace the route-up and route-pre-down directives used by the router w/ your own directives in the custom config field of the OpenVPN client.

Code:
/tmp/etc/openvpn/client1# cat config.ovpn
...
up 'ovpn-up 1 client'
down 'ovpn-down 1 client'
route-up 'ovpn-route-up'
route-pre-down 'ovpn-route-pre-down'
script-security 2
…

As you can see, there is more than just those two events being managed.

What you should normally do is create a user script in /jffs/scripts called openvpn-event, which allows the router to handle those events for its own needs, then pass control to the openvpn-event script for you to do any additional processing.
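A minimal sketch of such a dispatcher, modeled on the pattern commonly shared on the forum (treat it as an assumption, not the firmware's own script). It relies on two variables OpenVPN sets for the scripts it runs, `dev` (e.g. tun11) and `script_type` (e.g. route-up), and assumes Merlin's device naming of tun1N for client N and tun2N for server N. It logs every event it sees, which also helps show which events actually fire, and runs a matching script such as /jffs/scripts/vpnclient1-route-up only if one exists. `event_script_name` and `SCRIPT_DIR` are hypothetical names.

```shell
#!/bin/sh
# /jffs/scripts/openvpn-event (sketch): forward each OpenVPN event to an
# optional per-instance script, e.g. dev=tun11 + script_type=route-up
# runs /jffs/scripts/vpnclient1-route-up if it exists and is executable.

# Assumed naming: tun1N = OpenVPN client N, tun2N = OpenVPN server N.
event_script_name() {
    case "$1" in
        tun1[0-9]) echo "vpnclient${1#tun1}-$2" ;;
        tun2[0-9]) echo "vpnserver${1#tun2}-$2" ;;
        *) return 1 ;;
    esac
}

SCRIPT_DIR="${SCRIPT_DIR:-/jffs/scripts}"

# OpenVPN exports $dev and $script_type to the scripts it runs.
if name=$(event_script_name "${dev:-}" "${script_type:-}"); then
    logger -t openvpn-event "$dev $script_type -> $name"
    if [ -x "$SCRIPT_DIR/$name" ]; then
        "$SCRIPT_DIR/$name" "$@"
    fi
fi
```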

 

csj97229

Occasional Visitor
Guess I should have mentioned that I also created the /jffs/scripts/openvpn-event script to dispatch the events to the other scripts, based on examples I found elsewhere on the forum. I didn't touch anything else. Sounds like I did OK. :)

I had added some extra logging to the script to see if anything else was being triggered. I was expecting to get a vpnclient1-up or vpnclient1-down, but I didn't see any. I'll play around with it a bit more and simulate a VPN server failure to see if it handles that cleanly.
 

csj97229

Occasional Visitor
I guess this scheme doesn't work for actual VPN server drops since the client router never (?) triggers the "route-pre-down" event while it continuously tries to reconnect. I was expecting to get an event triggered when the connection goes down and then another some time later when the connection is reestablished.

At this point I think I'm stuck with monitoring the reachability of the VPN server to determine when the connection is down and then triggering my dnsmasq config changes based on that. Or just keep hoping my VPN server never goes off-line at a bad time.
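A rough sketch of that monitoring idea, meant to run from cron, for example via Merlin's `cru` helper (`cru a vpndns "* * * * * /jffs/scripts/vpn-dns-check.sh run"`). Assumptions: the VPN-side server at 10.8.0.1 answers ping; `TOGGLE_CMD` points to a hypothetical script that adds or removes the `server=/xyz.org/10.8.0.1` line when given `up` or `down` (the same job as the route-up and route-pre-down scripts described above); and the script name and file paths are placeholders. Remembering the last state means dnsmasq is only restarted when reachability actually changes, not on every run.

```shell
#!/bin/sh
# vpn-dns-check.sh (hypothetical): switch the xyz.org forwarding on or off
# depending on whether the VPN-side DNS server answers pings.
VPN_DNS="${VPN_DNS:-10.8.0.1}"
STATE_FILE="${STATE_FILE:-/tmp/vpn-dns.state}"
TOGGLE_CMD="${TOGGLE_CMD:-/jffs/scripts/vpn-dns-toggle.sh}"   # hypothetical; accepts up|down

vpn_reachable() {
    # Two pings, waiting at most 2 seconds for a reply (busybox ping -c/-W).
    ping -c 2 -W 2 "$VPN_DNS" >/dev/null 2>&1
}

check_once() {
    if vpn_reachable; then new=up; else new=down; fi
    old=$(cat "$STATE_FILE" 2>/dev/null || true)
    # Only act on a change, so dnsmasq is not restarted every minute.
    if [ "$new" != "$old" ]; then
        $TOGGLE_CMD "$new" && echo "$new" > "$STATE_FILE"
    fi
}

if [ "${1:-}" = "run" ]; then
    check_once
fi
```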
 

RMerlin

Asuswrt-Merlin dev
and I can see dd-wrt does exactly as I recommended for Merlin; it replaces whatever is in /tmp/resolv.dnsmasq w/ the push'd servers. It does NOT merge them w/ the ISP/custom DNS servers.
Which won't work precisely for what you also wrote: I support up to five clients. Which of these should "win" in having exclusive usage of the resolv.dnsmasq? The last connected client would overwrite any change made by the other clients, breaking everything.

This is why I developed Exclusive mode. It allows each client to play nice with the others, even when they run at the same time. And Exclusive mode is the configuration I've always recommended for anyone using a public VPN provider. For anyone using a VPN to connect to a remote office, the regular modes are recommended instead, so that if the remote server pushes a local domain, queries for it will be sent to the appropriate server based on the domain.

The only reason why I still offer the strict option is because some users insisted back in the day that "this is what works best for me" when I was doing a major OpenVPN configuration overhaul.
 

eibgrad

Part of the Furniture
Which won't work precisely for what you also wrote: I support up to five clients. Which of these should "win" in having exclusive usage of the resolv.dnsmasq? The last connected client would overwrite any change made by the other clients, breaking everything.

This is why I developed Exclusive mode. It allows each client to play nice with the others, even when they run at the same time. And Exclusive mode is the configuration I've always recommended for anyone using a public VPN provider. For anyone using a VPN to connect to a remote office, the regular modes are recommended instead, so that if the remote server pushes a local domain, queries for it will be sent to the appropriate server based on the domain.

The only reason why I still offer the strict option is because some users insisted back in the day that "this is what works best for me" when I was doing a major OpenVPN configuration overhaul.

My point about dd-wrt was NOT that there wasn't an issue when it came to multiple, concurrent OpenVPN clients. Clearly the use of something equivalent to Exclusive would be worth serious consideration by @egc in order to achieve the same goals. My point was, given the one OpenVPN client, at least he's NOT including the ISP/custom DNS servers when using Strict mode, thus avoiding a DNS leak. Because that's essentially how it behaved prior to him removing strict-order from DNSMasq.

Of course, dd-wrt doesn't use named modes for how it handles DNS servers from the OpenVPN client, so it's hard to compare the two firmwares directly. It just has one pre-defined behavior that is effectively Strict mode w/o the ISP/custom DNS servers.

The one reference I did make to Exclusive had more to do w/ the naming. I was thinking that it might make more sense to name what is now Exclusive, as Strict. Because if we remove the ISP/custom DNS servers from Strict, then in the case of no routing policy (Yes (all)), that's the same as Exclusive under the same conditions. But if routing policy is active, then Strict becomes the current Exclusive behavior under those conditions.

IOW, it's just a means to bring the naming back under control. I was thinking we just eliminate Exclusive as a "name", NOT its behavior. Given the way Strict is interpreted in other parts of the system, I thought it made sense. For example, in DoT (Stubby), Strict is "exclusive" in its behavior. It will NOT fall back to Do53 should TLS fail. If you want that kind of behavior, use Opportunistic. Similarly, Strict for the OpenVPN clients would be "exclusive" in its behavior too. But if you want a more "opportunistic" behavior, use Relaxed.

Now granted, the opportunistic vs. relaxed comparison doesn't quite work, since there's no way to force DNSMasq to use the VPN DNS servers exclusively, but fail over to the ISP/custom DNS servers. That's what most ppl assumed was meant by strict-order, which hasn't worked out in practice. But at least the Strict analogy works.

But that's a minor issue. It's just semantics. I'm NOT advocating any other behavior changes except for Strict.
 

Frost

Occasional Visitor
Exclusive is implemented differently depending on whether you have Yes or VPN Director for "Route internet traffic through tunnel".

As I said, for the VPN Director, it creates DNAT (redirect) rules for anything in policy routing. It has to do this because the idea behind Exclusive is to ONLY affect those bound to the VPN, NOT any client still bound to the WAN. But when you specify Yes, all your LAN clients are bound to the VPN, by definition. So Exclusive doesn't need the DNATs (redirects); it just reconfigures DNSMasq to *only* use the VPN provider's DNS servers.

IOW, when you specify Yes, you gain back access to DNSMasq. And it's the one sure way to prevent DNS leaks. There is simply no other route but the VPN for the LAN clients, the router, DNS, etc. But once you split tunnel, NOW you need these DNATs (redirects) to enforce the VPN servers for only those bound to the VPN. But they lose access to DNSMasq in the process.

Frankly, the way Exclusive works when "Route internet traffic through tunnel" is set to Yes, is the same way I believe Strict should work, which ultimately raises the question why Strict should exist at all! To my mind, Exclusive and Strict are synonymous! The only distinction between the two was the idea that w/ Strict, the ISP/custom DNS servers acted as a fallback/failover. But that's been a total failure in terms of results. So I say, get rid of Strict entirely.

I know that's difficult for ppl to accept given how long these options have been around. But honestly, Exclusive, Relaxed, and Disabled are sufficient. Or else call it Strict, Relaxed, and Disabled. Strict is so commonly used elsewhere in the router, ppl probably find it more intuitive.

BTW, I know a lot of this is confusing. ALL this DNS stuff is overly complex. Heck, even I have to constantly verify and reverify the way things work just to keep things straight. I can't imagine what it's like for someone who's just looking at the GUI and trying to "guess" how it all works.

Yes, I might be one of those trying to “guess” how it all works.

The syslog shows: “Ovpn-client - Warning: You have specified redirect-gateway and redirect-private at the same time (or the same option multiple times). This is not well supported and may lead to unexpected results.”

RT-AX86U version 386.4

VPN Client configured with Exclusive and Yes(all)

No DNS leak

And yes, I could use some good advice, if possible.

Regards
 

RMerlin

Asuswrt-Merlin dev
The one reference I did make to Exclusive had more to do w/ the naming. I was thinking that it might make more sense to name what is now Exclusive, as Strict. Because if we remove the ISP/custom DNS servers from Strict, then in the case of no routing policy (Yes (all)), that's the same as Exclusive under the same conditions. But if routing policy is active, then Strict becomes the current Exclusive behavior under those conditions.
Swapping the behaviour of two settings is very bad practice, and will confuse every single person who has known for 8+ years how these settings behave.

And people need to start learning to read the documentation, quite frankly. It's explained right on the webui:

[Screenshot: web UI help text]
 
