eibgrad
Part of the Furniture
Just wondering if anyone can explain the following. But first, my setup (because it may be part of the problem).
Primary Router:
WAN IP: <public ip>
LAN IP: 192.168.61.1
Network: 192.168.61.0/24
VPN Router (WAN port connected to primary LAN port, Merlin 384.11_0):
WAN IP: 192.168.61.60
LAN IP: 192.168.1.1
Network: 192.168.1.0/24
Nothing fancy. Just two daisy-chained routers, the second of which is acting as VPN router, w/ Accept DNS Configuration set to Exclusive. But I'm also interested in split tunneling on that VPN router. And that's where the trouble starts. DNS resolution stops working when Policy Rules is enabled and I have only the following rule defined.
Code:
RouteGoogle 0.0.0.0/0 74.125.0.0/16 VPN
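To make the reach of that rule concrete: reading its columns as description, source, destination and interface, it matches any source (0.0.0.0/0) but only destinations inside 74.125.0.0/16. The following is a minimal sketch of that match, not how the firmware evaluates rules. `ip_to_int` and `in_cidr` are hypothetical helper names, and the arithmetic assumes a shell with 64-bit integers.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Merlin.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) when address $1 lies inside CIDR block $2.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    if [ "$bits" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    fi
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Check a few destinations against the rule's destination block.
# 74.125.200.100 is just an arbitrary address inside the block.
for dest in 74.125.200.100 10.13.0.1 192.168.61.1; do
    if in_cidr "$dest" 74.125.0.0/16; then
        echo "$dest matches the rule's destination"
    else
        echo "$dest does not match the rule's destination"
    fi
done
```

Only the first address falls inside the block. Whether that has any bearing on the DNS behavior described below is a separate question.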
So I decided to investigate the problem in a methodical manner. Remember, at all times, Accept DNS Configuration is set to Exclusive. And I'm using a shell (ssh) to monitor DNS traffic via connection tracking using the following command.
Code:
watch -n5 "cat /proc/net/ip_conntrack | grep -E 'dport=53\s|sport=53\s' | sort -k4"
All this does is dump all connections related to DNS queries every 5 secs.
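The `\s` in that grep pattern is an extension rather than part of POSIX extended regular expressions, so it may not work with every grep build. A field-based match in awk is one portable alternative. This is a sketch that assumes the same /proc/net/ip_conntrack line format shown in this post; `dns_entries` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: keep only conntrack entries that have a field exactly
# equal to dport=53 or sport=53 (so dport=5353, for example, is ignored).
dns_entries() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "dport=53" || $i == "sport=53") { print; next }
    }' "$@"
}

# Demo: the first line is taken from the dumps in this post, the second is a
# made-up mDNS entry that should be filtered out.
dns_entries <<'EOF'
udp 17 12 src=192.168.1.7 dst=192.168.1.1 sport=51772 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=51772 mark=0 use=2
udp 17 30 src=192.168.1.7 dst=224.0.0.251 sport=5353 dport=5353 [UNREPLIED] src=224.0.0.251 dst=192.168.1.7 sport=5353 dport=5353 mark=0 use=2
EOF
```

On the router itself the equivalent of the original pipeline would be `dns_entries /proc/net/ip_conntrack | sort -k4`.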
Scenario #1: Redirect internet traffic = All
Connection tracking output.
Code:
udp 17 5 src=10.13.0.58 dst=10.13.0.1 sport=10064 dport=53 src=10.13.0.1 dst=10.13.0.58 sport=53 dport=10064 mark=0 use=2
udp 17 11 src=10.13.0.58 dst=10.13.0.1 sport=17905 dport=53 src=10.13.0.1 dst=10.13.0.58 sport=53 dport=17905 mark=0 use=2
udp 17 8 src=10.13.0.58 dst=10.13.0.1 sport=18551 dport=53 src=10.13.0.1 dst=10.13.0.58 sport=53 dport=18551 mark=0 use=2
udp 17 6 src=127.0.0.1 dst=127.0.0.1 sport=41117 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=41117 mark=0 use=2
udp 17 22 src=127.0.0.1 dst=127.0.0.1 sport=57761 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=57761 mark=0 use=2
udp 17 12 src=192.168.1.7 dst=192.168.1.1 sport=51772 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=51772 mark=0 use=2
udp 17 9 src=192.168.1.7 dst=192.168.1.1 sport=53141 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=53141 mark=0 use=2
udp 17 5 src=192.168.1.7 dst=192.168.1.1 sport=53398 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=53398 mark=0 use=2
You can see the client (192.168.1.7) sending the DNS query to the router (192.168.1.1), and the router in response sending a DNS query to the VPN provider's DNS server (10.13.0.1), with an anticipated reply back to the OpenVPN client's IP (10.13.0.58) on the tunnel. So far, so good. Exactly as expected.
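Each conntrack line carries two address tuples: the first is the packet as it was originally sent, the second is the reply the kernel expects to see. A small sketch that prints the two side by side makes the dumps easier to read. `conntrack_tuples` is a hypothetical helper, and it assumes the line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the original tuple and the expected-reply tuple
# of each conntrack entry on one compact line.
conntrack_tuples() {
    awk '{
        n = 0; state = ""
        # Clear values left over from the previous line.
        split("", src); split("", dst); split("", sport); split("", dport)
        for (i = 1; i <= NF; i++) {
            if ($i == "[UNREPLIED]")  state = " [UNREPLIED]"
            else if ($i ~ /^src=/)    { n++; src[n] = substr($i, 5) }
            else if ($i ~ /^dst=/)    dst[n] = substr($i, 5)
            else if ($i ~ /^sport=/)  sport[n] = substr($i, 7)
            else if ($i ~ /^dport=/)  dport[n] = substr($i, 7)
        }
        printf "query %s:%s -> %s:%s | expected reply %s:%s -> %s:%s%s\n",
               src[1], sport[1], dst[1], dport[1],
               src[2], sport[2], dst[2], dport[2], state
    }' "$@"
}

# Demo on two entries from scenario #1.
conntrack_tuples <<'EOF'
udp 17 12 src=192.168.1.7 dst=192.168.1.1 sport=51772 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=51772 mark=0 use=2
udp 17 5 src=10.13.0.58 dst=10.13.0.1 sport=10064 dport=53 src=10.13.0.1 dst=10.13.0.58 sport=53 dport=10064 mark=0 use=2
EOF
```

In both demo entries the expected reply is simply the mirror image of the query, which is what an untranslated connection looks like.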
Scenario #2: Redirect internet traffic = No
Connection tracking output.
Code:
udp 17 25 src=10.13.0.14 dst=10.13.0.1 sport=12051 dport=53 src=10.13.0.1 dst=10.13.0.14 sport=53 dport=12051 mark=0 use=2
udp 17 26 src=10.13.0.14 dst=10.13.0.1 sport=17016 dport=53 src=10.13.0.1 dst=10.13.0.14 sport=53 dport=17016 mark=0 use=2
udp 17 29 src=10.13.0.14 dst=10.13.0.1 sport=18681 dport=53 src=10.13.0.1 dst=10.13.0.14 sport=53 dport=18681 mark=0 use=2
udp 17 14 src=127.0.0.1 dst=127.0.0.1 sport=33012 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=33012 mark=0 use=2
udp 17 29 src=127.0.0.1 dst=127.0.0.1 sport=42556 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=42556 mark=0 use=2
udp 17 29 src=192.168.1.7 dst=192.168.1.1 sport=54552 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=54552 mark=0 use=2
udp 17 16 src=192.168.1.7 dst=192.168.1.1 sport=55645 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=55645 mark=0 use=2
udp 17 26 src=192.168.1.7 dst=192.168.1.1 sport=60801 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=60801 mark=0 use=2
The OpenVPN client assigned IP has changed (from 10.13.0.58 to 10.13.0.14), but otherwise, same results as scenario #1. Things still looking good.
Scenario #3: Redirect internet traffic = Policy Strict (no defined rules)
Connection tracking output.
Code:
udp 17 6 src=127.0.0.1 dst=127.0.0.1 sport=40255 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=40255 mark=0 use=2
udp 17 21 src=127.0.0.1 dst=127.0.0.1 sport=46942 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=46942 mark=0 use=2
udp 17 24 src=192.168.1.7 dst=192.168.1.1 sport=50645 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=50645 mark=0 use=2
udp 17 18 src=192.168.1.7 dst=192.168.1.1 sport=53931 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=53931 mark=0 use=2
udp 17 23 src=192.168.1.7 dst=192.168.1.1 sport=56272 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=56272 mark=0 use=2
udp 17 22 src=192.168.61.60 dst=192.168.61.1 sport=23764 dport=53 src=192.168.61.1 dst=192.168.61.60 sport=53 dport=23764 mark=0 use=2
udp 17 18 src=192.168.61.60 dst=192.168.61.1 sport=26948 dport=53 src=192.168.61.1 dst=192.168.61.60 sport=53 dport=26948 mark=0 use=2
udp 17 24 src=192.168.61.60 dst=192.168.61.1 sport=51161 dport=53 src=192.168.61.1 dst=192.168.61.60 sport=53 dport=51161 mark=0 use=2
The first sign of trouble. FWIW, the OpenVPN assigned IP happens to be 10.13.0.58 again. Notice all the public DNS queries are now going out the WAN to my primary router (192.168.61.1). I realize that enabling policy routing takes the router itself off the VPN, but it was my assumption that even without any rules defined in Policy Rules, DNS should still be routed over the VPN, not the WAN. FWIW, I checked the contents of /tmp/etc/openvpn/dns/client1.resolv, and it continues to have the VPN's DNS server.
Code:
server=10.13.0.1
So I now have a DNS leak.
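Spotting this kind of leak by eye gets tedious, so here is a rough sketch that lists every DNS destination in a conntrack dump that isn't on an allow-list. `dns_leaks` is a hypothetical helper. The allow-list in the demo (loopback, the router's LAN IP, the VPN's DNS server) is my assumption of what "no leak" should look like for this setup.

```shell
#!/bin/sh
# Hypothetical helper: list destinations of DNS queries that are not on the
# allow-list.
# Usage: dns_leaks ALLOWED_IP[,ALLOWED_IP...] [FILE...]
dns_leaks() {
    allowed=$1
    shift
    awk -v allowed="$allowed" '
        BEGIN { n = split(allowed, a, ","); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
        {
            dst = ""; dport = ""
            # Only the first (original-direction) tuple describes the query.
            for (i = 1; i <= NF; i++) {
                if (dst == ""   && $i ~ /^dst=/)   dst = substr($i, 5)
                if (dport == "" && $i ~ /^dport=/) dport = substr($i, 7)
            }
            if (dport == "53" && !(dst in ok)) print "LEAK? " dst
        }' "$@" | sort -u
}

# Demo on three entries from scenario #3.
dns_leaks 127.0.0.1,192.168.1.1,10.13.0.1 <<'EOF'
udp 17 6 src=127.0.0.1 dst=127.0.0.1 sport=40255 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=40255 mark=0 use=2
udp 17 24 src=192.168.1.7 dst=192.168.1.1 sport=50645 dport=53 src=192.168.1.1 dst=192.168.1.7 sport=53 dport=50645 mark=0 use=2
udp 17 22 src=192.168.61.60 dst=192.168.61.1 sport=23764 dport=53 src=192.168.61.1 dst=192.168.61.60 sport=53 dport=23764 mark=0 use=2
EOF
```

For these three entries the only destination flagged is 192.168.61.1, the primary router.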
Finally (and this was my ultimate goal) ...
Scenario #4: Redirect internet traffic = Policy Strict (w/ rules)
Code:
RouteGoogle 0.0.0.0/0 74.125.0.0/16 VPN
Connection tracking output.
Code:
udp 17 4 src=127.0.0.1 dst=127.0.0.1 sport=51249 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=51249 mark=0 use=2
udp 17 19 src=127.0.0.1 dst=127.0.0.1 sport=60562 dport=53 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=60562 mark=0 use=2
udp 17 19 src=192.168.1.7 dst=192.168.1.1 sport=50055 dport=53 [UNREPLIED] src=10.13.0.1 dst=192.168.61.60 sport=53 dport=50055 mark=0 use=2
udp 17 21 src=192.168.1.7 dst=192.168.1.1 sport=53070 dport=53 [UNREPLIED] src=10.13.0.1 dst=192.168.61.60 sport=53 dport=53070 mark=0 use=2
udp 17 28 src=192.168.1.7 dst=192.168.1.1 sport=54993 dport=53 [UNREPLIED] src=10.13.0.1 dst=192.168.61.60 sport=53 dport=54993 mark=0 use=2
udp 17 4 src=192.168.61.60 dst=192.168.61.1 sport=61822 dport=53 src=192.168.61.1 dst=192.168.61.60 sport=53 dport=61822 mark=0 use=2
Now DNS doesn't work at all! And if you examine connection tracking, you can see why. In all the prior dumps, the client first made a DNS query to the router, then another connection was made by the router to the VPN's DNS server over the VPN's network interface. This time, every DNS query sent to the router by the client results in ONE connection. And the anticipated reply shows that connection tracking is expecting the reply from the VPN's DNS server (10.13.0.1) to come back to the WAN IP (192.168.61.60)?! That makes no sense. It can't possibly work. That violates reverse-path filtering, which requires all traffic to use the same network interface for incoming and outgoing packets on the same connection. That's why connection tracking is reporting UNREPLIED. FWIW, I dumped reverse-path filtering, and got the following.
Code:
/proc/sys/net/ipv4/conf/all/rp_filter=0
/proc/sys/net/ipv4/conf/aux0/rp_filter=0
/proc/sys/net/ipv4/conf/br0/rp_filter=0
/proc/sys/net/ipv4/conf/default/rp_filter=0
/proc/sys/net/ipv4/conf/dpsta/rp_filter=0
/proc/sys/net/ipv4/conf/eth0/rp_filter=0
/proc/sys/net/ipv4/conf/eth1/rp_filter=0
/proc/sys/net/ipv4/conf/eth2/rp_filter=0
/proc/sys/net/ipv4/conf/ifb0/rp_filter=0
/proc/sys/net/ipv4/conf/ifb1/rp_filter=0
/proc/sys/net/ipv4/conf/lo/rp_filter=0
/proc/sys/net/ipv4/conf/tun11/rp_filter=0
/proc/sys/net/ipv4/conf/vlan1/rp_filter=0
/proc/sys/net/ipv4/conf/vlan2/rp_filter=0
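For anyone wanting to produce the same kind of dump, a loop over the per-interface settings is enough. This is a sketch, not necessarily the exact command used above. `dump_rp_filter` is a hypothetical name, and the optional directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print every rp_filter setting as path=value.
# The optional argument overrides the base directory (handy for testing).
dump_rp_filter() {
    base=${1:-/proc/sys/net/ipv4/conf}
    for f in "$base"/*/rp_filter; do
        if [ -r "$f" ]; then
            echo "$f=$(cat "$f")"
        fi
    done
}

dump_rp_filter
```

For reading the result: 0 means no source validation on that interface, 1 is strict mode and 2 is loose mode, and the kernel applies the higher of the `all` value and the per-interface value.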
So what I see are two problems.
1. Given any of the scenarios above, why should any DNS server ever be accessed over the WAN?
2. Given the presence of that policy rule, why does the router (apparently) attempt to access the DNS server over the VPN but expect the reply over the WAN?
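On the second question, one thing the dump itself does show: when the expected-reply tuple is not the mirror image of the original tuple, the addresses on that connection were translated somewhere along the way. A rough sketch for pulling those entries out follows. `nat_rewrites` is a hypothetical helper, and it can't say which rule did the rewriting; that would take a look at the nat table.

```shell
#!/bin/sh
# Hypothetical helper: report entries whose expected-reply tuple is not the
# mirror image of the original tuple, i.e. connections that were translated.
nat_rewrites() {
    awk '{
        n = 0
        # Clear values left over from the previous line.
        split("", src); split("", dst)
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/)      { n++; src[n] = substr($i, 5) }
            else if ($i ~ /^dst=/) dst[n] = substr($i, 5)
        }
        if (n < 2) next
        # The reply comes FROM wherever the query really went ...
        if (dst[1] != src[2]) print "destination rewritten: " dst[1] " -> " src[2]
        # ... and goes TO whatever source address the query left with.
        if (src[1] != dst[2]) print "source rewritten: " src[1] " -> " dst[2]
    }' "$@" | sort -u
}

# Demo on two entries from scenario #4.
nat_rewrites <<'EOF'
udp 17 19 src=192.168.1.7 dst=192.168.1.1 sport=50055 dport=53 [UNREPLIED] src=10.13.0.1 dst=192.168.61.60 sport=53 dport=50055 mark=0 use=2
udp 17 4 src=192.168.61.60 dst=192.168.61.1 sport=61822 dport=53 src=192.168.61.1 dst=192.168.61.60 sport=53 dport=61822 mark=0 use=2
EOF
```

Run against the scenario #4 entries, this reports the query's destination changing from 192.168.1.1 to 10.13.0.1 and its source changing from 192.168.1.7 to 192.168.61.60, while the router's own query to 192.168.61.1 is left alone.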
Now to be honest, I've seen all kinds of weird behavior w/ various implementations of PBR on third-party firmware, even when writing my own PBR scripts. I know configuring PBR can be tricky. You can run into all kinds of strangeness once you start adding routing tables, ip rules, etc. But this seems like a pretty simple policy rule, and it should work. It's not obvious to me that I've made a configuration error. I'm thinking *maybe* the fact that the VPN router isn't the primary router is causing this problem, but that's just a guess. It's just not practical at the moment to take this VPN router and make it my primary router.
I'm also wondering if perhaps switching to TCP (I'm using UDP at the moment) might make a difference. I know on the server side, the use of UDP can be a problem if the server is bound only to a single network interface. And when that happens, you need to add the multihome directive to the OpenVPN server config. But there is no such directive for the client. And I don't have TCP as an option w/ my current VPN provider. Regardless, it doesn't work properly w/ UDP.
While I'm very familiar w/ PBR on other third-party firmware (most of which sucks btw), I'm far less experienced w/ PBR on Merlin, so perhaps I'm missing something obvious.
Any ideas?