Solved How to fail over to a 2nd VPN provider with killswitch functionality?

DTS

Regular Contributor
I found some discussion on this topic here: http://www.snbforums.com/threads/mu...h-kill-switch-functionality.74557/post-715458
In your case, if your intent is to have a "failover" OpenVPN client, you could activate OpenVPN client #1 and #2 w/ identical policy routing rules, but only enable the kill switch for OpenVPN client #2. If OpenVPN client #1 fails (or you intentionally stop it), those same clients simply get rerouted over to OpenVPN client #2. If it fails/stops as well, they get blocked from the WAN.

That describes my goal. However, I'm not sure how this is accomplished in conjunction with the killswitch script. I am using the internet killswitch script version: 1.1.2, 20-nov-2021 by @eibgrad. I do not currently have the GUI killswitch option enabled.

I'm using firmware version 386.3_2 and the reason I am not using the new built-in killswitch is because I noticed the new built-in killswitch never actually blocked any traffic. I read the explanation given here: http://www.snbforums.com/threads/kill-switch-doesnt-work.74948/post-717509

Note, this is a particularly important change for anyone who's running Merlin and the OpenVPN client(s) on a secondary router, daisy-chained behind the primary router. Access to the upstream private network of the primary router is normally possible w/ the built-in kill switch since what is actually blocked is the default gateway associated w/ the WAN, NOT the WAN itself. But my script *does* block the WAN, and so access to the upstream private network requires a VPN Director rule to that effect, and this update to the script.

I interpreted that to mean that the built-in killswitch will not work in my case -- i.e., my router is a DHCP client of my ISP's modem. So, first question, did I understand that correctly?

Assuming I did, what is the right way to implement failover from VPN provider #1 to provider #2 together with killswitch script functionality when the router is a DHCP client of my ISP's modem?

Extra background, if needed:

I use two different openvpn client configurations. (ExpressVPN is currently my main provider, and PureVPN is the secondary.) For each one, I started using the ovpn remote-random config suggested by @eibgrad in this post. Each ovpn config has about 5 different remotes configured.

I'm currently using "Yes (all)" for "redirect internet traffic through tunnel" and I do not have any VPN Director rules at the moment.
 
I interpreted that to mean that the built-in killswitch will not work in my case -- i.e., my router is a DHCP client of my ISP's modem. So, first question, did I understand that correctly?

That statement is only relevant to a secondary router daisy-chained behind the primary router (presumably one provided by the ISP). In that particular situation, you probably want clients of the secondary router to have access to the primary router's local network. You just don't want them to have access to the internet via the primary router's WAN. And when using the firewall as your kill switch w/ the original script, it did NOT allow access to the primary router's local network. I updated the script so that it did.

Contrast that to a situation where the ISP's device is either only a modem, or is a modem+router but in bridge mode (thus only acting as a modem). None of the above is relevant. There is no need for access over the WAN of your ASUS router by those bound to the VPN.

The built-in kill switch does NOT suffer from this problem because it doesn't actually block the WAN. What it does is deny access to the WAN's default gateway. A subtle but important difference.
 
Thanks for clearing up my confusion on that point.

So far I am unable to get the combination of these things to all work together:

1. the ovpn remote-random config suggested by @eibgrad in this post.
2. two different VPN providers. #1 is ExpressVPN and #2 is PureVPN. Each ovpn config has about 5 different remotes configured as per above.
3. Internet killswitch (either GUI or script)

With these 3 elements in place, it initially looks like it is going to work. Both providers connect and I have Internet access via ExpressVPN (as per the GUI VPN status). But after a few minutes, Internet access is killed and ExpressVPN will no longer (re)connect.

Multiple lines like this one appear in the logs:

Code:
Dec  2 00:07:17 ovpn-client1[11734]: UDP WRITE [86] to [AF_INET]216.73.162.56:1195: P_CONTROL_HARD_RESET_CLIENT_V2 kid=0 pid=[ #1 ] [ ] pid=0 DATA len=0

ExpressVPN status is connecting.
PureVPN status is connected.
But I have no Internet.

If I stop both of those VPN connections and go back to my "baseline VPN config", I get a working setup again. My "baseline config" is a simple ExpressVPN ovpn file without multiple remotes, and I only use one VPN provider. (In this case I am enabling the GUI killswitch while I test. Both scripts are disabled by "exit 0" as the first line after the shebang.)
 
Before delving into any problems concerning multiple, concurrent OpenVPN clients, be they the same or different providers, I have to ask a fundamental question; are you 100% positive these concurrent OpenVPN clients are NOT generating IP conflicts?

Every time you connect an OpenVPN client, it's introducing a new private IP network to the routing table (e.g., 10.8.0.0/24), and it's NOT all that unusual to find these private networks overlap, either fully or partially. And given you're using multiple remote directives w/ each, it's possible these conflicts may come and go, depending on which servers get connected. So even if you don't have a conflict NOW, you always have to assume it's possible in the future.

That's why anytime something like this doesn't work, it's a good idea to first dump the routing tables and verify you don't have such conflicts.

Code:
ip route
ip route show table ovpnc1
ip route show table ovpnc2
ip route show table ovpnc3
ip route show table ovpnc4
ip route show table ovpnc5
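
Rather than eyeballing every table, the tunnel subnets can be compared pairwise. What follows is only a sketch: the function names are made up for illustration, it handles IPv4 only, and it assumes a shell with 64-bit arithmetic (bash has it; the router's own shell may or may not).

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address (as integers) of a CIDR block.
# A bare address without a prefix length is treated as a /32 host route.
cidr_range() {
    case $1 in
        */*) addr=${1%/*}; bits=${1#*/} ;;
        *)   addr=$1; bits=32 ;;
    esac
    base=$(ip_to_int "$addr")
    size=$(( 1 << (32 - bits) ))
    start=$(( base / size * size ))   # round down to the network address
    echo "$start $(( start + size - 1 ))"
}

# Return 0 (true) if the two CIDR blocks share at least one address.
cidr_overlap() {
    set -- $(cidr_range "$1") $(cidr_range "$2")
    # $1..$2 is the first range, $3..$4 the second
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, `cidr_overlap 10.8.0.0/24 10.8.0.128/25 && echo conflict` prints "conflict", because the second block sits entirely inside the first. The subnets to feed it are the ones that show up against the tun interfaces in the tables dumped above.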
 
Thank you for that tip. That will be my next troubleshooting step tomorrow. (I've been working on this almost non-stop after work for the last three days.)
 
I decided to do a little more troubleshooting tonight. The router had been working correctly for several hours with a single simple ovpn config.

I re-enabled my two providers, each with multiple remotes. Both connected without errors. But very quickly my Internet stopped working. I did not see any obvious errors anywhere, including the system logs (but I need to look at those in more detail later).

What is interesting is that after manually stopping all VPN clients, my router still will not connect to the Internet. My internet is up (I'm writing this message now via another route to the Internet). This makes me suspect that something is triggering the killswitch and the killswitch remains in effect even after I manually stop all VPN clients. Does that make sense? I'm using the 386.3_2 built-in killswitch (only on the last configured client).

How can I check this hypothesis? Here are the current values:

Code:
# ip route
default via 192.168.1.254 dev eth0
127.0.0.0/8 dev lo scope link
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.70
192.168.1.254 dev eth0 proto kernel scope link
192.168.100.0/24 dev br0 proto kernel scope link src 192.168.100.1

# ip a
10: eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether <redacted> brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.70/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
20: br0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether <redacted> brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global br0
       valid_lft forever preferred_lft forever

# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
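
One thing that could be checked, as a guess rather than a confirmed diagnosis, is whether a blocking route or rule is still present after the clients are stopped. The sketch below assumes the kill switch would show up as a `prohibit`, `unreachable`, or `blackhole` entry, which nothing in this thread confirms; `find_blocking_entries` is a made-up helper name.

```shell
# Read `ip route` or `ip rule` output on stdin and print only the lines
# that contain a blocking type (prohibit, unreachable, blackhole).
# Exits non-zero when no such line is found.
find_blocking_entries() {
    grep -E '(^|[[:space:]])(prohibit|unreachable|blackhole)([[:space:]]|$)'
}
```

It would be fed the output of `ip rule show` and of `ip route show table ovpncN` for each client, e.g. `ip route show table ovpnc1 | find_blocking_entries`. No output only means there is no such entry in that table; the block could still come from somewhere else, such as the firewall.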
 
As I stated previously, I want to see the relevant routing tables *before* things go south, or at least before it becomes impossible to dump the routing tables. Connect the first OpenVPN client (I assume this is #1) and dump the main routing table and that client's OpenVPN routing table.

Code:
ifconfig tun11
ip route show table main
ip route show table ovpnc1

If all is well, connect the second OpenVPN client (I assume this is #2) and dump the main and that OpenVPN routing table.

Code:
ifconfig tun12
ip route show table main
ip route show table ovpnc2
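
To keep a copy of each stage for later comparison, the same commands can be wrapped in a small helper that writes them to a timestamped file. This is a sketch only: `snapshot_vpn_client` is a made-up name, /tmp is an arbitrary default location, and it assumes client N uses interface tun1N and table ovpncN, as in the commands above for clients #1 and #2.

```shell
# Save the interface state and routing tables for OpenVPN client N
# to a timestamped file and print the file's path.
# Usage: snapshot_vpn_client N [output_dir]
snapshot_vpn_client() {
    n=$1
    out_dir=${2:-/tmp}   # arbitrary default; pick any writable directory
    out_file="$out_dir/ovpnc${n}-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "### ifconfig tun1${n}"
        ifconfig "tun1${n}" 2>&1
        echo "### ip route show table main"
        ip route show table main 2>&1
        echo "### ip route show table ovpnc${n}"
        ip route show table "ovpnc${n}" 2>&1
    } > "$out_file"
    echo "$out_file"
}
```

Running `snapshot_vpn_client 1` right after the first client connects, and `snapshot_vpn_client 2` after the second, leaves two files that can be compared with a later snapshot taken once things stop working.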
 
@eibgrad - thanks for those suggestions. I will give you all that output tomorrow. It's pretty late at night here, but I just wanted to quickly say that I have a fully working configuration at the moment. I can't be sure how it will operate when it needs to fail over or invoke the killswitch, but at least I have internet access via my router with 2 VPN providers simultaneously started. That fairly simple step was not working previously.

All I did was remove the multiple remotes, remote-random and server-poll-timeout config lines.

(Using multiple remotes works if I have only one VPN client active.)

I do not see any material differences in the output of ifconfig or ip route between this working config and the non-working one. But I will add the multiple remotes back to the ovpn files and share a much more detailed report now that I have a better understanding of the baseline.
 
Update. Today I am using 5 VPN clients (3 from ExpressVPN and 2 from PureVPN) with VPN Director rules. Each VPN client has only one remote configured. This is working well. The part I could not get working is to define multiple remotes and add "remote-random" to each ovpn config.

I am keen to learn more about routing tables, but before I proceed with this I'm going to take a course on networking fundamentals. I found a good online course which I started this evening. So I'm "closing" this issue until I learn enough to really understand what I'm doing in regard to routing tables.

Thanks for the help, and I'm very happy with the setup I have now. It is much better than my prior setup.

EDIT: as far as the Internet killswitch goes, I have the built-in killswitch enabled on my last VPN client. I have not fully tested it yet. While troubleshooting the VPN clients, I did notice that the built-in killswitch sometimes did not kill the Internet when I expected it to. But there were a lot of moving parts in that testing. Now that I have a stable and working configuration, I'll watch to see how the built-in killswitch works. Will report what I find after some time.
 
I can't imagine why the use of multiple remotes would matter. That just increases the uptime by offering more server options should one or more of the servers be unavailable. Just so long (as I've repeatedly stated) you don't create any IP conflicts on the tunnels.

BTW, you don't *have* to use remote-random w/ multiple remotes. You could remove that one directive and OpenVPN would simply run through the servers sequentially. And you could increase the timeout value as well. I'd be curious if either of those changes made a difference.
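
A quick way to produce such a test config from an existing one is to filter it. This is a rough sketch: `make_sequential` is a made-up name, the default of 30 seconds is an arbitrary placeholder and not a recommendation, and it assumes each directive sits on its own line in the config.

```shell
# Read an OpenVPN client config on stdin and write a copy that drops
# remote-random (so the remotes are tried in the order listed) and sets
# server-poll-timeout to the given number of seconds.
# Usage: make_sequential [seconds] < in.ovpn > out.ovpn
make_sequential() {
    timeout=${1:-30}   # arbitrary placeholder value
    # Remove any existing remote-random / server-poll-timeout lines.
    grep -v -E '^[[:space:]]*(remote-random|server-poll-timeout)([[:space:]]|$)'
    # Append the new timeout.
    echo "server-poll-timeout $timeout"
}
```

For example, `make_sequential 60 < client1.ovpn > client1-seq.ovpn` (both file names are hypothetical) keeps every `remote` line in place and only changes those two directives, so any difference in behavior can be attributed to them.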
 
I can't imagine why the use of multiple remotes would matter. That just increases the uptime by offering more server options should one or more of the servers be unavailable. Just so long (as I've repeatedly stated) you don't create any IP conflicts on the tunnels.

At this point, the only change between a working and non-working setup is the use of multiple remotes.

I did look at ifconfig and the routes as you suggested and I did not see any conflicts.

BTW, you don't *have* to use remote-random w/ multiple remotes.

Good to know.

You could remove that one directive and OpenVPN would simply run through the servers sequentially. And you could increase the timeout value as well.

OK

I'd be curious if either of those changes made a difference.

Now that I started trying to configure a guest network, my router is crashing. Once I have things stable again, I can try these changes. I can also report more details on the routing tables at that time.
 
