Layer 4 VPN load balancing for increased bandwidth

doktor-x

New Around Here
Hi all,

tl;dr is it possible to combine two VPN tunnels to increase my download speeds with AsusWRT Merlin?

The situation:
I have an RPi 4 set up with NZBGet, which connects to the internet via my RT-AX88U. The connection is routed through an OpenVPN client using VPN Director. NZBGet opens multiple TCP connections to the Usenet server, each downloading a different file. With this setup I'm averaging about 15 MB/s. When I disable the VPN Director rule I average about 25 MB/s with everything else unchanged. To me it seems that the bottleneck is the maximum speed of a single VPN tunnel, which varies quite a bit (10 MB/s to 20 MB/s).

What i'd like to do:
To mitigate this bottleneck I would like to set up a second OpenVPN client in AsusWRT, next to the existing one, connected to another server of my VPN provider, and then load balance the TCP connections round robin over both VPN tunnels based on the source port of each connection.
For visualization:
desired.PNG




I think this is currently not a supported option in AsusWRT Merlin. So would it be possible to use another RPi as a proxy that load balances the streams and sends them out over different network interfaces, so that I can use VPN Director to route the streams to different VPN servers based on source IP? It would look like this:
workaround.PNG


So,
1. Does this make any sense?
2. Is this possible?
3. Am I missing something obvious that would keep this from increasing my download speed?
4. If this is possible using a proxy server, do you have any suggestions for a software stack I could use to implement the TCP load balancing?

Thanks in advance for any help on this topic!
 
That's a creative idea, but I think the issue you'll run into, assuming you're using OVPN, is the lack of CPU power for this setup to make sense.

Even going full bore on a 12700K custom router, the most I could get on a 1 Gbps line with OVPN topped out at ~600 Mbps DL. With WireGuard, though, I can get closer to wire speed at 1200 Mbps+ with a whole lot less CPU utilization.

An RPi won't get that fast on WG, but it would be a lot faster than OVPN.
 
I don't think CPU power is going to be a bottleneck in this setup, as all the heavy lifting for OpenVPN should be done by the RT-AX88U, and tests show that it is capable of 200 Mb/s with OpenVPN. (I'm also wondering whether these tunnels are single-threaded, as I always see only one core going over 70% when downloading through the VPN tunnel while the other 3 cores are almost idle.) A stable 200 Mb/s is all I want, as this would already be a 66% speed increase.
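To make the units explicit, here is a small sketch that converts the MB/s figures measured earlier in the thread to Mb/s and works out the gain a stable 200 Mb/s tunnel would give. The function names are made up for illustration, and integer arithmetic rounds down.

```shell
#!/bin/sh
# Convert megabytes per second to megabits per second (1 byte = 8 bits).
mbytes_to_mbits() {
    echo $(( $1 * 8 ))
}

# Integer percentage increase going from $1 to $2 (rounded down).
percent_increase() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

vpn_now=$(mbytes_to_mbits 15)   # average measured through the VPN tunnel
no_vpn=$(mbytes_to_mbits 25)    # average measured without the VPN Director rule

echo "through VPN: ${vpn_now} Mb/s"
echo "without VPN: ${no_vpn} Mb/s"
echo "gain at a stable 200 Mb/s: $(percent_increase "$vpn_now" 200)%"
```

So 15 MB/s is 120 Mb/s, and going from 120 to 200 Mb/s is the roughly 66% increase mentioned above.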

My VPN provider does not currently support WG, and even if it did, I'm not sure it would be enough to saturate my 400 Mb/s line, as I think the load on a single server is too high to give 400 Mb/s solely to my connection.

Do you have any idea how this could be implemented? I'm not really finding anything on load balancing/routing based on source port to multiple output interfaces for arbitrary source ports and destination IPs.
 
Well, WG will spawn additional resources to saturate the line.

1644031923021.png


Just a snapshot while DL'ing Ubuntu through tor. This hit 400mbps in a few seconds though w/ a PC being the router w/ no bottlenecks due to CPU but, far left is the CPU utilization per kworker instance.

IIRC a RPI4 could potentially hit line speed -
Using this breaks away from the drastic speed reduction of OVPN and opens you to full speed use of your ISP WAN connection.
 
Ah OK, good to know, but unfortunately WG is currently not an option.

I just found out that you can create iptables rules based on source port ranges (I'm quite new to tinkering this deeply with networks). My first guess was to define a rule that routes the first half of the client's ephemeral ports through tunnel 1 and the other half through tunnel 2. Unfortunately the source ports are not chosen randomly but incrementally, so I will now try to define a rule for every port, with every other rule routing to a different tunnel.
 
source is usually random.. DST though can be worked / finessed into a path.

Getting some load balancing out of it, though, might be a bit trickier to do.

This looks interesting as towards the end it looks to be using LACP mode 4 for load balancing the 2 connections.

Looking at this they're sending anything destined for 1194 to the 2 gateways thus doubling the speed.

iptables -t mangle -A OUTPUT -d ${IPV4_IO_PUB} -p udp --dport 1194 -j MARK --set-mark 1000
iptables -t mangle -A OUTPUT -d ${IPV4_IO_PUB} -p udp --dport 1195 -j MARK --set-mark 1001

So, after creating the 2 virtual "tun" interfaces and bonding them in mode 4 (LACP), this appears to work for load balancing. The only issue here is designating which IPs to send through the VPN: either create a different subnet for the hosts that always need to hit the VPN, or just tunnel everything. The additional benefit is that you can create as many "tun" interfaces as you want and add them to the network/interface configuration with ifenslave.

Code:
auto enp3s0 enp8s0 enp9s0 enp11s0 enp12s0 wlp4s0
allow-hotplug enp3s0 enp8s0 enp9s0 enp11s0 enp12s0 wlp4s0

auto bo0
iface bo0 inet dhcp
        bond-mode 4
        bond-miimon 100
        bond-lacp-rate 1
        bond-slaves enp11s0 enp12s0 enp3s0

Here 8/9/11/12 are on my 4-port 5GE card and I split them into LAN/WAN using br0 / bo0.

Since I'm using Nord for my VPN from CLI I don't have to bond tun interfaces as it just creates an additional interface "nordlynx" and I add some rules / options in iptables to enable it to work automatically.

Code:
*nat
-A POSTROUTING -o nordlynx -j MASQUERADE

Code:
0.0.0.0/1 via 10.5.0.2 dev nordlynx
128.0.0.0/1 via 10.5.0.2 dev nordlynx

Code:
nordlynx: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1420
        inet 10.5.0.2  netmask 255.255.255.255  destination 10.5.0.2
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 1000  (UNSPEC)
        RX packets 9539986  bytes 13542216936 (13.5 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2955292  bytes 459055484 (459.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
I also thought the source port was random, but I checked the open ports while downloading and they were incremental (even weirder, the ports are always even). Maybe when the same process uses multiple outgoing ports they are incremental and not random. I didn't find anything on changing this behaviour.

I really appreciate your effort, but I only understand half of the blog post, at best.

I have now created two new routing tables, one for each tunnel, each selected by a MARK set via iptables. There is a rule for every ephemeral port used by my client (ports 32768 to 60999), and every other rule sets a different mark, so one half of my connections gets routed through one tunnel and the other half through the other. The major drawback of this approach is that it needs 28232 rules, which is way too many. Since my client uses 100 parallel connections, the idea is to use rules with port ranges of 50 ports each instead; this cuts the number of rules down to about 565 but should still have about the same effect. I'll also try to drastically shrink the ephemeral port range of the client, which would cut it down even further. When I'm done I'll post the necessary commands here in case somebody else wants to try this.
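The rule counts above can be checked with a few lines of shell arithmetic. This is only a sketch: the variable and function names are hypothetical, and it counts both end ports of the range as usable.

```shell
#!/bin/sh
PORT_MIN=32768    # first ephemeral port of the client
PORT_MAX=60999    # last ephemeral port of the client
RANGE_SIZE=50     # ports covered by one range

# Number of ports in the range, counting both endpoints.
port_count() {
    echo $(( PORT_MAX - PORT_MIN + 1 ))
}

# Number of 50-port ranges needed to cover all ports (rounded up).
range_count() {
    ports=$(port_count)
    echo $(( (ports + RANGE_SIZE - 1) / RANGE_SIZE ))
}

echo "one rule per port: $(port_count) rules"     # 28232
echo "one range per rule: $(range_count) ranges"  # 565
```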

Furthermore if the outgoing ports were random this could be done with 2 rules each defining a port range for one half of the ephemeral port range..
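As a rough sketch of that two-rule variant, the script below only prints the rules (a dry run) rather than applying them. It assumes that marks 2 and 3 each select a routing table for one tunnel, and it reuses the client address from this thread; the function names are hypothetical.

```shell
#!/bin/sh
CLIENT_IP=192.168.178.40   # client to balance; adjust to your LAN
PORT_MIN=32768
PORT_MAX=60999

# First port of the upper half of the ephemeral range.
split_point() {
    echo $(( PORT_MIN + (PORT_MAX - PORT_MIN + 1) / 2 ))
}

# Print one MARK rule per half of the range.
print_split_rules() {
    mid=$(split_point)
    echo "iptables -t mangle -A PREROUTING -p tcp -s $CLIENT_IP --sport $PORT_MIN:$(( mid - 1 )) -j MARK --set-mark 2"
    echo "iptables -t mangle -A PREROUTING -p tcp -s $CLIENT_IP --sport $mid:$PORT_MAX -j MARK --set-mark 3"
}

print_split_rules
```

This only balances well if the source ports really are spread across the whole range, which is not what was observed here.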

Edit:
Proof that it is working:
prove.png
 
So after tinkering more I was able to cut it down to 84 rules, but with this setup I was getting slightly less throughput than with a single VPN server while seeing higher load on the router. I think this might work better if the ephemeral ports were chosen randomly, so that 2 rules would suffice, or if you can drastically cut down the ephemeral port range (which I can't do with my current setup).
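A different technique that avoids port ranges altogether, and was not tried in this thread, is to alternate per connection with the iptables `statistic` match and store the choice in the connection mark. The sketch below only prints the rules. It assumes the router's iptables build includes the `statistic`, `conntrack`, `connmark` and `CONNMARK` extensions (not verified for Asuswrt-Merlin) and that marks 2 and 3 each select a routing table for one tunnel.

```shell
#!/bin/sh
# Dry run: prints the rules instead of applying them.
CLIENT_IP=192.168.178.40   # client address used in this thread

print_round_robin_rules() {
    base="iptables -t mangle -A PREROUTING -s $CLIENT_IP"
    # Every second NEW TCP connection gets connection mark 2 ...
    echo "$base -p tcp -m conntrack --ctstate NEW -m statistic --mode nth --every 2 --packet 0 -j CONNMARK --set-mark 2"
    # ... and every NEW connection that is still unmarked gets connection mark 3.
    echo "$base -p tcp -m conntrack --ctstate NEW -m connmark --mark 0 -j CONNMARK --set-mark 3"
    # Copy the connection mark onto each packet so fwmark-based ip rules can match it.
    echo "$base -j CONNMARK --restore-mark"
}

print_round_robin_rules
```

Because the decision is made once per connection and then restored for later packets, all packets of one TCP stream stay on the same tunnel.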

Because of that i will wait for my VPN provider supporting WG, but it still was fun and i learnt a lot.

For anyone interested in what the nat-start script would look like with this setup here you go:

Code:
# create routing tables associated with fwmark
ip rule add fwmark 2 table 2
ip rule add fwmark 3 table 3
# add a default gateway to each table respectively
# the inner command automatically gets the default gateway for the tunnel by looking it up in the routing table which
# gets created when you add the VPN server via the UI, i used the first and second VPN client, i assume the tables follow the scheme 11[1-5] for VPN client 1-5
ip route add default via $(ip route show table 111 | grep default | grep -o "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*") table 2
ip route add default via $(ip route show table 112 | grep default | grep -o "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*") table 3
ip route flush cache
# overwrite the source ip with the VPN client ip so the replies find their way back through the tunnel
# the inner command automatically gets the client ip from the corresponding interface, i assume it follows the scheme tun1[1-5]
iptables -t nat -A POSTROUTING -o tun11 -j SNAT --to-source $(ip addr show tun11 | grep -o "inet [0-9]*\.[0-9]*\.[0-9]*\.[0-9]*" | grep -o "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*")
iptables -t nat -A POSTROUTING -o tun12 -j SNAT --to-source $(ip addr show tun12 | grep -o "inet [0-9]*\.[0-9]*\.[0-9]*\.[0-9]*" | grep -o "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*")

# add a rule which MARKs multiple port ranges with one of the 2 used marks
# replace 192.168.178.40 with the ip of the client you want to route through both tunnels
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport -j MARK --set-mark 2 --sports 32768:32817,32968:33017,33168:33217,33368:33417,33568:33617,33768:33817,33968:34017
# if a mark rule matched, return in the next rule so the remaining rules don't get evaluated
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport --sports 32768:32817,32968:33017,33168:33217,33368:33417,33568:33617,33768:33817,33968:34017 -j RETURN
# do the same but with different port ranges and the other mark
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport -j MARK --set-mark 3 --sports 32868:32917,33068:33117,33268:33317,33468:33517,33668:33717,33868:33917,34068:34117
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport --sports 32868:32917,33068:33117,33268:33317,33468:33517,33668:33717,33868:33917,34068:34117 -j RETURN

# continue the schema for all desired ports
# (I suggest you write yourself a script if you have many entries, but be aware that -m multiport supports a maximum of 7 port ranges)
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport -j MARK --set-mark 2 --sports 34168:34217,34368:34417,34568:34617,34768:34817,34968:35017,35168:35217,35368:35417
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport --sports 34168:34217,34368:34417,34568:34617,34768:34817,34968:35017,35168:35217,35368:35417 -j RETURN
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport -j MARK --set-mark 3 --sports 34268:34317,34468:34517,34668:34717,34868:34917,35068:35117,35268:35317,35468:35517
iptables -t mangle -A PREROUTING -p tcp -s 192.168.178.40 -m multiport --sports 34268:34317,34468:34517,34668:34717,34868:34917,35068:35117,35268:35317,35468:35517 -j RETURN
# ...

# finally route all traffic that comes from any other port or via udp through one of the servers
iptables -t mangle -A PREROUTING -s 192.168.178.40 -j MARK --set-mark 2
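For anyone who does not want to type the ranges by hand, here is a minimal generator sketch. It prints the rules instead of applying them. Note the assumptions: it alternates contiguous 50-port ranges between the two marks, which is the idea described earlier, whereas the listing above leaves 50-port gaps between ranges that fall through to the final catch-all rule. The function names are hypothetical, and `--sports` is placed before `-j`.

```shell
#!/bin/sh
CLIENT_IP=192.168.178.40   # client to balance
PORT_MIN=32768
PORT_MAX=60999
RANGE_SIZE=50              # ports per range
MAX_RANGES=7               # ranges per multiport rule

# Print comma-separated port ranges, MAX_RANGES per line.
# $1 = 0 for the first mark, 1 for the second (shifts the start by one range).
ranges_for() {
    start=$(( PORT_MIN + $1 * RANGE_SIZE ))
    list=""
    n=0
    while [ "$start" -le "$PORT_MAX" ]; do
        end=$(( start + RANGE_SIZE - 1 ))
        if [ "$end" -gt "$PORT_MAX" ]; then end=$PORT_MAX; fi
        list="${list:+$list,}$start:$end"
        n=$(( n + 1 ))
        if [ "$n" -eq "$MAX_RANGES" ]; then
            echo "$list"
            list=""
            n=0
        fi
        # Skip the neighbouring range, which belongs to the other mark.
        start=$(( start + 2 * RANGE_SIZE ))
    done
    if [ -n "$list" ]; then echo "$list"; fi
}

# Print a MARK rule and a RETURN rule per line of ranges, then the catch-all.
print_rules() {
    for mark in 2 3; do
        ranges_for $(( mark - 2 )) | while read -r ports; do
            echo "iptables -t mangle -A PREROUTING -p tcp -s $CLIENT_IP -m multiport --sports $ports -j MARK --set-mark $mark"
            echo "iptables -t mangle -A PREROUTING -p tcp -s $CLIENT_IP -m multiport --sports $ports -j RETURN"
        done
    done
    echo "iptables -t mangle -A PREROUTING -s $CLIENT_IP -j MARK --set-mark 2"
}

print_rules
```

Review the printed rules first; once they look right, the output could be piped into `sh` from the nat-start script.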
 
It's slower because you overcomplicated it with marking, and all of those rules get hit for what's basically all of your traffic.

If you google "whats my ip" you'll get the VPN IP as a result and not your actual IP since every application uses a dynamic port.

I would condense this down to 2-3 lines forcing all traffic into the tunnel w/ the MASQUERADE under -nat and then do the load balancing with the mode / weight command(s).


Go to the end and there's - ...Load balance across two OpenVPN tunnels?

Dive into the scripts and there might be something interesting to be modified.

As with everything, there are 50 different ways to do things, but there are more efficient ways and then there are more granular ways. With FW processing it's about finding a happy medium and placing priority rules at the top, so a packet doesn't have to go through each one down to the bottom before finding a hit.
 
Something just occurred to me to keep it simple.

Use the bonding in /etc/network/interfaces but for the tunnels

Code:
auto tun1 tun2
allow-hotplug tun1 tun2

auto bo0
iface bo0 inet dhcp
        bond-mode 4
        bond-miimon 100
        bond-lacp-rate 1
        bond-slaves tun1 tun2

# iptables rule under *nat
-A POSTROUTING -o bo0 -j MASQUERADE

This would route all traffic / load balance the OVPN connections w/o all of the marking / rate / weight / etc.
 
tl;dr is it possible to combine two VPN tunnels to increase my download speeds with AsusWRT Merlin?

Run the VPN off the Pi4, not the router, then you only need the single link (the router is the bottleneck here)

Better yet...

I would recommend getting a Linux PC and running your NZBGet app on it. That will be faster and more efficient than any of these other approaches.

Shop around; one can find one of the 1L-class micro desktops with a decent amount of CPU/RAM/storage for $150-$200 USD on eBay, etc. A Core i3 or i5 is simply going to be much faster than any Pi board, perhaps close to line speed even with OVPN. WG is nicer, but not all providers support it yet...

Sometimes the best solution is the most obvious one...
 
getting a linux PC
I usually go this route as well, but this seemed interesting enough to work through and see the results.

You can get new SFF PCs on Amazon all day long for $150, then build on that and replace the router completely.
 
