Solved Potential bug with UDP NAT loopback/hairpinning


Jargon

Occasional Visitor
Hi all, hopefully this is the right place to post this.

I'm running a Valheim server on an Ubuntu box on my network. After the latest game update (where they stopped using SDR/Steam relays) I suddenly lost the ability to connect to my server using my WAN IP (those outside my network are unaffected). After countless hours investigating on their Discord, I've found 2 others with exactly the same issue, and all of us have Asus routers. I'm running Asuswrt-Merlin on an RT-AC3200 and I believe they are on stock firmware (I'm not sure which router models they are using, but they are all different).

We've slowly converged on the idea that there may be something wrong with UDP NAT loopback/hairpinning. I run multiple services on my local network and have never had an issue like this before, but Valheim is exclusively UDP whereas all the other services I run are TCP.

I ran some tests with netcat and discovered that when connecting to myself (when logged into the Ubuntu box) using the WAN IP, the connection seems to drop/close after the first message is transmitted. See below:
netcatudp.png
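
For anyone who wants to repeat the test without netcat, here is a minimal Python sketch of the same idea. It is not the exact nc session from the screenshot: the function name `udp_echo_check` and its parameters are made up for illustration. It listens on a UDP port, sends several datagrams to the given address, and reports which ones actually arrive. To exercise the hairpin path you would pass your WAN IP and a UDP port that is forwarded to the machine running the script.

```python
import socket
import threading
import time


def udp_echo_check(host, port, messages, timeout=1.0):
    """Send each message to host:port over UDP; return the ones that arrive.

    The listener binds locally on `port` (0 = let the OS pick a free port,
    which only makes sense for a purely local check).
    """
    received = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("0.0.0.0", port))
    listener.settimeout(timeout)
    port = listener.getsockname()[1]  # resolve the real port if 0 was given

    def listen():
        # Stop when everything has arrived or nothing shows up in `timeout` s.
        try:
            while len(received) < len(messages):
                data, _addr = listener.recvfrom(4096)
                received.append(data)
        except socket.timeout:
            pass

    thread = threading.Thread(target=listen)
    thread.start()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for message in messages:
        sender.sendto(message, (host, port))
        time.sleep(0.05)  # small gap, like typing separate lines into nc

    thread.join()
    sender.close()
    listener.close()
    return received
```

If the symptom described above is present, you would expect only the first message to come back when `host` is the WAN IP, while the same call with the machine's LAN IP returns all of them.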


This is as far as I've got and I'm at a loss as to how to proceed. This goes beyond my networking knowledge so I'm really hoping some of you folks can assist.

Thanks in advance :)
 

ColinTaylor

Part of the Furniture
Interesting observation. I can recreate your issue here. Interestingly, if you wait a few minutes you can successfully send another single line of data.

Try disabling NAT acceleration on your router (I can't test this myself at the moment because of family).
 

ColinTaylor

Part of the Furniture
UPDATE: Yep, just tried disabling NAT acceleration and that fixed the nc problem.
 

Jargon

Occasional Visitor
How do I disable the NAT acceleration?

Is it in the web admin? I couldn't see it when I checked earlier but may have missed it. Is it something I need terminal access for?

Also what are the implications of disabling NAT acceleration? Will I lose performance?
 

ColinTaylor

Part of the Furniture
If your router has the option I would expect to see it under LAN - Switch Control.

Without NAT acceleration your maximum download speed will be restricted, I'd guess, to something like 300Mbps.
 

Jargon

Occasional Visitor
You are a legend.

Not only does the nc UDP test work but I can now also connect to my local network Valheim server using my WAN IP again (and therefore use the in-game server list for it).

I can't believe you narrowed that down so fast. I still don't understand what exactly NAT acceleration has to do with NAT loopback. If you understand what's going on I'd love to get the gist of it if you don't mind explaining?

I'm just hoping that disabling the acceleration won't impair other functionality of the router. 300Mbps download is something I can live with (as long as that is WAN<->LAN transfer only not internal network transfer speeds).

Thanks! :)
 

ColinTaylor

Part of the Furniture
NAT (or hardware) acceleration is a Broadcom chipset feature that increases the throughput of WAN to LAN traffic by bypassing the router's CPU. This allows the router to achieve near gigabit routing speeds. Without it the traffic has to be processed in software and is limited by the power of the router's CPU (hence my 300Mbps guess).

The problem with NAT acceleration (AFAIK) is that it only works for TCP traffic, so in theory the router shouldn't even try to use it for UDP. But because your test showed that only the first message got through and everything after it was dropped, it sounded like the router was erroneously trying to use NAT acceleration.

So I made an educated(?) guess that it might be connected to NAT acceleration. While you have a workaround it may well be a bug. Whether it's a limitation of the chipset or something that can be fixed in software I don't know.
 

podkaracz

Guest
This option is present only on Merlin firmware. Is there a command to change it on stock?
 

Jargon

Occasional Visitor
Whether it's a limitation of the chipset or something that can be fixed in software I don't know.
Now that I understand the problem I've been digging deeper, and I've discovered that Merlin used to have a NAT loopback option in the router, which was removed because keeping it working was becoming too complex. I've actually found someone in the Valheim community running the older version of the firmware, and using that option fixed his problem.

After looking through the GitHub commits for the project I can see that packets can be marked using iptables, causing them to bypass the Broadcom NAT acceleration.

Given what you said about only TCP being NAT accelerated, is it possible to mark all UDP packets in this way to prevent them from going through NAT acceleration? And how would that be done?
 

ColinTaylor

Part of the Furniture
There's nothing currently in iptables that says "use hardware acceleration". This is something that happens in a closed source kernel module. So it's possible that the difference in behaviour is more dependent on changes that have been made to the kernel module. IIRC the old method used packet marking in the mangle table.

If your Valheim user can compare the output of iptables-save between the old and new firmware he might be able to work out what it was doing. He could then try implementing the old method on the new firmware. But you'd be trying to second-guess what's happening inside that kernel module.
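
As a rough illustration of that comparison, here is a minimal Python sketch that pulls the mangle rules out of two saved `iptables-save` dumps and lists what differs. The function names are hypothetical, and it assumes the usual dump layout (a `*mangle` header line, `-A ...` rule lines, then `COMMIT`). It only compares rule text, so it says nothing about what the closed source module does with the marks.

```python
def mangle_rules(dump):
    """Return the set of '-A ...' rule lines in the *mangle section of a dump."""
    rules = set()
    in_mangle = False
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("*"):
            # Table headers look like "*filter", "*nat", "*mangle".
            in_mangle = (line == "*mangle")
        elif in_mangle and line.startswith("-A "):
            rules.add(line)
    return rules


def compare_mangle(old_dump, new_dump):
    """Return (rules only in the old dump, rules only in the new dump)."""
    old_rules = mangle_rules(old_dump)
    new_rules = mangle_rules(new_dump)
    return sorted(old_rules - new_rules), sorted(new_rules - old_rules)
```

You would feed it the text captured from `iptables-save` on each firmware version, for example saved to two files and read with `open(...).read()`.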
 

Jargon

Occasional Visitor
There's nothing currently in iptables that says "use hardware acceleration".
I was actually thinking the contrary.

There's a lot of stuff in the old Merlin NAT loopback code that resembles the following:
C:
#ifdef CONFIG_BCMWL5
    /* mark connections to bypass CTF */
    if (nvram_match("ctf_disable", "0")) {
        /* mark new connections to port 80 */
        if (nvram_match("url_enable_x", "1")) {
            eval("iptables", "-t", "mangle", "-A", "FORWARD",
                 "-p", "tcp", "--dport", "80",
                 "-m", "state", "--state", "NEW", "-j", "MARK", "--set-mark", "0x01");
        }
        /* ... (excerpt ends here) ... */
    }
#endif

I couldn't see any code that looks for this mark, only sets it under the assumption that it bypasses NAT acceleration. I am wondering whether there is already a check in the kernel module that looks for this mark and handles it there instead? Or am I seriously off the mark here (pun intended)?
 

ColinTaylor

Part of the Furniture
Yes, I think this is what's happening: it's setting the lowest bit of the mark (0x1) to bypass CTF (NAT acceleration). The firmware already has mangle rules that do that for UDP and loopback traffic. So the question is why it doesn't work.
 

Jargon

Occasional Visitor
The firmware already has mangle rules that do that for UDP and loopback traffic.
It does? I didn't see anything like that when I looked... unless you mean this?

I'm a high-level developer and this is way beyond my networking knowledge, so I'm having difficulty wrapping my head around a lot of this heh. If I knew more about this area I might feel comfortable experimenting with my router.
 

ColinTaylor

Part of the Furniture
I was looking in my iptables rules rather than trying to reverse engineer the source code. I see this in the mangle table:
Code:
-A FORWARD -s 192.168.1.0/24 -d 192.168.1.0/24 -o br0 -j MARK --set-xmark 0x1/0x1ff
-A FORWARD -p udp -m state --state NEW -j MARK --set-xmark 0x1/0x1ff
Note that I'm using John's firmware which is why the mask is 0x1ff rather than 0x7, possibly reflecting a later change in the way the bitmap is used in the kernel module.
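
For anyone unsure what the `value/mask` part does, here is a minimal Python sketch of the arithmetic. It follows the documented behaviour of the MARK target's `--set-xmark` option (zero out the bits given by the mask, then XOR the value into the mark). The function name and the example starting mark `0x1f5` are made up for illustration, and what the kernel module does with the other bits is not known here.

```python
def set_xmark(mark, value, mask):
    """Model of '-j MARK --set-xmark value/mask' on a 32-bit packet mark."""
    cleared = mark & ~mask & 0xFFFFFFFF  # zero out the bits covered by the mask
    return cleared ^ value               # XOR the value into what is left


# Hypothetical packet that already carries mark 0x1f5:
wide = set_xmark(0x1F5, 0x1, 0x1FF)    # mask 0x1ff wipes the low 9 bits first
narrow = set_xmark(0x1F5, 0x1, 0x7)    # mask 0x7 only wipes the low 3 bits
print(hex(wide), hex(narrow))          # 0x1 0x1f1
```

So with either mask the lowest bit ends up set, but the wider mask also clears any other bits in that range that an earlier rule might have set.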

To be honest this is not really something I've paid much attention to. For my own needs everything "just works".
 

eibgrad

Very Senior Member
Instead of messing w/ NAT loopback (which given everything else that's going on in the router these days can be problematic), why not use DNSMasq to map your public domain name to your server's local IP?

Code:
address=/myhostname.dynet.com/192.168.1.100

Granted, it limits you to only the use of the domain name, and only one target, but for some ppl that may be sufficient. Or at least provides a workaround.

Or you could even create the mapping on the client device itself using a local hosts file if only needed by that one device.
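
To make the two variants concrete, here is a small Python sketch that builds the dnsmasq line and the equivalent hosts file line from a hostname and a LAN address. The helper names are hypothetical, and the hostname and IP are just the example values from above. It only produces the text, so where you put it (dnsmasq config on the router, hosts file on the client) is still up to you.

```python
import ipaddress


def dnsmasq_address_line(hostname, local_ip):
    """Build a dnsmasq 'address=' line mapping hostname to a local IP."""
    ipaddress.ip_address(local_ip)  # raises ValueError on a malformed address
    return f"address=/{hostname}/{local_ip}"


def hosts_file_line(hostname, local_ip):
    """Build the equivalent hosts file entry for a single client device."""
    ipaddress.ip_address(local_ip)  # same sanity check as above
    return f"{local_ip} {hostname}"


print(dnsmasq_address_line("myhostname.dynet.com", "192.168.1.100"))
print(hosts_file_line("myhostname.dynet.com", "192.168.1.100"))
```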
 

ColinTaylor

Part of the Furniture
Playing with this a bit more I think the problem is that the mark is being applied too late in the flow. It needs to go prior to the routing decision and prior to the DNAT. So it either goes at the beginning of nat/PREROUTING or anywhere in mangle/PREROUTING.

So adding this line seems to fix the problem with the nc test (you would presumably need to use mask 0x7 instead of 0x1ff):
Code:
iptables -t mangle -A PREROUTING -p udp -m state --state NEW -j MARK --set-mark 0x1/0x1ff

Whether this will break other functions of the router I couldn't guess. I know that for example QoS uses marks extensively.
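
The ordering argument can be sketched in a few lines of Python. This is a simplified model of the standard netfilter traversal for a forwarded packet (raw table and connection tracking left out), not a description of what the acceleration module actually checks, which is unknown here. The names `FORWARD_PATH` and `marked_before_dnat` are made up for illustration.

```python
# Simplified order in which a forwarded IPv4 packet visits the chains.
FORWARD_PATH = [
    "mangle/PREROUTING",
    "nat/PREROUTING",      # DNAT (port forwarding) is applied here
    "routing decision",
    "mangle/FORWARD",
    "filter/FORWARD",
    "mangle/POSTROUTING",
    "nat/POSTROUTING",     # SNAT / MASQUERADE is applied here
]


def runs_before(first, second):
    """True if `first` is visited earlier than `second` on the forward path."""
    return FORWARD_PATH.index(first) < FORWARD_PATH.index(second)


def marked_before_dnat(chain):
    """True if a MARK rule in `chain` is applied before DNAT happens."""
    return runs_before(chain, "nat/PREROUTING")


print(marked_before_dnat("mangle/PREROUTING"))  # True
print(marked_before_dnat("mangle/FORWARD"))     # False
```

That matches the observation above: a rule in mangle/FORWARD only sets the mark after the DNAT and the routing decision, whereas mangle/PREROUTING sets it before both.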
 

Jargon

Occasional Visitor
I came to exactly the same conclusion using the link to the source code in my previous message but I have no idea where to put it. I tried running it from the command line after enabling SSH and I couldn't seem to modify the iptables (iptables -S kept returning the same rules). I have no idea where to put the rule or how to apply it :D

Additionally, I did see that setting the nvram property ctf_pt_udp to 1 might have the same result looking at the logic in the source code (again linked in my previous message) so I tried that and rebooted. It didn't seem to change anything in regards to the iptables rules so I guess it didn't apply properly or I need to regenerate the rules somehow?
 

ColinTaylor

Part of the Furniture
You would have to specify the mangle table (when using -S) to see the rules there. So either iptables -t mangle -S or iptables-save -t mangle.
 

Jargon

Occasional Visitor
Yes, you are right. The rule needs to be at PREROUTING in the flow. When I set ctf_pt_udp to 1 (I honestly don't know what this property relates to or where to set it in the web interface) and rebooted, I could see the rule in FORWARD but it didn't fix the issue. In PREROUTING it does fix the nc issue and also allows me to join my Valheim server again.

I guess the last question is, how do I get this rule to persist?
 

ColinTaylor

Part of the Furniture
