DNS failing with heavy UDP traffic

Great stuff.

First, to be clear, the problem here is not really DNS itself but the ability to make ANY outbound connections. I think that's more or less a given at this point.

This generates a side question: If I only specify a SINGLE DNS over TLS server... do you think Stubby will keep that connection alive? If so, I may be able to keep DNS working at least.

The only connections that work seem to be connections that pre-exist the Validator coming up. As an example, I run ngrok here and ngrok continues to allow incoming connections.

I have an unused fiber connection coming in (we don't know if it's live yet and I am having trouble figuring out how to buy an ONT) but we are hooked to ethernet.

The building is probably using CGNAT. I have great connectivity, yet I can't ping other routers in the building from my WAN interface. Given how CGNAT is structured, that would be a good indicator.

Greg's suggestions :) :

Using wireguard to contain Validator traffic: We thought about using a VPN to handle some of the traffic. However, it pushes the problem "out" to another system (DNS receiver). Getting something high-bandwidth in the cloud or colocated would cost more, and that bandwidth may actually not be as good as mine.

Tricking some Solana traffic into using VPN: I think Solana is probably making thousands of DNS requests, so it's not going to be possible to contain its outbound traffic via DNS A and AAAA (quad-A) records (IIUC).

DNS Caching: I think DNS issues may be a red herring. Stubby and other tools are just unable to connect outbound after a short while. I think NAT is failing.

So I have to figure out what kernel parameters to tweak to increase NAT capacity, or figure out how to decrease the UDP / TCP timeouts so that I don't have too many NAT connections open.
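For anyone following along, here is a minimal sketch of the kind of tweak meant here, assuming a Linux router whose kernel exposes the standard nf_conntrack sysctl keys. The values are illustrative guesses rather than tested recommendations, the output file name and the `write_conntrack_conf` helper are hypothetical, and older router kernels may use different key names.

```shell
#!/bin/sh
# Write a sysctl fragment that enlarges the conntrack table and shortens
# idle timeouts. All values are illustrative assumptions, not tuned settings.
# $1 = output path (hypothetical name; adjust for the target system).
write_conntrack_conf() {
    cat > "$1" <<'EOF'
# Raise the ceiling on tracked connections (costs kernel memory).
net.netfilter.nf_conntrack_max = 65536
# Expire UDP flows that have not seen a reply after 10 s.
net.netfilter.nf_conntrack_udp_timeout = 10
# Expire UDP flows with traffic in both directions after 60 s idle.
net.netfilter.nf_conntrack_udp_timeout_stream = 60
# Expire idle established TCP connections after 30 minutes.
net.netfilter.nf_conntrack_tcp_timeout_established = 1800
EOF
}

write_conntrack_conf ./30-conntrack-tuning.conf
# To apply on a system with sysctl: sudo sysctl -p ./30-conntrack-tuning.conf
```

Shorter timeouts free entries sooner, but they can also cut off slow, legitimate flows, so it is probably worth changing one value at a time and watching the connection count.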

Think about this: TCP/IP only has 65,535 ports available to it. Each outbound NAT'd TCP/IP connection takes up one of those as an ephemeral source port. I could easily see my router running out of those in this situation. The connection tracker was seeing "15,000". I'm waiting for the validator to come up again. Thanks @ColinTaylor for the tip on the Tools/Network status page! I have been using Tomato for years but am not 100% up to speed on Merlin yet. Once the Validator is up again I'll post the total number of connections.
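To put a figure like "15,000" in context, the tracked-connection count can be compared with the table's maximum. A rough sketch, assuming the kernel exposes the usual nf_conntrack counters under /proc/sys (older router firmware may keep them elsewhere); `conntrack_usage_pct` is a hypothetical helper name:

```shell
#!/bin/sh
# Integer percentage of the conntrack table in use.
# $1 = current entry count, $2 = table maximum.
conntrack_usage_pct() {
    _used=$1
    _limit=$2
    if [ "$_limit" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    echo $(( _used * 100 / _limit ))
}

# Assumed paths: present on Linux kernels with nf_conntrack loaded;
# older router kernels may expose the counters elsewhere.
COUNT_FILE=/proc/sys/net/netfilter/nf_conntrack_count
MAX_FILE=/proc/sys/net/netfilter/nf_conntrack_max

if [ -r "$COUNT_FILE" ] && [ -r "$MAX_FILE" ]; then
    count=$(cat "$COUNT_FILE")
    max=$(cat "$MAX_FILE")
    echo "conntrack: $count of $max entries ($(conntrack_usage_pct "$count" "$max")% used)"
else
    echo "conntrack counters not found at the assumed paths"
fi
```

If the percentage stays well below 100 while outbound connections still fail, the limit being hit is more likely somewhere other than this table.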
well, segregate validator traffic is what I really hoped to convey - wallets looking for validators or staking rewards would have to get pointed to you somehow from somewhere; figure that out and you'll see how to solve the problem. but if there's that MUCH traffic looking for validation, what @Morris says is the way: your big ol' bully of a validator needs its own fat pipe, and all your other traffic trying to ride along gets knocked out of its way. HMMMM - you've a fibre AND another connection incoming? pretty clear validator gets fibre and netflix/whatever/etc goes on what you're using now...validator should be making you enough to pay for itself and have enough profit to afford a 2nd internet connection.
(where are you that you can get fibre run to your home without an ONT?)
Stubby/DNS caching is another convo entirely...for the websurfing connection. The crypto connection will follow the solana network protocol defined by the validator code.
(I wish I had been paying more attention when my local crypto friends had their cosmos validator up...then I might have better ideas for you)
 
This is the number of network connections being maintained vs active.

I would like to figure out how to time these out.

@Morris I get your point but I can't make any connections outbound. DNS queries are not the only connections I'm having trouble with; they all fail. I rate-limited the application for a while; however, the Validator then started falling behind. It needs to be able to connect at a high rate.



I need to figure out how to make inactive connections drop so that we don't have one kid keeping all the toys for the sake of having them.

I could remove the "optimizer" code in the VM here:

sudo bash -c "cat >/etc/sysctl.d/20-solana-udp-buffers.conf <<EOF
# Increase UDP buffer size
net.core.rmem_default = 134217728
net.core.rmem_max = 134217728
net.core.wmem_default = 134217728
net.core.wmem_max = 134217728
EOF"
sudo sysctl -p /etc/sysctl.d/20-solana-udp-buffers.conf


But I still think it's a fair idea to up that on the router to whatever is possible. Lots of memory available (up to 560 MB anyway).




[Attached image: Screen Shot 2022-04-13 at 10.46.49 AM.png]
 
HMMMM - you've a fibre AND another connection incoming?
I don't know.

I have what is practically speaking "gigabit" up and down... inside an apartment building. REALLY lucky.
Each apartment seems to have the ethernet WAN cable and an optical fiber cable... but no ONT. When I asked if I could set up fiber for myself last year they said no... however I do see this loose fiber cable coming into each apartment. I want an ONT just so I can check whether it's active. My suspicion is that it's active to the same switch. That could be enough to make this whole thing work, of course, as I'd just connect the validator by itself to that and keep a control plane connection (I'm using ESX) internally on my Merlin router to be able to see/connect to the console. I have dual 10GbE NICs on that machine.

In theory... ! Solana could run its own Zero Trust style connector to advertise itself. Now that you mention its networking protocols I am thinking about how much I don't know there.

I am just trying to figure out how in networking terms to limit impact to the network. As you can see above the router is maintaining over 4000 connections of which it believes only 708 are active. Those 4000 connections seem to be blocking my ability to make more of them.
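One way to get a similar "maintained vs active" breakdown from a shell is to summarize the kernel's connection tracking table directly. This is only a sketch: it assumes the table is readable at /proc/net/nf_conntrack, and it treats entries flagged [ASSURED] as the active ones, which may not be how the router's status page defines "active". `summarize_conntrack` is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize a conntrack table dump: total entries, entries the kernel has
# marked [ASSURED] (traffic seen both ways), and [UNREPLIED] entries.
# $1 = path to a dump in /proc/net/nf_conntrack text format.
summarize_conntrack() {
    awk '
        { total++ }
        /\[ASSURED\]/   { assured++ }
        /\[UNREPLIED\]/ { unreplied++ }
        END { printf "total=%d assured=%d unreplied=%d\n", total, assured, unreplied }
    ' "$1"
}

# Assumed path; reading it usually needs root and the nf_conntrack module.
TABLE=/proc/net/nf_conntrack
if [ -r "$TABLE" ]; then
    summarize_conntrack "$TABLE"
else
    echo "no readable conntrack table at $TABLE"
fi
```

A large gap between the total and the assured count would suggest that many entries are simply waiting out their timeouts.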

Greg
 
Note: Solana validator here is "non-voting" so no incoming connections. That's why it seems to work with no port forwarding or other tricks. So @heysoundude no wallets looking for validators. We're just seeing how healthy it can run on my machine. So far very well except for the number of connections.

We are going to limit the number of ports it connects out on to 13 (instead of 2000), but that will only facilitate QoS rule setting; it won't actually limit the number of outbound connections.
 
I think your best bet is to try to contact the contractor who installed the fibre (to backtrack and determine IF you need an ONT), or whoever is responsible for building the data infrastructure of the building: the networking person/people.
My suspicion is that it'll be in whatever place the ethernet runs to, and you'll be connected to the 2.5, 10 or 40 Gbps building network when that trunk is installed/lit up. Then the ethernet in the building will go dark or be migrated to internal comms/monitoring etc. Is it a smart building? (Sounds more condo-ish to me than an apartment with a landlord/management company.)

Lots going on for you...

and yes, dig into the solana validator code to see if you can get an indication of what they're doing and how to integrate it into your home network/LAN - it's probably where you should've started
 
Update:
Using VPN Director and a VPN endpoint seems to have solved the resource starvation I was having on the LAN. I'm not sure why, as I don't know enough about how the TUN interface works. Certainly good news though!

The performance is not great (Validator is getting behind) but at least... unstuck for a while here.

If my "guess" is correct and VPN connections do not use NAT, it could be part of the fix. Of course NAT connection limits could just as easily be the problem upstream in the building's router system. I do know we're not taking out the whole building, as I'm able to connect from another apartment.

I appreciate all the help and ideas here- great learning experience and certainly the most interesting routing issue I've had to deal with so far.
 
Ok Interesting @heysoundude ... we're getting back to the wireguard thing now.

Although the VPN applied to the validator fixed my network, the validator cannot keep up due to the slowness of the VPN.

Wireguard or Lightway (what ExpressVPN supports) would really speed things up. I am willing to consider switching away from ExpressVPN if a major provider supports wireguard today. It looks like Merlin will support it in the UI soon... and the binaries are already in the router. So if I want to tinker I could use that. Setting up my own wireguard server would only double my problem today.

Learning a heck of a lot more than I thought I would on this issue.

As an aside- a leprechaun sized 4 leaf clover fell into my lap yesterday- the building internet people are coming today. I will tackle them in the morning and figure out the story on the fiber, and push for direct connect.
 
Azire, Nord...It's late and been a long day, but I'm sure there are others who also support WireGuard other than the first 2 from the top of my head.
HOWEVER - you can run your own server peer instance of it on your router and be your own VPN...would you like links to the threads in the Merlin Addons forum or can you find them yourself? (a VPN isn't secure AFAIC and IMHO unless you control both ends of the tunnel - right?)
your building's fibre - is it included in rent/fees? will there be an extra fee for a direct connect? (see what they're offering as far as the plan before you get too far ahead of yourself...you may be better off the way it's going to happen without your input)
 
Looks like I will have to beg to get the fiber connected; so much for that.

As for being my own server, I am not sure what that solves. First, I am double NAT'd so I need a zero trust endpoint like ngrok just to connect. I do use that.

I'm trying to get data out quickly to a service without overwhelming the router with connections and NAT tracking. So far the ExpressVPN / OpenVPN combo is doing that, but it's not as fast as it could be because it's not Lightway or WireGuard. And we see it is slow enough to cause the system not to keep up.
 
If what you have set up now is working, but not fast enough, you likely owe it to yourself to look at other ways of achieving the same thing.
WireGuard does basically/exactly the same thing, if I correctly understand the first few paragraphs of what ngrok is from this article (https://danielmiessler.com/study/ngrok/)
...and if you make it to the end of that article, where the notes are, and the one about trusting closed source tools that could be turned evil makes you think twice, you need to spend more time with WireGuard: it is open source.
 
Don't get me wrong… WireGuard is a potentially great solution to speed up the VPN, and I will pursue it through a provider.

I think ngrok is more seasoned security-wise, but it is not the same type of system. I believe in closed source software, and in fact I spent half of my life making it. In fact, my company is negotiating with them to potentially do a contract. If you read the article from ExpressVPN on why they invented Lightway, it explains that the WireGuard folks still feel they need more peer review before it should be accepted as 100% trustworthy.

I would be using Lightway immediately if it were possible. Actually, we could probably configure it inside the virtual machine; it just feels a little messy.

So... WireGuard is a VPN. ngrok is a zero trust networking agent that can enable connections to anything behind a firewall that allows outbound connections. It is working flawlessly for external access into a double-NAT'd network. I would say this: if you are a privacy fanatic and that is your main goal with connectivity, just know that ngrok is a United States company. They must also be able to decrypt TLS on their servers before forwarding it, so they could be forced by subpoena to do that on a federal agency's behalf. You might want to steer clear of it for that reason. At work we simply use it to allow zero trust networking to internally hosted web applications. There are some open source versions of the same concept where you could provide the server side and avoid the introspection problem. But it is extremely different from a VPN.

In order to enable WireGuard I need something running somewhere else to connect to. My best bet would be something like NordVPN. If I set up my own VPN server in a data center, I have to deal with my bandwidth here and also in the data center, which I will pay for. Paying the yearly VPN subscription is much cheaper in comparison.

I will look at Nord. thanks for the tip on that!
 
I don't share your comfort with code, and the WireGuard conservative "where might we be vulnerable?" caution is much more satisfying than "Zero Trust - Double NAT" to me, because it admits that vulnerabilities can lurk in the least suspect of places/entities.
From where I type my replies, I sit less than 10 miles outside of US jurisdiction...no worries there. Kinda happy our two doofus leaders are so wrapped up in themselves that they're going round in circles too self satisfied to pay much attention to things that are much more important than themselves, letting us cut wide paths around them. (It's their overlords we should be concerned with, but I digress)
Technically, every home or office with an internet connection is a datacenter with its own server(router), User terminals...you're the perfect example with a crypto validator attached to your SOHO router with outside connection. You're here to sysadmin and network engineer, right? ;-p
 
The problem is "officially solved" at this point with VPN inside the guest virtual machine.

Without the tunnel, we were getting blocked by the number of outbound connections. Based on the number of connections out of "30,000" listed in the network status, I don't think this was the Merlin router, but I can't be sure.

Note that initially the VPN on the router, and then inside the guest Linux VM, was not fast enough to keep up with this intensive application. However, switching the ExpressVPN protocol inside the virtual machine to UDP/Lightway drastically improved performance to the point where it could keep up. The VPN is not being used for privacy here; it's merely a networking solution for the connection issue.
===
In terms of privacy, remember that your VPN provider's job is to encrypt AND decrypt and they could easily provide user specific decrypted egress as required by a subpoena. This is just as true for VPN as it is for zero trust endpoint or anything else.

Also know that pretty much every piece of software, open source or not, is full of security problems that are undiscovered today. Vigilance is critical if you want to keep stuff running on computers.
 
