
arp problem between wifi devices on RT-AC88u

Have you ever performed a full reset on the router? Particularly after flashing back and forth from stock Asus to RMerlin firmware?

If you haven't, that is the place to start, so you can troubleshoot properly without worrying about weird interactions left over from the multiple firmwares you've been trying.


 
I truly, very much appreciate the interest, and weirdly I did get some different behaviour from these devices after cold booting them, which I suspect means a warm reboot doesn't actually reset the baseband radio entirely. So there ya go, there's some truth to the value of a complete cold boot.

So I tried some more simple experiments. It turns out what the RT-AC86U churns on isn't just ARP, it's anything broadcast. It also isn't dropping these packets: they sit somewhere after the AC86U's kernel for a while... and then eventually end up in the air.

I have a daemon that produces broadcast traffic, a lot of it: oscd. I use it for other nerdy stuff, but for this case I quickly cannibalized it to send broadcast UDP packets containing the current time to destination port 6667, and then ran tcpdump on a lot of things around the house at the same time. There's probably a more sockety and pure-Python way to do this, but I'm far from a native Python speaker, so forgive the horribleness.

I also installed tcpdump on the RT-AC86U via entware (after going through so many USB sticks that were either A) dead or B) tiny or C) filled with gum and pocket lint). Anyway here's what I found:

With the below broadcast generator running on a wireless client on the network (verified to be connected to the main RT-AC86U):

Python:
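import socket
from time import sleep
from pythonosc import udp_client
from datetime import datetime

# OSC client aimed at the subnet broadcast address, port 6667
client = udp_client.SimpleUDPClient('192.168.1.255',6667)
# Reach into the client's underlying socket (a private attribute) to permit broadcast sends
client._sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

# Send the current wall-clock time as an OSC message every 0.2 s
while True:
    client.send_message('time', datetime.now().strftime("%H:%M:%S"))
    sleep(0.2)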

5 packets a second should do it!
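For anyone who wants the "more sockety" version without the pythonosc dependency, here is a minimal sketch using only the standard library. It is not what produced the dumps in this post: the payload is plain ASCII rather than an OSC message, so the packet length will differ from the 24 bytes shown in the captures. The names `make_payload`, `open_broadcast_socket` and `send_forever` are made up for illustration; the address and port are the ones from the generator above.

```python
import socket
import time
from datetime import datetime

BROADCAST_ADDR = "192.168.1.255"  # subnet broadcast address used in the post
PORT = 6667                       # destination port used in the post


def make_payload(now: datetime) -> bytes:
    """Encode the send time as plain ASCII, e.g. b'time 17:52:11' (not OSC)."""
    return b"time " + now.strftime("%H:%M:%S").encode("ascii")


def open_broadcast_socket() -> socket.socket:
    """Create a UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def send_forever(interval: float = 0.2) -> None:
    """Broadcast one timestamped datagram every `interval` seconds (5/s by default)."""
    with open_broadcast_socket() as sock:
        while True:
            sock.sendto(make_payload(datetime.now()), (BROADCAST_ADDR, PORT))
            time.sleep(interval)
```

Calling `send_forever()` starts the generator; stop it with Ctrl-C.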

Running tcpdump with lines like: tcpdump -i eth6 -A src 192.168.1.115 and dst port 6667

I get dump output with arrival timestamps and easily readable contents that have departure timestamps!

Great, I just re-invented most of the functionality of ping, just with UDP and broadcast. Woohoo! There's probably a much better way to do this!
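One possible alternative to eyeballing tcpdump is a small listener that pulls the HH:MM:SS out of each payload and prints how far behind it arrived. This is only a sketch, not what was run for this post. It assumes the sender's and receiver's clocks are in sync (NTP or similar), that the payload contains exactly one HH:MM:SS string, and that delays are well under 24 hours. The names `delay_seconds` and `listen` are hypothetical.

```python
import re
import socket
from datetime import datetime, timedelta

PORT = 6667  # port the generator broadcasts to
TIME_RE = re.compile(rb"([01]\d|2[0-3]):([0-5]\d):([0-5]\d)")


def delay_seconds(payload: bytes, arrival: datetime):
    """Seconds between the HH:MM:SS found in `payload` and `arrival`, or None."""
    match = TIME_RE.search(payload)
    if match is None:
        return None
    hour, minute, second = (int(g) for g in match.groups())
    sent = arrival.replace(hour=hour, minute=minute, second=second, microsecond=0)
    if sent > arrival + timedelta(seconds=1):
        # A send time in the future is assumed to be from before midnight.
        sent -= timedelta(days=1)
    return (arrival - sent).total_seconds()


def listen(port: int = PORT) -> None:
    """Print the apparent delay of every broadcast datagram received on `port`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # all interfaces, so broadcasts are delivered
    while True:
        payload, (sender, _) = sock.recvfrom(2048)
        delay = delay_seconds(payload, datetime.now())
        if delay is not None:
            print(f"{sender}: {delay:.1f}s behind")
```

Because the payload only carries whole seconds, the printed delay can be off by up to a second, which is plenty of resolution when hunting for delays measured in minutes.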

A client connected by cat-5 to the other RT-AC86U AiMesh node received the broadcast packets with normal network latency.
A client connected by cat-5 to the primary RT-AC86U received the broadcast packets with normal network latency.
The primary RT-AC86U itself saw the broadcast packets arrive and rebroadcast them out on all interfaces at normal latency.

Sample good tcpdump output:

Code:
17:52:11.639453 IP sidecar.clockmaker.home.49136 > 192.168.1.255.ircd: UDP, length 24
E..4U<@.@.`....z......... ..time....,s..17:52:11....
17:52:11.689891 IP sidecar.clockmaker.home.49136 > 192.168.1.255.ircd: UDP, length 24
E..4UH@.@.`....z......... ..time....,s..17:52:11....
17:52:11.740420 IP sidecar.clockmaker.home.49136 > 192.168.1.255.ircd: UDP, length 24
E..4UM@.@.`....z......... ..time....,s..17:52:11....

6667 is evidently ircd, huh, who knew?

Sample terrible tcpdump output:
Code:
19:06:26.413801 wlp0s20f3 B   IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!L@.@......s.....1... ..time....,s..19:05:48....
19:06:26.413801 wlp0s20f3 B   IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!Q@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721130 wlp0s20f3 B   IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!T@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721131 wlp0s20f3 B   IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!W@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721131 wlp0s20f3 B   IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!Y@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721131 wlp0s20f3 B   IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!]@.@......s.....1... ..time....,s..19:05:49....

(I mean yeah, 37s is terrible, this is a LAN! 37s in WAN routing terms is... like 120 times around the planet!)
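The arithmetic behind that number can be automated for a saved capture. The sketch below assumes the two-line record layout of the `tcpdump -A` output shown above (a header line starting with the arrival time, followed by a payload line containing the departure time) and that both machines' clocks agree. `delays_from_dump` is a made-up helper name, not part of tcpdump or any library.

```python
import re

# Header lines start with the arrival time, e.g. "19:06:26.413801 ..."
HEADER_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{6})\s")
# Payload lines carry the departure time somewhere inside, e.g. "...19:05:48...."
SENT_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})")


def delays_from_dump(text: str) -> list:
    """Return the delay in seconds for each header/payload pair in `text`."""
    delays = []
    arrival = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            h, m, s, us = (int(g) for g in header.groups())
            arrival = h * 3600 + m * 60 + s + us / 1e6
            continue
        sent = SENT_RE.search(line)
        if sent and arrival is not None:
            h, m, s = (int(g) for g in sent.groups())
            # Modulo one day keeps the result positive across midnight.
            delays.append((arrival - (h * 3600 + m * 60 + s)) % 86400)
            arrival = None
    return delays
```

Feeding it the saved text of a capture gives one delay per packet, which makes it easy to take a mean or maximum over a long run instead of reading timestamps by eye.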

A client connected by wireless... got every packet, eventually, but with varying rates of delay. I watched the dump for quite some time, left it running while watching TV, doing the laundry, making a sandwich, etc. The latency to any client connected to the RT-AC86U's wireless directly was all over the place. For hours at a time it was at line rate, then the delay slowly drifted up to four minutes! FOUR MINUTES! That means the packets were sitting in a buffer forever! That's just nuts! Meanwhile anything unicast stayed at line rate. Typical delay hung out at a mean of two minutes.

I had initially thought, maybe there's a rate limit in here somewhere without a drop... but after eliminating ALL other broadcast traffic on the network aside from my generator and listener, I got the same wacky variable behaviour. There's no triggering traffic threshold I can determine.

Thinking maybe this was related to the backhaul methodology, I connected the two RT-AC86Us directly with cat-5, reconfigured them for 1GbE backhaul, and was still able to replicate the problem immediately.

I've been up and down the iptables and ebtables on this thing and can't find any part of it that would account for a post-routing delay that long. It's definitely nothing before that, as the tcpdump shows the packets exiting at line rate. Unless there's some part of iptables I'm missing (and please, please someone tell me I'm missing something basic), the broadcast packets are getting stuck in the baseband radio buffer for... a while, and that's a big binary blob driver that we can't poke, right?

I took an old Netgear R6400, made it a dumb AP running a different ESSID and hung it off one of the RT-AC86Us so I could put most of the gadgets around the house on the old Netgear and they could... find each other without minutes of latency in the discovery. This doesn't help the people with laptops who are connected to the RT-AC86U trying to find the house file server.
 
