RT-AC5300 bridge, VLAN and ARP trouble. Running Asuswrt-Merlin 386.7_2

crisag

Hello friends. I've had an issue driving me crazy for a few days now.

I have an RT-AC5300 with Merlin 386.7_2. Standard configuration, done through the web interface: vlan1 with 4 interfaces, and br0 created with vlan1. The other VLANs are for the WAN. There are no hosts/PCs outside vlan1.

One of my hosts can ping any machine on vlan1 except the IP assigned to br0 (the bridge), which is the router's IP. As a result, that machine can't communicate with anything outside the local network.

When pinging it from the router, I get the following using tcpdump:

tcpdump -nnvvv -e -X -i br0 arp

(screenshot: br0 capture showing ARP requests for 192.168.0.243 and no replies)


No response from the host (192.168.0.243 / 53:58:01:4d:03:02), at least not on the bridge interface (br0).

The bridge has a single member interface, vlan1. But if we check vlan1 with tcpdump, we can see the ARP requests and the host's responses normally.

tcpdump -nnvvv -e -X -i vlan1 arp

(screenshot: vlan1 capture showing the ARP requests and the host's replies)


So it looks like the bridge is dropping ARP responses addressed to the IP assigned to the bridge interface.




What is going on? Any clues?
 
Can you post the complete unedited output of these two commands:

Code:
robocfg show
brctl show
 
admin@RT-AC5300-7160:/tmp/home/root# robocfg show
Switch: enabled
Port 0: 1000FD enabled stp: none vlan: 2 jumbo: off mac: 00:01:5c:b2:ce:46
Port 1: 1000FD enabled stp: none vlan: 3 jumbo: off mac: 78:e9:cf:c2:dc:10
Port 2: 1000FD enabled stp: none vlan: 1 jumbo: off mac: 4c:ef:c0:58:06:d6
Port 3: 1000FD enabled stp: none vlan: 1 jumbo: off mac: 38:1a:52:4b:5e:e0
Port 4: 10HD enabled stp: none vlan: 1 jumbo: off mac: 00:00:00:00:00:00
Port 5: 1000FD enabled stp: none vlan: 1 jumbo: off mac: 00:91:9e:e6:e5:38
Port 7: 1000FD enabled stp: none vlan: 1 jumbo: off mac: f6:c9:be:ed:3a:7e
Port 8: 1000FD enabled stp: none vlan: 1 jumbo: off mac: 04:92:26:69:71:68
VLANs: BCM5301x enabled mac_check mac_hash
1: vlan1: 2 3 4 5 7 8t
2: vlan2: 0 8t
3: vlan3: 1 8t

admin@RT-AC5300-7160:/tmp/home/root# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.049226697160       yes             vlan1
admin@RT-AC5300-7160:/tmp/home/root#
 
I just noticed that the host that can only talk to others on the same VLAN is connected to port 4, and that port shows MAC 00:00:00:00:00:00. In fact, the network card of the PC connected to that port has the MAC 53:58:01:4d:03:02.
 
It's also linked at 10 Mb half duplex. Faulty network card, cable or socket? Try moving the cable to another socket.
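
A quick way to spot ports like port 4 in the robocfg show output is a small awk filter. This is only a sketch: flag_ports is a made-up helper name, and it assumes the "Port N: LINK enabled stp: ... mac: ..." line layout posted above. It flags what robocfg prints (a link other than 1000FD, or an all-zero MAC column); it doesn't tell you why.

```shell
#!/bin/sh
# flag_ports (hypothetical helper): read `robocfg show` output on stdin and
# print every port whose link is not 1000FD or whose MAC column is all zeros.
# Assumes the "Port N: LINK enabled stp: ... mac: xx:xx:xx:xx:xx:xx" layout.
flag_ports() {
    awk '/^Port [0-9]+:/ {
        port = $2; sub(/:$/, "", port)   # "4:" -> "4"
        speed = $3                       # e.g. 1000FD, 10HD
        mac = $NF                        # last field is the MAC column
        if (speed != "1000FD" || mac == "00:00:00:00:00:00")
            printf "port %s: link=%s mac=%s\n", port, speed, mac
    }'
}

# On the router: robocfg show | flag_ports
```

Fed the output above, it should print a single line for port 4 (link=10HD, mac=00:00:00:00:00:00), since every other port shows 1000FD and a non-zero MAC.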
 
I tested multiple cables; same thing. What's strange is that I can ping any computer in the VLAN except the router's IP address. If it were the card, it shouldn't be able to ping any computer on the network. The same card works fine on a Cisco L3 switch.
 
Downgrading to 386.5_2 is basically the solution for most of the problems with the 386.7 releases.
 
OK, same thing with 386.5_2. Here's more info I found. The device is a piece of industrial equipment plugged into the network, and it always answers ARP requests with 50 bytes (the ARP length tcpdump reports) instead of the usual 46 bytes used by the PCs on the network. For some reason, all 46-byte ARP replies can be seen by tcpdump on the br0 interface, but the 50-byte ones get dropped and can only be seen on vlan1.

The capture below is from a PC and we can read it with tcpdump on br0 without any issues:

Code:
17:06:25.047007 04:92:26:69:71:60 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.201 tell 192.168.0.1, length 28
        0x0000:  0001 0800 0604 0001 0492 2669 7160 c0a8  ..........&iq`..
        0x0010:  0001 0000 0000 0000 c0a8 00c9            ............
17:06:25.049611 00:91:9e:e6:e5:38 > 04:92:26:69:71:60, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 192.168.0.201 is-at 00:91:9e:e6:e5:38, length 46
        0x0000:  0001 0800 0604 0002 0091 9ee6 e538 c0a8  .............8..
        0x0010:  00c9 0492 2669 7160 c0a8 0001 0000 0000  ....&iq`........
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............

This one is from the equipment and can only be seen on vlan1, not on br0.

Code:
15:51:11.997014 04:92:26:69:71:60 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.243 tell 192.168.0.1, length 28
        0x0000:  0001 0800 0604 0001 0492 2669 7160 c0a8  ..........&iq`..
        0x0010:  0001 0000 0000 0000 c0a8 00f3            ............
15:51:12.007186 35:65:e4:24:08:40 > 04:92:26:69:71:60, ethertype ARP (0x0806), length 64: Ethernet (len 6), IPv4 (len 4), Reply 192.168.0.243 is-at 35:65:e4:24:08:40, length 50
        0x0000:  0001 0800 0604 0002 3565 e424 0840 c0a8  ........5e.$.@..
        0x0010:  00f3 0492 2669 7160 c0a8 0001 0101 0600  ....&iq`........
        0x0020:  7542 6be7 013b 8000 0000 0000 0000 0000  uBk..;..........
        0x0030:  0000                                     ..
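
To compare the two replies field by field, here's a rough sketch that decodes the hex lines tcpdump -X printed above. decode_arp is a hypothetical helper name; it assumes an Ethernet/IPv4 ARP message (28 bytes) and tcpdump's usual -X layout (offset, hex groups, two spaces, ASCII column), and it reports how many bytes follow the ARP message and whether any of them are non-zero.

```shell
#!/bin/sh
# decode_arp (hypothetical helper): read the "0x0000:  0001 0800 ..." lines
# that `tcpdump -X` prints for one ARP packet on stdin, decode the 28-byte
# Ethernet/IPv4 ARP message and describe the bytes that follow it.
decode_arp() {
    awk '
    function hexval(h,   i, v) {
        v = 0
        for (i = 1; i <= length(h); i++)
            v = v * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
        return v
    }
    function mac(h,   i, s) {
        s = substr(h, 1, 2)
        for (i = 3; i <= 11; i += 2) s = s ":" substr(h, i, 2)
        return s
    }
    function ip(h,   i, s) {
        s = hexval(substr(h, 1, 2))
        for (i = 3; i <= 7; i += 2) s = s "." hexval(substr(h, i, 2))
        return s
    }
    /^[ \t]*0x[0-9a-f]+:/ {
        line = tolower($0)
        sub(/^[ \t]*0x[0-9a-f]+:[ ]+/, "", line)   # drop the offset column
        cut = index(line, "  ")   # hex column ends at the first double space
        if (cut) line = substr(line, 1, cut - 1)
        gsub(/ /, "", line)
        hex = hex line
    }
    END {
        # Offsets are in hex characters (2 per byte): oper at byte 6,
        # sender MAC/IP at 8/14, target MAC/IP at 18/24, trailer from 28.
        printf "op=%d sender=%s/%s target=%s/%s", hexval(substr(hex, 13, 4)),
            mac(substr(hex, 17, 12)), ip(substr(hex, 29, 8)),
            mac(substr(hex, 37, 12)), ip(substr(hex, 49, 8))
        trailer = substr(hex, 57)
        printf " trailer=%d nonzero=%s\n", length(trailer) / 2,
            (trailer ~ /[1-9a-f]/ ? "yes" : "no")
    }'
}
```

Fed the two replies above, it reports 18 trailing bytes, all zero (ordinary padding), after the PC's ARP message from 00:91:9e:e6:e5:38/192.168.0.201, and 22 trailing bytes that start with non-zero data (0101 0600 7542 ...) after the equipment's message from 35:65:e4:24:08:40/192.168.0.243.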

According to my research:
  • If the ARP message is sent in an untagged frame, the frame overhead is 18 bytes (14-byte Ethernet header plus 4-byte FCS). That gives a frame of 28+18=46 bytes without padding, so 18 bytes of padding are needed to bring the frame up to the 64-byte minimum.
  • If the ARP message is sent in an 802.1Q-tagged frame, the overhead is 22 bytes (the extra 4 bytes being the tag), giving a total frame size of 28+22=50 bytes. In this case, the padding needs to be 14 bytes long.
So it looks like the equipment is sending 802.1Q-tagged frames in response to the ARP request. I remember reading something about how to make tagged PVID frames "appear" on bridged VLANs; I just don't remember where.
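
The arithmetic in the bullets can be reproduced with a few lines of shell. This is a sketch that assumes the sizes used above (28-byte ARP message, 14-byte header, 4-byte FCS, 4-byte tag, 64-byte minimum frame including FCS); the helper names are made up for illustration.

```shell
#!/bin/sh
# Reproduces the ARP frame-size arithmetic from the research notes above.
# Assumed sizes (bytes): 28-byte IPv4 ARP message, 14-byte Ethernet header,
# 4-byte FCS, 4-byte 802.1Q tag, 64-byte minimum frame (FCS included).
ARP_LEN=28
ETH_HDR=14
FCS=4
VLAN_TAG=4
MIN_FRAME=64

# frame_overhead TAGGED: header + FCS, plus the tag when TAGGED is 1
frame_overhead() {
    echo $(( ETH_HDR + FCS + ${1:-0} * VLAN_TAG ))
}

# arp_padding TAGGED: bytes of padding needed to reach MIN_FRAME
arp_padding() {
    pad_oh=$(frame_overhead "$1")
    echo $(( MIN_FRAME - ARP_LEN - pad_oh ))
}

# tcpdump_arp_len FRAME_LEN: the trailing "length N" tcpdump prints for ARP,
# i.e. the captured frame length (FCS not included) minus the Ethernet header
tcpdump_arp_len() {
    echo $(( $1 - ETH_HDR ))
}

for t in 0 1; do
    oh=$(frame_overhead "$t")
    printf 'tagged=%s overhead=%s unpadded=%s padding=%s\n' \
        "$t" "$oh" "$(( ARP_LEN + oh ))" "$(arp_padding "$t")"
done
```

The loop prints overhead 18, unpadded 46, padding 18 for the untagged case and 22, 50, 14 for the tagged case, matching the bullets. The tcpdump_arp_len helper relates this to the captures: the length tcpdump prints after the ARP summary excludes the Ethernet header and FCS but includes any padding, so the PC's 60-byte frame reports 46 and the equipment's 64-byte frame reports 50.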
 
I'm giving up on this one and replacing the AP with another brand. Why frames that can be seen on the VLAN interface don't show up on the bridge is a mystery. It looks like a bug.
 
I've continued to have ICMP problems on my AC5300 since version 386.4_0 was released back in January. When it happens, I stop getting IPv6 RAs. Also, when the problem occurs, I can't ping active IPv4 devices on my network. Reverting to 386.3_2 fixes the issue.
 
VLAN-tagged frames will be ignored by the router since it doesn't have tagging support on the LAN. There is some custom code you can write to enable it, but why is that device sending tagged frames? It sounds like you should focus on that device and figure out what is wrong with its settings. Unless you get a router with VLAN support (or insert a smart switch to remove the tags), you'll probably have the same issue with other brands.
 
...but why is that device sending tagged frames?
It isn't. It's just the way the switch chip works on that generation of routers (unlike HND). If you look at the output of the robocfg command, you can see that each physical port is given a VLAN tag. This is so that it can differentiate the LAN ports and wireless interfaces from the WAN port and create the LAN bridge interface (br0). Any traffic that exits the bridge and goes to the router has the VLAN tag stripped off.
 
My non-HND has the LAN ports in VLAN 1 but untagged, which effectively makes it just a dumb switch. I understand that internally the Asus uses some VLANs for things and then strips the tag off (like WAN VLAN 2), but my understanding was that the OP was seeing tagged frames come into the LAN switch from one device, which isn't going to work: they'll just get dropped unless you manually configure that port as 802.1Q with the VLAN in question defined.

The non-HND models do appear to tag VLAN 1 when it goes to the CPU, which is a bit odd since VLAN 1 should never be tagged, but since it's purely internal it doesn't really matter; there are no interoperability issues there.
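
To see which ports carry each VLAN tagged, the VLAN lines of robocfg show can be split into untagged and tagged members. Sketch only: vlan_members is a hypothetical helper, and it assumes the "N: vlanN: ports" layout posted earlier in the thread, where a trailing t marks a tagged member.

```shell
#!/bin/sh
# vlan_members (hypothetical helper): read `robocfg show` output on stdin and,
# for each VLAN line such as "1: vlan1: 2 3 4 5 7 8t", print its untagged
# and tagged member ports. A trailing "t" marks a tagged member.
vlan_members() {
    awk '/^[0-9]+: vlan[0-9]+:/ {
        untagged = ""; tagged = ""
        for (i = 3; i <= NF; i++) {
            p = $i
            if (p ~ /t$/) { sub(/t$/, "", p); tagged = tagged " " p }
            else untagged = untagged " " p
        }
        name = $2; sub(/:$/, "", name)   # "vlan1:" -> "vlan1"
        printf "%s untagged:%s tagged:%s\n", name, untagged, tagged
    }'
}

# On the router: robocfg show | vlan_members
```

Fed the output from the thread, it prints "vlan1 untagged: 2 3 4 5 7 tagged: 8". Port 8 is tagged in every VLAN, so it is presumably the CPU port, which fits the observation that VLAN 1 is tagged toward the CPU.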
 
He didn't actually capture the traffic from the device itself. He just noticed the 4-byte difference between capturing traffic on vlan1 and capturing it as it exits br0. I went down the same rabbit hole two years ago on my RT-AC68U and, from memory, this is normal. I don't think he did a similar comparison on any of the other ports to see if they were the same. Maybe the RT-AC5300 is different.
 
Yeah, VLAN 1 is tagged toward the CPU, so I would expect the extra padding inside the router. I'm still reading his post toward the middle of the thread as saying that all machines send 46-byte (untagged) ARPs but this one machine sends 50-byte (tagged) ones. Who knows, though; we need more clarification. If the machine is in fact sending a VLAN tag on its frames, that would explain the symptoms.
 
Thanks for all the responses. I ended up putting a MikroTik switch/router between the device and the ASUS router, and everything is working OK. Connecting the device directly to the ASUS router indeed doesn't work. The magic on the MikroTik is done by forcing a specific PVID on the bridge port where the device is connected. We don't have that option on the ASUS router, and I couldn't find a way to load the 802.1q module there. :(

(screenshot: MikroTik bridge port settings with the forced PVID)
 
Yeah, your switch is basically just stripping off the 802.1Q tag. This can be done on the Asus too using a script, but not via the GUI. You'd just have to figure out which VLAN ID that particular device is sending. 802.1Q is enabled by default and used by the Asus code, and you can use robocfg to change which VLANs are associated with which ports, whether they're tagged or not, etc.

I would think stripping the tag could be accomplished by having that particular VLAN tagged on the LAN port but untagged going to the CPU; not sure if that would work, though. Someone else may know how to pop the tag off using robocfg or some other command.

The PVID itself is not what strips off the tag, as it only affects incoming packets that have no tag. Something else in the MikroTik must be configured to ignore the tag on untagged/access ports. Maybe that "admit all" setting, in conjunction with the PVID, tells the switch to strip it.
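
A toy model of generic 802.1Q port behaviour makes the PVID point concrete. It is a simplification (no ingress filtering, no vendor options such as MikroTik's frame-type settings), the helper names are made up, and the VLAN IDs in the example are hypothetical.

```shell
#!/bin/sh
# Toy model of generic 802.1Q port behaviour (not any vendor's implementation).

# classify VID PVID: VLAN a frame is assigned on ingress. VID is the tag the
# frame carries: empty for an untagged frame, 0 for a priority-tagged one.
classify() {
    if [ -z "$1" ] || [ "$1" -eq 0 ]; then
        echo "$2"     # untagged or priority-tagged: the port PVID applies
    else
        echo "$1"     # tagged: the frame keeps its own VID, PVID is ignored
    fi
}

# egress VID MEMBERSHIP: MEMBERSHIP lists the outgoing port's VLANs in
# robocfg style, e.g. "1 5t" (VLAN 1 untagged, VLAN 5 tagged). Prints
# "untagged", "tagged VID", or "drop" if the port is not a member of VID.
egress() {
    for m in $2; do
        case "$m" in
            "$1") echo "untagged"; return ;;
            "${1}t") echo "tagged $1"; return ;;
        esac
    done
    echo "drop"
}
```

For example, classify 20 1 prints 20: a frame already tagged with VID 20 stays in VLAN 20 even on a port whose PVID is 1. In this model the tag only comes off when the frame leaves through a port that is an untagged member of that VLAN (egress 20 "1 20" prints untagged).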

Is there no option to disable 802.1Q on the device itself?
 
