Peculiar network problems (possible switching loop?)

toptoptwo

First of all, the specific problem I am having is that plugging a particular device into the network brings down the entire wired network. Some specific details around the problem are as follows:

The overall network topology looks something like:
[Image: network.png, overall network topology diagram]


where the main router, connected between my ISP and the rest of my network, is an ASUS AC56U running Asuswrt-Merlin firmware (378.51). The "PC" attached to "Switch 2" has two onboard NICs. The problem occurs when the 2nd NIC is also plugged into Switch 2: the entire wired portion of the network goes down, including the WAN side of the router. Devices can still connect wirelessly to the main router on the 2.4GHz band, but without WAN access. Even if the 1st NIC on the PC is disconnected, the problem remains until the 2nd NIC is disconnected from the network. The 1st NIC is an Intel I217-V onboard Ethernet port, and the 2nd (the problem one) is a Qualcomm Atheros "Killer E2200" port.

Is this creating some kind of loop between the PC and Switch 2, even though the two NICs on that PC are separate hardware devices with unique MAC addresses? The problem occurs when the PC is sitting in the BIOS configuration screen, or when the OS is loaded but has not loaded drivers for or configured the device. The 2nd (Atheros) Ethernet port is actually reserved for and used by a virtual machine guest on that PC, so there are times when the guest OS is not running and the host OS is not using the device, and that is when the whole network goes down as described. How would I go about troubleshooting this problem? Thanks for any advice and assistance!
 
Sounds more like the NIC is bad.
Try the following:
Power off the PC and connect ONLY the E2200.
Does this bring down the network?

Update the BIOS/UEFI and NIC drivers for the E2200.
Try again with ONLY the E2200 connected from a cold boot.

Finally, change the IP address of the router from whatever it is now to something else.
E.g., if it is 192.168.1.1, make it 192.168.1.2 (assuming .2 is not in use).
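If you follow the IP-change suggestion, the only arithmetic involved is picking an unused host address in the same subnet. A minimal Python sketch, assuming a /24 LAN and a hand-made list of addresses already in use (in practice you would take it from the router's DHCP client list; devices with manually set static IPs would not show up there):

```python
import ipaddress
from typing import Iterable


def pick_new_router_ip(current: str, prefix_len: int, in_use: Iterable[str]) -> str:
    """First host address in the router's subnet that is neither the router's
    current address nor listed as in use."""
    iface = ipaddress.ip_interface(f"{current}/{prefix_len}")
    taken = {ipaddress.ip_address(addr) for addr in in_use}
    taken.add(iface.ip)  # the router's current address is not a candidate
    for host in iface.network.hosts():  # skips network and broadcast addresses
        if host not in taken:
            return str(host)
    raise ValueError("no free host address left in the subnet")
```

For example, `pick_new_router_ip("192.168.1.1", 24, ["192.168.1.10"])` returns `"192.168.1.2"`, matching the suggestion above.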

If all else fails, I would do a Wireshark capture and see which packets are bringing the network down. It may just be that the NIC is bad and sending garbage packets.
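Wireshark is the right tool for this, but the same idea can be sketched in a few lines of Python: read raw Ethernet frames and count them per source MAC and EtherType, so a NIC spewing garbage stands out as a top talker. This is a minimal sketch, assuming a Linux capture host with root privileges; the interface name `eth0` in the usage note is hypothetical. A capture on another switch port only sees what the switch forwards to that port (broadcasts, multicasts and flooded unicast).

```python
import socket
import struct
from collections import Counter
from typing import Iterable, List, Tuple

# A few well-known EtherType values, used only to label the summary.
ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x86DD: "IPv6",
    0x8808: "MAC Control (e.g. PAUSE)",
    0x88CC: "LLDP",
}
ETH_P_ALL = 0x0003  # Linux: receive frames of every protocol


def mac_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame: bytes) -> Tuple[str, str, int]:
    """Return (destination MAC, source MAC, EtherType/length field) of a frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return mac_str(dst), mac_str(src), ethertype


def summarize(frames: Iterable[bytes]) -> Counter:
    """Count frames per (source MAC, protocol label) to find the top talkers."""
    counts: Counter = Counter()
    for frame in frames:
        _dst, src, ethertype = parse_ethernet(frame)
        counts[(src, ETHERTYPES.get(ethertype, hex(ethertype)))] += 1
    return counts


def capture(ifname: str, count: int) -> List[bytes]:
    """Read `count` raw frames from `ifname` (Linux only, needs root)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
    try:
        sock.bind((ifname, ETH_P_ALL))
        return [sock.recv(65535) for _ in range(count)]
    finally:
        sock.close()
```

For example, `summarize(capture("eth0", 5000)).most_common(5)` would list the five busiest (source, protocol) pairs seen during the capture.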
 
Loop. Are you running Windows 8, 8.1 or Server 2012 on it? If not, you must have teaming enabled to connect both NICs to the same network, as well as link aggregation set up and running on the switch in question. Otherwise you are creating a loop. The OSs I mentioned support connecting multiple NICs to the same network without teaming (SMB Multichannel plus network stack support for all of this).
 
Not true.
I have two NICs on my PC and both can be connected to the network with no issue.
I have done this since Windows 2000.
Try it yourself and see. All that happens is that you get two interfaces with different IP addresses.

What you are thinking of is bridging.
Yes, that will do all kinds of bad things to a network.
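The loop-versus-bridging disagreement can be made concrete: a forwarding loop needs a cycle of cables in which every device passes frames from one port to another. A host whose two NICs are not bridged is a dead end, so it cannot close a loop. A minimal union-find sketch, assuming a hypothetical chain of router, Switch 1, Switch 2 with both PC NICs on Switch 2 (the original diagram is not available, so this topology is a guess):

```python
from typing import Hashable, Iterable, Set, Tuple


def has_forwarding_loop(links: Iterable[Tuple[Hashable, Hashable]],
                        forwarding: Set[Hashable]) -> bool:
    """True if the cables form a cycle made only of frame-forwarding devices.

    links: one (a, b) pair per cable, so two cables between the same pair of
    devices appear twice. forwarding: switches, router LAN ports, and any host
    that bridges its NICs. A non-forwarding host never passes a frame from one
    NIC to the other, so its cables cannot be part of a loop and are skipped.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        if a not in forwarding or b not in forwarding:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:  # already connected, so this cable closes a cycle
            return True
        parent[root_a] = root_b
    return False


# Hypothetical topology: router -> Switch 1 -> Switch 2, both PC NICs on Switch 2.
LINKS = [("router", "switch1"), ("switch1", "switch2"),
         ("switch2", "pc"), ("switch2", "pc")]
```

With the PC left out of `forwarding` (two independent NICs, as in the original post) the check finds no loop; marking the PC as a bridge turns the two parallel cables into one. Note that this only models forwarding: it cannot explain a NIC that floods the network on its own.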
 
Seems to be some kind of loop to me. Complicated further by the fact that a VM is configured for this NIC.

Is the main router handling DHCP? Is the VM specifying the IP address for that NIC and conflicting with another device somewhere on the network?

Does the issue change if the VM is running?

The answers to the above (and any other additional information you can provide) will be helpful to try to track this down further for you.
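One way to test the IP-conflict theory without waiting for the VM is to look for the same IP address being claimed by more than one MAC in ARP traffic. A minimal sketch, assuming you already have raw Ethernet frames from some capture; ARP probes (sender IP 0.0.0.0) are skipped because they claim no address:

```python
import ipaddress
import struct
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

ETH_P_ARP = 0x0806


def parse_arp_sender(frame: bytes) -> Optional[Tuple[str, str]]:
    """Return (sender IP, sender MAC) for an Ethernet/IPv4 ARP frame, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP body
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    # ARP body: htype, ptype, hlen, plen, oper, sha, spa, tha, tpa
    htype, ptype, hlen, plen, _op, sha, spa, _tha, _tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    mac = ":".join(f"{b:02x}" for b in sha)
    return str(ipaddress.IPv4Address(spa)), mac


def find_ip_conflicts(frames: Iterable[bytes]) -> Dict[str, List[str]]:
    """Map each IP claimed by more than one MAC to the sorted list of those MACs."""
    claims = defaultdict(set)
    for frame in frames:
        sender = parse_arp_sender(frame)
        if sender is None or sender[0] == "0.0.0.0":  # non-ARP frame or ARP probe
            continue
        ip, mac = sender
        claims[ip].add(mac)
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}
```

An empty result means no address was claimed twice during the capture; it does not rule out a conflict that simply did not show up in that window.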


 
"The problem occurs when the PC is sitting in BIOS configuration, or if OS is loaded but without loading drivers/configuring the device."
Considering this, I would lean toward a bad NIC, or possibly even a bad cable or connector. Does the problem occur with only NIC 2 plugged in, without NIC 1? I don't see how it could be a loop issue if the problem occurs with the machine in the BIOS, without an operating system loaded.
 
Thank you all for the replies.


The primary (host) OS is Linux, with no teaming or bridging between the interfaces. The motherboard documentation also says something to the effect that hardware teaming is not an option between these two devices, and there are no related settings for the two in the UEFI/BIOS menus, so I don't think anything is happening at the hardware/UEFI level. I'm not sure whether that is related to the problem occurring even when no OS is loaded or the devices aren't configured and active.

The main router is in fact also the DHCP server. I should have included this in the original post: when the guest VM OS is running, takes control of the E2200 NIC and loads drivers for it, the problem never occurs. The NIC is assigned a unique IP by the main router (I have the main router assigning "static" DHCP addresses by the MAC address of each connecting host), and network traffic operates normally, for both the VM and the rest of the network. The problem occurs when the guest VM OS is shut down but the cable is still plugged in. I should also add, which may or may not be important, that the issue does not occur immediately. The network may continue to operate normally for a few minutes, or even substantially longer, before the entire network goes down in these situations.
Also, for those familiar with Linux/virtual machines, the 2nd NIC is actually held by the pci-stub/vfio-pci driver in the host (Linux) OS. This is probably not important, though, since the problem also occurs when no OS is running at all.
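If you want to confirm which driver is holding the E2200 on the host while the guest is off, `lspci -k` reports it as "Kernel driver in use", and the same information is in sysfs, where `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver. A minimal Python sketch; the PCI address `0000:03:00.0` is hypothetical (look yours up with `lspci`):

```python
import os
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Name of the kernel driver currently bound to a PCI device, or None.

    On Linux, /sys/bus/pci/devices/<addr>/driver is a symlink into
    /sys/bus/pci/drivers/<name>; it is absent when no driver is bound.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.path.realpath(link))
```

With the guest shut down you would expect something like `bound_driver("0000:03:00.0")` to return `"vfio-pci"` or `"pci-stub"`; either way the host has no network driver on the port, which fits the "no OS driver" symptom.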

I actually haven't tried plugging just NIC 2 in by itself, except for after the problem has already occurred. Later when I get home today, I am going to try unplugging port 1 and leaving port 2 plugged in after a fresh boot, and leaving it in UEFI/BIOS configuration to see if it is simply the 2nd NIC causing issues independent of NIC 1.

Also, would it help the diagnosis if I tried plugging NIC/port 1 into Switch 2 and the problem NIC/port 2 into Switch 1, to see whether the problem still occurs?
 


You can have two NICs connected without any kind of teaming or dual-NIC setup if your switch has good Spanning Tree support. Spanning Tree in the switch detects the loop and treats one NIC as redundant, putting it in a blocking state; if the other NIC goes down, the second is automatically unblocked and the PC runs off the second NIC. My guess is that Spanning Tree support is turned off or not working.
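Spanning Tree only acts on links between devices that forward frames (bridges): it elects a root bridge and blocks whichever links would otherwise close a loop. A heavily simplified sketch of that outcome, assuming equal link costs and treating the PC as a bridge (the only case in which its second NIC would form a loop). Real STP also weighs path cost and port IDs, blocks a single port rather than a whole link, and works by exchanging BPDUs; none of that is modelled here:

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def spanning_tree(bridges: Dict[str, int],
                  links: List[Tuple[str, str]]) -> Tuple[Set[int], Set[int]]:
    """Very simplified STP outcome: returns (forwarding, blocked) link indices.

    The bridge with the lowest ID becomes root; every link costs the same, so
    each bridge keeps the first link that reaches it in a breadth-first walk
    from the root, and every other link is blocked.
    """
    root = min(bridges, key=bridges.get)
    adjacent: Dict[str, List[Tuple[int, str]]] = {name: [] for name in bridges}
    for index, (a, b) in enumerate(links):
        adjacent[a].append((index, b))
        adjacent[b].append((index, a))

    reached = {root}
    forwarding: Set[int] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Deterministic order: lower neighbour bridge ID first, then link index.
        for index, neighbour in sorted(adjacent[node], key=lambda e: (bridges[e[1]], e[0])):
            if neighbour not in reached:
                reached.add(neighbour)
                forwarding.add(index)
                queue.append(neighbour)
    blocked = set(range(len(links))) - forwarding
    return forwarding, blocked
```

With hypothetical bridge IDs `{"router": 1, "switch1": 2, "switch2": 3, "pc": 4}` and both PC cables on Switch 2, one of the two PC links ends up blocked; remove the second cable and nothing is blocked.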
 
Some new information:

OK, so the problem is actually INDEPENDENT of the Intel NIC as well as of the switch. I was able to reproduce it by booting into the PC's BIOS configuration with only the Atheros NIC plugged in, which caused the entire network interruption. I then tried connecting the problem Atheros NIC directly to the switch on the main router, and again was able to kill the entire network. I have also tried a different cable, taking the one previously plugged into the Intel NIC and plugging it into the problem Atheros NIC (the same problem occurred).

Now I'm even more baffled about how plugging a device into a network can disrupt the entire network. Is this simply a bad NIC that otherwise performs perfectly well with drivers loaded? How do I monitor what traffic this NIC could be putting on the network when it is in the BIOS, or in an OS without drivers?
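Since the outage only starts some minutes after the port comes up, a rate check is one way to catch the moment a source starts flooding. A minimal sketch, assuming you already collect (timestamp, source MAC) pairs from a capture; the one-second window and 1000-frame threshold are arbitrary placeholders to tune against your normal traffic. A capture from a different switch port only shows what the switch forwards there, so port mirroring (if your switch supports it) gives a fuller view.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Tuple


def first_flood(events: Iterable[Tuple[float, str]],
                window: float = 1.0, threshold: int = 1000) -> Dict[str, float]:
    """Flag sources that send more than `threshold` frames within `window` seconds.

    events: (timestamp in seconds, source MAC) pairs in time order, e.g. one
    per captured frame. Returns {source MAC: time it first crossed the limit}.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: Dict[str, float] = {}
    for ts, src in events:
        if src in flagged:
            continue
        times = recent[src]
        times.append(ts)
        # Keep only timestamps inside the sliding window (ts - window, ts].
        while times[0] <= ts - window:
            times.popleft()
        if len(times) > threshold:
            flagged[src] = ts
    return flagged
```

Comparing the flagged time with when the network actually dropped would show whether the E2200 (or something else) started the flood.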
 
Seems to be a bad NIC. Intel NICs are the only reliable ones I'm aware of.
 
Well, here's another probably unhelpful kicker: disabling the E2200 NIC in the motherboard's BIOS configuration does hide the device from the OS, but the port is still powered up and causes the same problem if plugged in. Kinda interesting.

I guess at this point it seems to be a hardware issue with the Ethernet port itself. I am going to play around with some ASPM settings in the BIOS, but I probably won't be able to solve anything.

Thanks all for attempting to help. Please let me know if you know of something that can actually be done.
 
