
Speeding up Gigabit Ethernet


thiggins

If you're like me, you don't get 1000 Mbps, or even 900 Mbps from your gigabit Ethernet. I typically get 600 - 700 Mbps and maybe a 50 Mbps boost from using 4k jumbo frames.

What are your gigabit Ethernet speeds and how do you get near wire-speed?
 
1. Avoid PCI NICs (esp. on machines with high-bandwidth devices on the PCI bus, notably storage controllers).
2. Use a network performance measurement tool with enough tuning options and a large enough buffer size that it can be trusted to produce top-end benchmark numbers.

E.g. iperf v 1.7

server: iperf -s
client: iperf -c server -l 64k -t 15 -i 3 -r

3. Play with parameters, re-run as needed.
4. Tune NIC options for interrupt moderation, segmentation offload, etc.
5. Sometimes tune TCP window size.
6. Enable jumbo frames for additional gains (a rough Linux sketch of steps 4-6 follows the iperf output below)
7. Don't skimp on CPU
8. Don't mistake networking benchmarks for file transfer performance -- file transfers have very different issues and may benefit from different networking tweaks.

E.g., on-board nVIDIA nForce3 / Windows 2000 to Marvell PCIe / XP Home using 9K frames:

F:\tools\bench\iperf>iperf -c 192.168.0.117 -l 64k -t 15 -i 3 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.0.117, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[796] local 192.168.0.147 port 3762 connected with 192.168.0.117 port 5001
[ ID] Interval Transfer Bandwidth
[796] 0.0- 3.0 sec 343 MBytes 958 Mbits/sec
[796] 3.0- 6.0 sec 341 MBytes 954 Mbits/sec
[796] 6.0- 9.0 sec 346 MBytes 967 Mbits/sec
[796] 9.0-12.0 sec 350 MBytes 978 Mbits/sec
[796] 12.0-15.0 sec 348 MBytes 973 Mbits/sec
[796] 0.0-15.0 sec 1.69 GBytes 965 Mbits/sec
[772] local 192.168.0.147 port 5001 connected with 192.168.0.117 port 1106
[ ID] Interval Transfer Bandwidth
[772] 0.0- 3.0 sec 350 MBytes 979 Mbits/sec
[772] 3.0- 6.0 sec 348 MBytes 972 Mbits/sec
[772] 6.0- 9.0 sec 347 MBytes 971 Mbits/sec
[772] 9.0-12.0 sec 350 MBytes 978 Mbits/sec
[772] 12.0-15.0 sec 348 MBytes 972 Mbits/sec
[772] 0.0-15.0 sec 1.70 GBytes 974 Mbits/sec
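
For anyone wanting to try steps 4-6 on a Linux box, here is a rough sketch of the equivalent knobs. This is an illustration only: the interface name eth0, the window/MTU values, and the available ethtool options are assumptions that depend on your NIC and driver.

# Step 6: enable ~9K jumbo frames (both ends and the switch must support them)
ifconfig eth0 mtu 9000

# Step 4: enable segmentation offload and adjust interrupt moderation
ethtool -K eth0 tso on
ethtool -C eth0 rx-usecs 100

# Step 5: override the default TCP window on the iperf client with -w
iperf -c 192.168.0.117 -l 64k -w 256k -t 15 -i 3 -r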
 
Avoid too many hops! If you have 20 devices hooked up, don't use four switches, use one. Just because it's not a hub doesn't mean you won't lose anything from daisy-chaining them!

If you do have to use multiple switches, be sure devices that talk to each other are plugged into the same switch, not strung across a chain, as that will seriously cut the throughput.

Definitely avoid PCI cards; PCI Express and PCI-X (the long 64-bit cards found in servers) are your best bets. Most PCs with a built-in gigabit NIC these days hang it off the PCIe bus, so it's the same thing. PCI cards aren't capable of hitting gigabit speeds even with nothing else on the bus (though there almost always will be).

Switch hardware also has a lot to do with it, though it's really at layer 3 that you start to see the difference.

Also look at your cabling, for those of you who are clutter-minded! Winding your cable into a nice neat coil will make it look better, but it can cause interference, the same as a kink in the cable. Think of a garden hose with a kink in it.

Cable length is another factor, and it goes along with the above. It's hard not to "plan ahead" and buy the longer cable because you may need it later, but it can cost you in speed if speed is what you're really after.

There are many, MANY things that can bring your speed down bit by bit; it's just a matter of finding a speed you can live with.
 
Thanks, guys. PCI NIC (Intel Pro 1000 / MT) is probably the bottleneck that I have right now.
 
If you have a chance, you might try that NIC under Vista -- I've seen some surprisingly good generalization-breaking results in such a scenario.
 
Network performance depends on hardware, software, and computer architecture.

Hardware:
Starting with and building an appropriate cable plant is critical to maximizing speed. Since I assume this forum is mostly about home/small-business use, good quality Cat 5e or Cat 6 cable is crucial to minimize crosstalk and other speed-inhibiting factors.

Quality NICs with firmware capable of appropriate TCP offload, buffering, scaling, etc. Low-cost NICs are abundant, and they are just as good as what you pay for them. Low cost = host-based processing.

Independent PCI-X buses or PCIe are the only way to enjoy enhanced throughput. The PCI bus is too interrupt/IRQ driven. Theoretically, a 32-bit, 66 MHz (3.3 V, PCI 2.2 spec) bus is capable of driving a gigabit of throughput if only a single card is inserted. In reality, the bus signaling wasn't architected in a way that allows this.
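
To put rough numbers on that (my arithmetic, not from the post above):

32 bits x 33 MHz = ~1067 Mbit/s (133 MB/s) - classic shared PCI
32 bits x 66 MHz = ~2133 Mbit/s (266 MB/s) - the 3.3 V PCI 2.2 variant mentioned above

A gigabit NIC running full duplex can ask for up to ~2000 Mbit/s, so even the 66 MHz bus has essentially no headroom once arbitration, protocol overhead, and any other devices on the bus are factored in, and the common 33 MHz bus can barely cover one direction at wire speed before overhead.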

Layer 2 switches - again there is a huge disparity between $20 switches at Wal-Mart/Best Buy and $150+ switches from reputable firms. The higher end switches from DLink are pretty good, as well as the low-end HP stuff. There are others that come to mind, but I think you know where I'm going.

Layer 3 switches - $$$ if you want performance

Software:
NIC drivers - since this is the first opportunity the OS has to interface with the hardware, the drivers that go along with the NICs have to be solid and efficient. Different driver versions will yield different levels of performance on a given NIC. Adjusting buffer sizes, offload, scaling, window sizes, interrupts, etc., is the job of the driver. Preferably all of this takes place without the processor becoming too involved, and without the latency of northbridge and southbridge communication being hampered by "other" traffic (USB, IDE, etc.).
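
As an illustration of the driver-level knobs being described, here is roughly how you might inspect and adjust them on a Linux box with ethtool. This is only a sketch: eth0 is a placeholder, and which options exist (and their names) depends entirely on the NIC and driver.

ethtool -k eth0            # list current offload settings (checksum, TSO, etc.)
ethtool -c eth0            # show interrupt coalescing/moderation settings
ethtool -g eth0            # show RX/TX ring buffer sizes
ethtool -G eth0 rx 4096    # enlarge the receive ring, if the hardware allows it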

TCP/IP stack - FreeBSD still has the fastest IP stack of them all, bar none. Unfortunately, you are always at the mercy of the IP stack of the chosen OS. It responds to signals traveling up and down the stack from the driver, and there's not a lot you can do about it.

Computer architecture - Every packet must, at some time, make it to the processor/memory subsystem for processing, after what seems like an eternity in a packet's life. Just like soup, the end result is only as good as what you put into the process. Now, how many hops did the packet have to traverse in the computer itself? Southbridge or northbridge interface? What is the latency to get in and out of memory to the processor? And then there's the exact reverse path, through the IP stack, driver, bus, etc.

I currently do backups of my workstations at home, to a home-built, dual attached FreeBSD NAS. Among others, I've benchmarked my ThinkPad at ~600Mbps, and my home-built media server with dual Gig attachment at ~1.25Gbps. Memory to memory transfers on the T-Pad is ~950Mbps and ~1.89Gbps for the media server. I'm disk-bound in real world use, not network-bound, with 1500B packets, and not because I spent a boat load of money, but simply because all the little pieces matter... and eBay!
 
Layer 2 switches - again there is a huge disparity between $20 switches at Wal-Mart/Best Buy and $150+ switches from reputable firms. The higher end switches from DLink are pretty good, as well as the low-end HP stuff. There are others that come to mind, but I think you know where I'm going.

Thanks for the info, Steve.

Can you say some more about this? How have you benchmarked performance of these switches?
 
What's installed on the OS also plays into this. If you have quite a few programs expecting network traffic, processing takes longer as each of them checks whether incoming packets are meant for it.

VLANing also helps bring speeds up: when a broadcast is sent on a flat network, everything has to stop and listen. On a VLANed network, hosts in other VLANs never see that traffic, so it doesn't cause everyone to stop and look.

VLANing, however, is more of a layer 2+ feature, so that's a lot of what makes the better switches worth it.

All of my VLANing is done on my ASA 5505 at home, which passes traffic over trunks (802.1Q tagging, with VTP to propagate the VLAN definitions) so the switch can carry the VLANs.

I would definitely suggest going with port-based VLANs as well; with packet tagging, every packet still has to be stopped and inspected.
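
For anyone who wants to play with tagged VLANs without an ASA or a managed-switch config, here is roughly what an 802.1Q sub-interface looks like on a Linux host. This is a sketch only: the interface name, VLAN ID, and address are placeholders patterned on the setup described next.

modprobe 8021q                                    # load the 802.1Q tagging module
ip link add link eth0 name eth0.2 type vlan id 2  # tagged sub-interface for VLAN 2
ip addr add 10.150.2.10/24 dev eth0.2             # address inside that VLAN's subnet
ip link set eth0.2 up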

My current setup:
Cable modem -> ASA 5505 -> VLANs 1-3 to Dell PowerConnect (8 port) -> D-Link 24-port unmanaged.

This steps from layer 3, down to layer 2, finally ending at layer 1. The layer 2 switch handles most requests, which lowers the demand on the router/firewall, which ups overall throughput a bit more.

I keep my home segmented into three VLANs: one is my main network, two is the DMZ, three is wireless (1 = 10.150.1.0, 2 = 10.150.2.0, and so on). I use my network for testing major apps, so it's way overkill for the average user.

With my current setup, I am only limited by my servers' disk speeds. Most of my servers run on 10K SCSI drives; my main file server runs on 3 Gb/s SATA.

My best throughput results sit at around 800 Mbit/s going from one DL580 to another, across the PowerConnect. This is just shy of what I get using a Cat 5e straight cable from NIC to NIC. The NICs being used are HP dual-port gigabit cards (in failover, not load balancing) in PCI-X slots. The OS in use is Windows Server 2008 Standard.

As I said, this is an overly, overly complicated setup, and the results still aren't the best in the world, so it's all up to what you're willing to do.
 
Hey Tim, good site, thank you! I have access to a multitude of test sets that normal home users don't. To validate switches, I've used a number of platforms (IXIA, Spirent, etc.) that have a built-in suite of tests for RFC 2544 and other performance benchmarks.

I've also used netperf, ttcp, etc., with varying results. Unfortunately, these test suites, while theoretically competent, introduce a lot of variables outside of the device under test (TCP/IP stack, drivers, etc.) that I mentioned previously. In my experience, the only way to get consistent test results (comparing switch A to switch B) using these suites is to "lock down" your test platform. Meaning, follow a specific series of events before and after running the tests: no software or hardware changes or reconfigurations, power the computer off/on immediately before testing, etc. The results may not be 100% accurate, but your testing methodology is, allowing a good comparison A to B. To benchmark a piece of network equipment in this scenario, the only realistically viable test is direct memory-to-memory transfer, to reduce the number of test-platform variables.

As for performance and features in an L2 switch: generally, the more feature-rich the switch (number of egress queues, priority maps, VLANs, 802.1X, link aggregation, IGMP, LLDP, SNMP management, etc.), the more functional and performance-oriented the chipset, and the more time and effort the manufacturer has invested in making the firmware perform properly and efficiently.

The difference between strictly consumer-based and enterprise-based products is driven by cost. What kind of product does the consumer want, and how much is he willing to pay? These are the primary design goals. From there, the consumer-oriented manufacturer reverse-engineers the solution, starting from the chipset manufacturer's consumer-oriented reference design (with the appropriate cost-saving shortcuts). No thought is given to performance or efficiency. The firmware is, again, by and large the reference design from the chipset manufacturer with an OEM label and maybe a web-based GUI wrapped around it. That's why you typically see similar performance/functionality from a multitude of vendors using the same or similar consumer-oriented chipsets.

Even in the enterprise world, where cost is a design factor, the hardware/chipset used is sometimes a common denominator. However, in professional products, more effort, time, and expense go into the silicon, PCB layout, and firmware, and the basic enterprise-oriented chipsets themselves are augmented with other hardware. Then the firmware is honed over time to improve its efficiency. Consumer firmware is rarely updated, and if it is, it's typically for level 1 bug fixes - something is broken.

The good news I've found, in the midst of all the "me-too" consumer-oriented switches, is that they all perform pretty well. Some manufacturers still try to eke out the last penny of profitability by saving the 5 cents of bottom-line cost by building around 10/100 chipsets and PHYs. Every silicon vendor has moved to 10/100/1000 as their dominant product offering in the enterprise, with the exception of their consumer-oriented products, which are largely 10/100. Again, to the average consumer, a 5-port switch on the shelf at Best Buy that costs $25 is just as good as the $135 switch, and therefore the market demand drives the supply chain. Consumers are their own worst enemies in this respect... they don't demand, therefore the market doesn't need to respond.
 
Glad you like the site, Steve.

Sounds like you definitely have the chops (and test equipment) to give switches a good workout! :)

Looks like we agree on the last point. For the average consumer, a $25 switch is as good as a more expensive one.
 
PCI NIC is bad?

Forgive me for my ignorance, but if using a PCI NIC card for a gigabit throughput is bad, then what else is there to use?
 
Forgive me for my ignorance, but if using a PCI NIC card for a gigabit throughput is bad, then what else is there to use?

Many current motherboards have a higher bandwidth slot called PCI Express (often abbreviated PCI-E or PCIe). If your motherboard has one, you can try a PCIe gigabit NIC.
 
Gotcha. That makes sense. My mobo has one PCI-E, however, it's got my video card sitting in it for dual monitors.

From how the statement was framed, it seemed like there was some other non-mobo attachment out there I wasn't aware of.
 
Don't forget BDP

If the data is being transferred over TCP/IP, then speeds can be compromised by the bandwidth-delay product, especially if a tiny RWIN (receive window) is used.

Some OSes have receive windows as small as 8 KB, which would reduce throughput below 1 Gbit/s if latency exceeded ~70 µs.
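
To put numbers on that (my arithmetic, using the usual rule of thumb that TCP throughput is capped at roughly window size divided by round-trip time):

8 KB window = 65,536 bits in flight at most
65,536 bits / 70 µs  ≈ 940 Mbit/s  (about the break-even point mentioned above)
65,536 bits / 500 µs ≈ 130 Mbit/s
65,536 bits / 1 ms   ≈  65 Mbit/s

So with an 8 KB receive window, even sub-millisecond latency is enough to fall well below gigabit wire speed; raising RWIN or enabling TCP window scaling removes that cap.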
 
Forgive me for my ignorance, but if using a PCI NIC card for a gigabit throughput is bad, then what else is there to use?

Clarification: Using a PCI gigabit NIC is not "bad" per se -- in any case, it can be a lot better than 100 Mb/s. PCI NICs are also largely indistinguishable in terms of performance from non-PCI NICs in probably the majority of real-world uses to date.

Issues arise if (1) you have other significant components on the PCI bus (e.g. a storage controller -- using both a PCI ATA/RAID controller and a PCI gigabit NIC is bad design, because they limit each other's bandwidth), or (2) you're trying to hit the high end of gigabit performance.

This thread started off talking about case (2), and it is in this context that PCI NICs should generally be avoided. The truth is more complicated and there are exceptions, and in many cases, the file transfer bottleneck is in the OS or file transfer protocol or the drive subsystem or something else, so hitting the high end of gigabit performance in network benchmarks is not as important as it might appear. It is an understandable but novice mistake in this area to think that it must be the network when you don't get fast file transfers over gigabit.

Of course, if you have a choice, it's better to get a non-PCI NIC, and this choice is mostly made these days when buying a new motherboard -- ensuring you get one where the designers haven't simply hung the on-board gigabit off the PCI bus (which many have done to date, but is becoming less common).
 
