Confessions of a 10GbE Network Newbie


Thanks T. The last part of the series is an affordable 10GbE workstation and a $2500, 1000MB/s, 20TB server build...my favorite part of the series. Leaving the best for last :)

Cheers,
Dennis.
 
I am not sure that it's entirely true that you must have as many logical cores as the number of SMB threads you need.

Looking at the article you linked to, which I've seen before, it looks like each adapter will spawn as many RSS queues as there are logical cores, so the total scales with the number of network adapters (I don't know how multiport adapters work in this respect at all).

SMB Multichannel will work without RSS or RDMA; it's just likely to be slower, especially on the 10GbE side of things, where without RSS you may end up hammering one CPU core with the load.

Anyway, if you have a 2 logical/physical core processor, each network adapter can spawn 2 RSS queues/SMB threads, one per core. You could still have 4 adapters, though, each spawning 2 queues/threads, and get all of the lovely throughput of having 4 adapters running.

The limitation will be CPU processing capability as well as small file performance. The more SMB threads there are, the better the small file SMB transfer performance will be. IIRC, SMB spawns an instance for every file that goes through the network interface, which incurs a certain amount of CPU and network pipe overhead every time it has to close a thread and open a new one.

So there is less overhead with big files than with small files. With SMB Multichannel and RSS, multiple threads can run at once, so there is less overhead on small files: the OS doesn't have to close a thread before it can open the next one and start transferring the next file, as it can have multiple in flight at once. It also means that all of the network processing isn't on a single CPU core, so if the load is enough to max out a core, you aren't CPU-bottlenecked on networking performance; the work can be spread across all of the cores.

It does mean having more cores is likely to improve small file performance over fewer cores, as you have more SMB threads running at once. It also means that you are less likely to be CPU bottlenecked on something like a 10GbE adapter (I don't know how likely that would be without RSS on a modern processor, or if that is really just more of a concern with older processors).

I don't think it'll actually prevent you from running more SMB threads, and thus more throughput, than you have network ports. It just won't be able to spawn multiple threads per port, and it won't be able to spread the CPU processing load across multiple cores.

Neither of those may be an issue with something like GbE networking. I haven't actually tested it out yet, but I do plan to sometime in the near-ish future. My server is just a Celeron G1610 with 2 cores, and I currently have 2 Intel Gigabit CT adapters in it doing network duties over to a similar network setup on my desktop (though my desktop is an i5-3570 with 4 cores). I intend to re-enable the onboard NICs and check out what kind of performance I can get SSD to SSD between the machines. That should easily tell me if it is running more than 2 threads/leveraging all three NICs.
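
For anyone wanting to do the same check, a couple of the inbox PowerShell cmdlets on Windows 8/Server 2012 should show whether Multichannel is actually in play during a copy. Just a rough sketch, nothing exotic assumed:

# List the client NICs SMB knows about and whether they are RSS/RDMA capable
Get-SmbClientNetworkInterface

# While a large copy is running, list the TCP connections SMB Multichannel has opened;
# more than one row per NIC means multiple threads/connections are in flight
Get-SmbMultichannelConnection

# Optional: confirm the session negotiated an SMB 3.x dialect
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect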

For my normal use case I don't have a reason to run 3 NICs, though, as my RAID0 HDD array is currently limited to about 220MB/sec (I am at 65% utilization on my desktop RAID0 array and they are older 1TB disks). Once I upgrade the RAID arrays in the machines I might think long and hard about it, but it means running another wire between the machines...which means opening the walls again (I am just going to string a temp Cat5e for testing 3 NICs). I may move my office in the future to elsewhere in my home where I can easily restring wiring through the crawl space there...so I might change up to a couple of dual-port cards or a quad-port card at some future date. Not any year soon though (and that's supposing 10GbE hasn't gotten affordable enough in 2-4 years to give it a shot instead).
 
The Intel documentation on 10G tuning actually addresses RSS vs. core count and the tuning specifics for the drivers. I did see an improvement in throughput going from an i5 to an i7...and each time I set the RSS queue count to match the core count. In the Windows 2012 Server version of the driver they add a bunch of NUMA options, so you can actually tune this where dual-processor machines are involved, and so on. This is why the 2012 Server I am building (narrow ILM heat sink just showed up today!) has a 6-core Xeon...which will present 12 logical cores to the OS. Intel suggests that you set RSS queues to match your logical core count. If you play with RSS settings while observing CPU loading per core, you'll see it. A good read here:

http://blogs.technet.com/b/josebda/...ature-of-windows-server-2012-and-smb-3-0.aspx
3.1. Single RSS-capable NIC

This typical configuration involves an SMB client and SMB Server configured with a single 10GbE NIC. Without SMB multichannel, if there is only one SMB session established, SMB uses a single TCP/IP connection, which naturally gets affinitized with a single CPU core. If lots of small IOs are performed, it’s possible for that core to become a performance bottleneck.

Most NICs today offer a capability called Receive Side Scaling (RSS), which allows multiple connections to be spread across multiple CPU cores automatically. However, when using a single connection, RSS cannot help.

With SMB Multichannel, if the NIC is RSS-capable, SMB will create multiple TCP/IP connections for that single session, avoiding a potential bottleneck on a single CPU core when lots of small IOs are required.


3.2. Multiple NICs

When using multiple NICs without SMB multichannel, if there is only one SMB session established, SMB creates a single TCP/IP connection using only one of the many NICs available. In this case, not only is it not possible to aggregate the bandwidth of the multiple NICs (achieve 2Gbps when using two 1GbE NICs, for instance), but there is a potential for failure if the specific NIC chosen is somehow disconnected or disabled.

With Multichannel, SMB will create multiple TCP/IP connections for that single session (at least one per interface or more if they are RSS-capable). This allows SMB to use the combined NIC bandwidth available and makes it possible for the SMB client to continue to work uninterrupted if a NIC fails.
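
On the PowerShell side, the RSS-queues-to-logical-cores matching above can be checked and adjusted with the inbox NetAdapter cmdlets on Server 2012/Windows 8. A rough sketch only: "Ethernet 1" is a placeholder for the 10GbE port, and the queue count has to be a value the driver supports (Intel drivers typically offer 1/2/4/8/16):

# Show the current RSS state for the adapter: queue count, base/max processors, profile
Get-NetAdapterRss -Name "Ethernet 1"

# Set the RSS queue count to (roughly) match the logical core count
Set-NetAdapterRss -Name "Ethernet 1" -NumberOfReceiveQueues 8

# Watch per-core CPU loading while a transfer runs
Get-Counter '\Processor(*)\% Processor Time' -Continuous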
 
It looks like there are tunable defaults for SMB3 channels, with a limit of 4 TCP/IP connections per interface (NIC) unless you change these values. RSS, on the other hand, looks to spread this workload over all available cores. In the 2012 Server version of the driver you can (and perhaps should) do a bit more tuning, as, for example, all non-RSS traffic defaults to core 0. So the excerpt linked below suggests setting RSS to start with the 2nd logical core.

http://books.google.ca/books?id=4fL...=y#v=onepage&q=processor cores vs rss&f=false

The RSS tuning tips I described really require some more research on my part, as NUMA and physical RAM arrangements are of little concern in a single-processor system... so RSS tuning may have less/no effect on my current single-CPU system. Running nicely as of today...
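
For reference, the "start RSS at the 2nd logical core" tip maps to something like this with the same NetAdapter cmdlets (a sketch only; the adapter name is a placeholder, and with hyper-threading enabled, logical processors 0 and 1 share the first physical core, which is why 2 is the usual starting point):

# Keep RSS work off processor 0, which also handles the non-RSS/default traffic
Set-NetAdapterRss -Name "Ethernet 1" -BaseProcessorNumber 2

# Verify the new base processor and which processors RSS will actually use
Get-NetAdapterRss -Name "Ethernet 1" | Format-List BaseProcessor*, RssProcessorArray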

As far as multichannel sessions, I've updated the blog post for clarity, and inserted some info here for reference:

Quoted: http://blogs.technet.com/b/josebda/...ature-of-windows-server-2012-and-smb-3-0.aspx

6. Number of SMB Connections per Interface

SMB Multichannel will use a different number of connections depending on the type of interface:

  • For RSS-capable interfaces, 4 TCP/IP connections per interface are used
  • For RDMA-capable interfaces, 2 RDMA connections per interface are used
  • For all other interfaces, 1 TCP/IP connection per interface is used

There is also a limit of 8 connections total per client/server pair, which will limit the number of connections per interface.

For instance, if you have 3 RSS-capable interfaces, you will end up with 3 connections on the first, 3 connections on the second and 2 connections on the third interface.

We recommend that you keep the default settings for SMB Multichannel. However, those parameters can be adjusted.



6.1. Total Connections per client/server pair

You can configure the maximum total number of connections per client/server pair using:

Set-SmbClientConfiguration –MaximumConnectionCountPerServer <n>



6.2. Connections per RSS-capable NIC

You can configure the number of SMB Multichannel connections per RSS-capable network interface using the PowerShell cmdlet:

Set-SmbClientConfiguration -ConnectionCountPerRssNetworkInterface <n>
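
If you do end up adjusting those, it's worth looking at the current client-side defaults first. A quick sketch (the value of 8 is just an example, not a recommendation):

# Show the multichannel-related client defaults
Get-SmbClientConfiguration | Select-Object EnableMultiChannel, ConnectionCountPerRssNetworkInterface, MaximumConnectionCountPerServer

# Example only: raise the per-RSS-NIC connection count from the default of 4 to 8
Set-SmbClientConfiguration -ConnectionCountPerRssNetworkInterface 8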
 
Some sneak peeks of the Windows 2012 "NAS" build. This board may be the best buy right now in terms of an ATX-format board with 10GbE, as it incorporates 2 x 10GbE ports on board and provides 10 x SATA3 ports and 4 x SATA2. I'm using an inexpensive (but fast!) RocketRaid 2720 to connect the 6 x 4TB drives in RAID5. You're looking at about $2500 including the 7200 RPM 4TB Hitachi drives, 16GB ECC registered RAM, and a six-core Xeon E5-2620 v2 processor. The processor has a max TDP of 80 watts, six cores, and 2.1 to 2.6GHz speeds. There is ~20TB usable in RAID5, with reads in the 900MB/s range and writes in the 450MB/s range. A nice balance of efficiency and speed.

The last screen grab is from the onboard out-of-band management. This board, like many server boards, has a third LAN port that, when connected, provides a virtual KVM. You can boot the computer, view BIOS screens, update the BIOS, and monitor system parameters just as if you were sitting in front of the computer. Hardware alerts (like a failed system fan, or a temperature range exceeded) can be emailed from that system, even if the OS is not booted.

sth1.jpg


sth2.jpg


ipmi.png
 

As a home (novice) videographer with an older processor and qnap (659p+) and a certain impatience with the pace of advance in recording technologies vs. the pace of the tech I can afford to edit it with, I'm reading this series with not just a little interest! I look forward to the next in the series for sure. Apparently I will have to learn to love Windows 8, but a bigger pipe between storage and editing machine is desperately needed. Thanks for doing this!
 
IIRC the latest version of Samba fully supports SMB Multichannel. To be clear, it does NOT support all of SMB 3.0 (and none of SMB 3.01), but I am pretty sure it supports SMB Multichannel. So Linux could still be an option for you.

Or, learn to love windows 8/8.1.

Just think of it as "Dr. Strangelove: Or how I learned to stop worrying and love NT6.2/6.3" :rolleyes:
 
Samba's SMB3 does not support simultaneous multichannel like Microsoft's OS does. Even with 4 SSD drives in RAID 0 in the TS-870 with 10GbE, the max I measured was still 300MB/s less than a similar drive configuration on Windows 8.1/Server 2012.

http://samba.2283325.n4.nabble.com/SMB3-0-alternate-channel-td4644134.html

There is an "alternate" channel iteration, but it is not the same as Microsoft's SMB3 simultaneous multichannel. From my online research, it looks like multichannel is not even in the pipe yet for SAMBA :-(

This difference is about 99% of the reason I've built up a 2012 Server and have updated all of our 10GbE workstations to Windows 8.1.
 
Dang, I had thought it was a supported feature in the latest version of Samba.

I guess just another reason for me to keep my server on 8 rather than move it to Ubuntu.
 
I suspect there will be a push in Samba to do multichannel; however, my guess is that this is very much an OS thing, as every SMB transaction starts off as a single channel and is then negotiated up to increase efficiency. You really see this in the Windows 8.1 graphical file copy GUI during a transfer.

From this blog: http://blog.fosketts.net/2012/12/12/samba-40-released-active-directory-smb-support/

Right now, Samba 4.0 negotiates to SMB 3.0 by default and supports the mandatory features, including the new authentication/encryption process. Over time, the Samba team will add additional features, with SMB Direct (SMB over RDMA like InfiniBand) already in testing.

The foundation for directory leases is there, but advanced and specialized features like multi-channel, transparent failover, and VSS support might not ever come. Since Samba has long supported their own type of clustered file servers, Windows scale-out file server support does not exist in Samba at this point.
 
Quick question, with windows 8.1 pro on two workstations, do you still get full speed on 10Gb Ethernet? In other words, does the server have to be a server, or can it just run plain Jane Windows 8?
 
I would guess it can be plain Jane i7 4C/8T cpu with a ton of RAM and Windows 8.1 Pro.
 
An excellent question. I address these questions specifically in part 6 of the series...still working on it. At no point in my testing (other than with BitDefender running) did I see processor loading at 100%, even with an older i5 processor. Knowing that SMB3 will set up a maximum of four threads per NIC for each connection, it makes sense to consider an i7 with four physical cores and set RSS queues to 8 in the NIC driver settings to match the 8 cores the OS will see (if hyper-threading is enabled in the BIOS). I was able to consistently transfer files in both directions at speeds over 1GB/s if both machines were i7 processors. If one workstation was an i5 and the other an i7, the tests showed asymmetric performance, with the i7 faster.

RAM is a non-issue really...but I tested with 32GB in each machine so I could set up large RAM drives for large file transfer tests. On the Windows 2012 R2 server, if you have write caching enabled on your drive arrays, more RAM does help, as the server will use it as very fast cache. For workstation backups etc., the cache fills up quickly, and then you're limited by hard drive speeds.

Two Windows 8.1 (I am using 64-bit Pro) workstations returned 1.48GB/s during RAM drive file transfers with two 10GbE ports connected in an SMB3 multichannel test. In other words, a NAS based on Windows 8.1, a fourth-gen i7 processor, and a $150 Z87 or Z77 chipset motherboard (I tested the Asus Z87-A and Z77-V) will saturate a 10GbE connection, and indeed take you well beyond that if you run multiple connections.
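
As an aside, an easy way to confirm both 10GbE ports really are carrying traffic during a multichannel copy is to watch the per-adapter counters (a rough sketch; the two adapter names are placeholders for whatever your ports are called):

# Snapshot the per-NIC byte counters while a large copy is running;
# both ports should show SentBytes/ReceivedBytes climbing if multichannel is active
Get-NetAdapterStatistics -Name "10GbE 1", "10GbE 2" | Select-Object Name, SentBytes, ReceivedBytes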

The i7 3770K slightly overclocked to 4.3GHz on an Asus Z87-WS motherboard, with 32GB G.Skill 1600MHz RAM, is the fastest PC in my tests. That said, the 2.1GHz (2.6GHz Turbo) E5-2620 v2 Xeon chip (six physical cores) in the Supermicro test server is idling along at 25% or less during 1GB/s file transfers. Much of this is due to the RocketRaid 2720 RAID controller, which relies on the system CPU. There is a misconception in the Adobe CC video editing world that a dual-CPU Xeon system with a ton of RAM will outperform a single i7 4770K processor system. The Adobe PPBM tests I am using show that the i7 systems are faster, and often for 1/3 of the price. I guess the moral of the story is to do a bit of research before throwing money at a workstation :)

The only significant downside I could see to using Windows 8.1 over Server 2012 (with regard to 10GbE network speed) is the limit on SMB3 connections in Windows 8.1, which is set at 20 connections. Windows 8.1 also does not allow NIC teaming (a feature built into Server 2012); however, you don't really need this where SMB3 multichannel is concerned.
 
I've sent in part 6, but here's a screen grab or two. Both machines were running Windows 8.1 64-bit. Both workstations were sharing RAM drives (using SoftPerfect RAM Disk) to remove disk IO as a limitation. With Intel 530 SSD drives in both, the max possible would be 450MB/s or so due to SATA3 limitations. If both were sharing newer PCIe SSD drives, then you would see the full bandwidth shown below using RAM drives.

10g_ramdisk_windows81.png




10g_ramdrive_naspt.png
 

I've sent in part 6...

In vain have I searched for part 5.... :(

I'm particularly interested in the cut-over as to when a NAS is not going to be the right solution for hosting video source. I like the low-power idle of my qnap, as I'm mostly not using it, but if the lack of multichannel screws my ability to stream a few xavc-s streams, that might be a showstopper. [I'm not likely to use xavc/frame-only formats, but, famous last words. And very occasionally I throw more than a hundred streams up on the screen, but in those cases, pre-rendering is *fine*.]
 
The cut-over as I see it will be in the 750MB/s range. If you need more bandwidth per connection, then Microsoft's SMB3 would be easier to work with, particularly if you are using two 10GbE connections simultaneously. I was able to generate 1.48GB/s using dual 10GbE connections...and the limit, I suspect, was the RAM disk software I was using.

Microsoft's SMB3 has quite a few PowerShell commands as well, which would allow you to tune for specific applications.
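
If anyone wants to browse what's available, the SMB cmdlets ship in their own module, so enumerating them is easy (a quick sketch):

# List the SMB-related cmdlets included with Windows 8.1 / Server 2012 R2
Get-Command -Module SmbShare | Sort-Object Name

# Built-in help works as usual, e.g.:
Get-Help Set-SmbClientConfiguration -Detailed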

Looking at cost, my server build with 24TB of storage using the server board I chose is about $3500, not factoring in Server 2012 costs. The disk array in RAID5 is good for 900MB/s reads. A NAS like the TS-870 Pro with 10GbE and the same 24TB is about the same price, but you get system support and warranty, and there are no software licensing costs. With eight disks in the NAS, I was getting ~750MB/s reads on single file transfers. With either system, unless you are using a much larger disk array (or SSD drives only), the moment multiple users start loading the NAS it does not scale as you might think. I'd assume this is due to the limitations of the hard disks in servicing separate disk queues. As SSD drive costs drop and sizes increase, it will be interesting to load up a NAS with these.

I'll have to run a few power tests to see how both stack up. Boards like the Supermicro A1SAM-2750F (based on 8 core Avoton chip, 20W TDP) would be an excellent start for a power efficient NAS.
 
The last of the 10GbE articles is up. The project, as far as Cinevate is concerned, is done. We're using the 2012 Server build as detailed in part 6, and the QNAP TS-870 Pro with the 10GbE interface, along with the Netgear 8-port 10GbE switch.

As far as Cat6 being required for 10GbE: we're using precisely zero Cat6 cable at Cinevate. So far, the cable runs likely don't exceed 50ft (they're in the ceiling and walls, so likely a bit further), but it's all Cat5e, and everything is connected and running at 10GbE speeds. Because Cat5e was working so well, I didn't even bother replacing our patch cables with Cat6a. So, some good news if you're implementing 10GbE over shorter runs.

Cheers,
Dennis.
 
Dennis Wood, a sincere thank you to you and to Tim Higgins for allowing you to post your 10GbE articles.

I am sure many current and future users will be using your templates and findings when building their own 10GbE networks.

Take care.
 
You're quite welcome. I seriously wasn't expecting the blog series to be so extensive; however, the process required a surprising breadth of coverage. I do hope folks find it useful...and are as considerate as you have been with their gratitude :)

Cheers,
Dennis.
 
