Viability of 10Gbe SMB Direct (RDMA) for Windows Server NAS to workstations in 2017?

Cyanara

Hi,

Got a bit of an advanced question here, but I found another thread on this site that makes me hopeful I can learn some useful stuff.

Background:
Our video production business is growing rapidly and in desperate need of a NAS upgrade. In addition to running out of disk space, 1Gbps (albeit bonded) wasn't enough when the business started, and it's just ridiculously slow now where 1TB projects are concerned. Fortunately, prices for 10GbE are finally becoming reasonable thanks to Aquantia and Netgear.

I also recently set up a Windows 2016 server as a test, and it's proving to have a range of benefits, so it occurred to me that rather than buy another proprietary NAS, I could just put the drives in the server and use Storage Spaces to set up a high performance RAID (keeping the old one as the nightly backup). The benefits of doing so (cheap, custom and replaceable hardware, no-fuss user authentication) seem great.

Now, we use Premiere Pro mostly, and while in the past it was terrible at operating over a network, Adobe have since improved that. However, as SSDs greatly improved in size and pricing it made much more sense to just operate off local copies of the projects on a high performance drive and upload them to the NAS when done (or sync them in the background).

The Problem:
With 4K footage now common, we face the issue of SSDs once again being very pricey at the capacity we require for local editing (2TB), especially in M.2 format. While merely upgrading to a 10GbE environment will no doubt make sourcing files directly from the NAS much more viable on its own, I recently discovered RDMA and SMB Direct, with their substantially reduced overheads, and am very interested to see whether they might offer greatly increased efficiency in our workflow.

Unfortunately I'm having trouble finding much information on the viability of such a setup in a small business environment. Most documentation seems to refer to only using it between servers. Does this mean that I'm misunderstanding the technology? Can it even co-exist with a normal LAN environment?

Can anyone offer advice or experience for this sort of scenario?

Thanks,
Joe
 
I use RAID 5 with multiple disks and SFP+ with direct-attach cables. It's cheaper to implement, and 4K video is very dependent on bandwidth too.
For example, a lot of 4K video is encoded at 20Mb/s, which even an old laptop hard drive has no problem keeping up with. An old laptop drive can also keep up easily with 100Mb/s 4K footage.
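
As a quick sanity check on those numbers, here's a minimal Python sketch converting bitrates to the drive throughput they need; the laptop-drive figure is an assumed ballpark, not a benchmark:

```python
# Rough check: video bitrate (Mb/s) vs. drive throughput (MB/s).
# The HDD figure is an assumed ballpark for an old laptop drive, not a measurement.

def mbit_to_mbyte(mbit_per_s):
    """Convert a bitrate in Mb/s to MB/s."""
    return mbit_per_s / 8.0

OLD_LAPTOP_HDD_MBS = 80.0  # assumed sustained sequential read, MB/s

for bitrate in (20, 100):  # common 4K delivery vs. higher-bitrate acquisition footage
    need = mbit_to_mbyte(bitrate)
    print(f"{bitrate} Mb/s footage needs ~{need:.1f} MB/s "
          f"(~{OLD_LAPTOP_HDD_MBS / need:.0f}x headroom on an {OLD_LAPTOP_HDD_MBS:.0f} MB/s drive)")
```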

SFP+ direct gives you DMA and the other goodies offered by InfiniBand without paying more, though it is limited in distance. SFP+ direct-attach 1m cables tend to be cheap, as do second-hand SFP+ cards, and many switches offer SFP+ alongside gigabit Ethernet (or, in some cases, multiple SFP+-only ports) for much less than an equivalent 10GbE copper switch.
For example, MikroTik's latest switch with 16 SFP+ ports costs around $300-$400 for a fully managed layer 3 switch, and Ubiquiti's offering is 12 SFP+ ports plus 4 GbE ports on a layer 3 switch at $400-$500. A 16-port 10GbE copper switch, even semi-managed and layer 2, even from D-Link, costs more than that.

A single 3.5 inch drive will do 100-200MB/s nowadays as long as you don't fill it up too much (leave more than 15% free). You can also use the partition method for consistent speeds (a partition covering the first 10-20% of the drive keeps speeds consistent). Each drive you add to your array increases the speed further.
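
To illustrate how that scaling works in the ideal case, here's a minimal sketch; the per-drive rate and the efficiency factor (parity and controller overhead) are assumptions:

```python
# Idealised striped/parity array throughput: per-drive rate times drive count,
# discounted by an assumed efficiency factor for parity and controller overhead.

def array_throughput_mbs(drives, per_drive_mbs=150.0, efficiency=0.8):
    """Estimate aggregate sequential throughput in MB/s."""
    return drives * per_drive_mbs * efficiency

for n in (2, 4, 6, 8):
    print(f"{n} drives @ 150 MB/s each ~ {array_throughput_mbs(n):.0f} MB/s aggregate")
```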

FTP actually has the lowest overhead of the bunch.
 
I think I get what you're saying. However, the workflow of a Premiere Pro project benefits greatly from increased drive performance, as opposed to the simple streaming of video I think you're referring to.

That said, you raise a fair point about whether bandwidth would be the limiting factor. I'm so used to it slowing down the copying of source files to the local computer that I haven't checked what kind of bandwidth it uses when directly editing from the RAID.

While Premiere Pro does require pulling multiple files at once, that's probably more of a reason to stick with SSDs (albeit SATA3) which can handle those simultaneous reads and writes with ease.
 
You get less overhead with SFP+, but many 10Gb-capable boards and NAS units come with 10GbE copper instead; 10GbE SFP+ modules do exist, though. In your case an SSD would help, since you could be pulling many small files instead, and an SSD is much faster for that, but hard drives will work fine for pulling large files.
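
A crude way to see why small files favour the SSD: the time is dominated by random accesses (IOPS) rather than bandwidth. In the sketch below, all IOPS and throughput figures are assumed ballparks:

```python
# Rough estimate: time to pull many small files, HDD vs. SATA SSD.
# IOPS and throughput figures are assumed ballparks, not measurements.

def fetch_time_s(files, avg_kb, iops, mbs):
    seek_time = files / iops                       # one random access per file (simplified)
    transfer_time = files * avg_kb / 1024.0 / mbs  # bulk transfer portion
    return seek_time + transfer_time

files, avg_kb = 20_000, 64  # e.g. a cache/preview folder full of small assets
print(f"HDD (~120 IOPS, 150 MB/s): {fetch_time_s(files, avg_kb, 120, 150):.0f} s")
print(f"SSD (~50k IOPS, 500 MB/s): {fetch_time_s(files, avg_kb, 50_000, 500):.0f} s")
```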
 
Looks like RDMA support for Windows 10 is/will be a thing, but will require upgrading to Windows 10 Pro for Workstations: https://www.itnews.com.au/news/microsoft-releases-high-end-version-of-windows-10-pro-470632

So that's promising, but also a whole lot of extra expense on top of the specific premium hardware I'd have to buy. I think I'll stick with cheaper hardware and try an SSD for hot storage on top of the RAID. Worst case scenario, they'll still be able to move a 1TB project around in under half an hour.
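
For what it's worth, the half-hour figure checks out on paper as long as the disks can keep the link busy; the efficiency factor below is an assumption for protocol and real-world overhead:

```python
# Wire-time estimate for moving a 1TB project, assuming the disks can keep up.
# The 0.85 efficiency factor (protocol and real-world overhead) is an assumption.

def transfer_minutes(size_tb, link_gbps, efficiency=0.85):
    size_bits = size_tb * 1e12 * 8
    return size_bits / (link_gbps * 1e9 * efficiency) / 60

for gbps in (1, 10):
    print(f"1 TB over {gbps:>2} Gb/s ~ {transfer_minutes(1, gbps):.0f} min")
```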
 
For starters, RDMA is a function of the NIC, not the switch. RDMA can even work over WAN connections. And, honestly, I run RDMA in an S2D cluster, but when it comes to our file servers, despite having capable NICs I skipped it. RDMA does two things: reduces CPU overhead for SMB3, and reduces server latency when using cluster shared volumes. When dealing with video, neither is particularly important, and it's cheaper to just up the CPU you have. Throughput is identical.

One interesting thing about SMB3 is that you can also do other things to increase throughput beyond even 10 Gbit Ethernet - for instance, you can just hook up a ton of NICs. The performance of ten 1 Gbps NICs with independent IPs will be identical (more or less) to a single 10 Gbps link. It scales linearly, so you can just add a two port 10 GbE NIC or similar and just keep scaling so long as you have ports, even with speed mixing and matching.
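
As a rough illustration of that scaling claim - SMB Multichannel spreads traffic across links so usable throughput is roughly the sum of the link rates - here's a minimal sketch; the 10% per-link overhead is an assumption:

```python
# Back-of-the-envelope SMB Multichannel aggregate: roughly the sum of the
# individual link rates, less an assumed 10% per-link overhead.

def aggregate_mbs(link_gbps, efficiency=0.9):
    return sum(link_gbps) * 1000 / 8 * efficiency  # Gb/s -> MB/s

print(f"10 x 1 GbE             ~ {aggregate_mbs([1] * 10):.0f} MB/s")
print(f"1 x 10 GbE             ~ {aggregate_mbs([10]):.0f} MB/s")
print(f"2 x 10 GbE + 4 x 1 GbE ~ {aggregate_mbs([10, 10, 1, 1, 1, 1]):.0f} MB/s")
```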

As for SFP+ versus CAT6 for 10 GbE, throughput is identical given otherwise identical switches and NICs, but latency is not - SFP+ is noticeably faster. The same is true on regular 1 GbE by the way, SFP ports have much lower latency, and you can use SFP+ for the server while continuing to use copper for the clients, if you like.

Another thing you can do that's a little less intuitive is run a 10 Gbps storage network, which is a non-routable network without a gateway, on stand-alone hardware, and then use a secondary 1 Gbps network for all of your more normal traffic. This keeps the port buffers free to boost the performance somewhat, and has the large advantage of usually being free, since you likely already have 1 Gbps switches. This gives you effectively an SMB3 SAN. And finally, use tiered storage spaces and use SSDs to give your performance a boost, and use ReFS over NTFS.
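
If you want to sketch that split out before buying anything, the addressing plan is easy to reason about with Python's standard ipaddress module; the subnets and host names below are made-up examples:

```python
# Sketch of a dual-network plan: a non-routable storage subnet (no gateway configured)
# alongside the normal routed office LAN. Subnets and host names are made-up examples.
import ipaddress

storage_net = ipaddress.ip_network("10.99.0.0/24")   # storage-only, no default gateway
office_net = ipaddress.ip_network("192.168.1.0/24")  # existing routed 1 Gbps LAN

hosts = {
    "nas-storage-nic":   "10.99.0.10",
    "edit1-storage-nic": "10.99.0.21",
    "edit1-office-nic":  "192.168.1.21",
}

for name, ip in hosts.items():
    addr = ipaddress.ip_address(ip)
    which = "storage" if addr in storage_net else "office" if addr in office_net else "unknown"
    print(f"{name:18s} {ip:14s} -> {which} network")
```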

Oh, and one other thing: if you're looking for switching hardware for this stuff, pick up the Netgear M4300s. They're faster for SMB3 storage than any other small-business-class switch, and not by a small margin. The only thing we tested that was faster is a Dell iSCSI switch, with the Netgear even outperforming a Cisco chassis for this application.
 

Thanks for that. That's some pretty nifty info. I definitely intend to use SSDs for hot storage on the NAS, and the non-routable network might be fun to experiment with.

I think my mindset in the original post was based on earlier versions of Premiere Pro, where even people on 10GbE networks were complaining that editing responsiveness over the network was unusable. RDMA probably would have been ideal back then to trick Premiere Pro into treating the NAS like local storage, but I'm under the impression that Adobe have since made their software much more network-friendly, especially if local SSDs are used for cache/scratch. So unless Cat6 introduces delays in the hundreds of milliseconds, it will likely serve us fine. In hindsight I was probably getting excited over a solution to a problem that no longer exists. Cool new tech does that to me :p
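
On the latency point, the gap between 10GBASE-T copper and SFP+ is measured in microseconds per hop, nowhere near hundreds of milliseconds; the per-hop figures in this sketch are rough vendor-ballpark assumptions:

```python
# Order-of-magnitude check: per-hop port latency, copper 10GBASE-T vs. SFP+ DAC.
# Figures are rough vendor-ballpark assumptions in microseconds, not measurements.

PER_HOP_US = {"10GBASE-T (Cat6 copper)": 2.5, "SFP+ DAC": 0.5}
HOPS = 2  # e.g. workstation NIC -> switch -> server NIC

for medium, us in PER_HOP_US.items():
    total_us = HOPS * us
    print(f"{medium:24s} ~{total_us:.1f} us over {HOPS} hops "
          f"({total_us / 1000:.4f} ms - far below anything an editor would notice)")
```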
 
I am curious what you have settled on - looking to upgrade my home NAS/network as well and deciding between a QNAP 531X or just putting a Mellanox SFP+ card into one of the spare desktops and going home-grown.
Similarly, I need to pick up a new switch and am looking at MikroTik's new model vs the UniFi SFP+ switch.
 
The build hasn't been greenlit yet, but having had several years' experience with ReadyNAS I'm definitely going with a custom build (helped by the fact that I need to build a new domain controller server anyway). While ReadyNAS probably has the best feature set of the turnkey NAS systems, we've simply had way too many problems that I couldn't fix because none of the hardware/software was replaceable.

Affordable raw throughput is most likely the most sensible option for us, rather than throwing money at niche high-end features. So while I haven't yet had the chance to try any of the hardware, and therefore can't personally recommend any of it, I expect to get several ASUS XG-C100C 10Gbps NICs and a Netgear XS708E 8-port ProSafe Plus 10GbE switch with 1x SFP+ combo port. They seem to be the most cost-effective 10GbE solutions available, as far as I can tell.
 
Rule #1 of building your own storage server: Don't put a domain controller on it.
Rule #2 of building your own storage server: DON'T PUT A DOMAIN CONTROLLER ON IT!!!

Being a domain controller disables write caching. Needless to say, that's abysmal for performance on a storage server. As an aside, if you're running Windows Server on the box, it's more properly a storage server, not a DIY NAS.

If you're doing the domain controller rebuild anyway, you don't need to spend more than $500 on a box for one for a normal-sized network. And one cool thing about doing a proper storage server, where you have two servers clustered and both have two paths to each disk drive (the Lenovo SA120 is a fabulous JBOD array that can be connected with two SAS cables to each of two servers for full high availability), is that you can use it like your own private SAN. The two servers connected directly to the storage just handle storage itself, and you can then add other servers with just enough storage to boot the OS and build a Hyper-V cluster using even the free Hyper-V type 1 hypervisor - mounting the .vhd/.vhdx/.vhds over the 10 Gbit SMB3 link ... or even just throw a quad-port gigabit NIC into older servers and use that for your storage connection. And that is where RDMA matters - hosting virtual servers and using SMB3 as a SAN.

For 10 GbE I'd move up to the Netgear M4100 if you can swing it though, it's very fast for L2 SMB3 traffic just out of the box, second only to Cisco 4500X and Dell's dedicated iSCSI switches, and you can even fit a pair of them in 1U if you like redundancy. Routing isn't great, but hopefully you won't be routing on storage switches. :)
 
Alrighty, that's definitely a curveball. Thanks for pointing that out. That said, after doing a lot of Googling (and finding entire threads where no one knew about the write caching policy), I've been having trouble working out whether the enforced write caching policy is OS-wide or specific to C: (where AD would be running from). I'd especially expect Storage Spaces to be somewhat intelligent on the matter. Given that computer hardware often costs about double in Australia, that office space is an issue, and that Windows Server licenses aren't cheap either, I think it still falls upon me to do some experimenting with my current test server and find out the most cost-effective solution. Either way, it won't really affect the initial hardware purchases (when they happen), thanks to the flexible home-grown approach.
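
For the experimenting mentioned above, a crude way to see whether write caching is actually helping on a given volume is to time a burst of writes with and without forcing a flush. This is only a rough probe, not a proper benchmark, and the test path is a placeholder:

```python
# Crude write-cache probe: time a burst of writes with and without os.fsync().
# A large gap between the two suggests the volume is absorbing writes in cache.
# TEST_FILE is a placeholder path; point it at the volume under test.
import os
import time

TEST_FILE = r"D:\cachetest.bin"    # placeholder: a file on the volume being tested
BLOCK = b"\0" * (4 * 1024 * 1024)  # 4 MiB per write
COUNT = 64                         # 256 MiB total

def timed_write(flush_each):
    start = time.perf_counter()
    with open(TEST_FILE, "wb") as f:
        for _ in range(COUNT):
            f.write(BLOCK)
            if flush_each:
                f.flush()
                os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    os.remove(TEST_FILE)
    return elapsed

print(f"cached writes:  {timed_write(False):.2f} s")
print(f"flushed writes: {timed_write(True):.2f} s")
```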

On a side note: the Netgear M4100 appears to be 1GbE only. I'm guessing you meant the M4300 (XSM4316S)? It costs about 4 times as much as the XS708E however, so it's not likely to happen at this stage. Even doubling our transfer speeds would be a big win right now (and I'd rather not mess around with bonding).

Cheers
 
For a small office, you can run a DC on a NUC with an Atom CPU - just make sure you have two of them for redundancy.

And don't worry about LACP or anything, just plug the NICs into ports and have some sort of IP addressing running, SMB3 sorts it all out for you. :)
 
