
What do I need to learn? (need to design a SAN/NAS up to 300TB)


Twice_Shy

Occasional Visitor
Hello, I'm totally new here, and if my name gives you any clue (once burned, twice shy) I'm somebody who learned the HARD WAY. Now I'm trying to learn how to do things the right way.

To keep this from getting too huge (I'll happily fill anyone in on the boring details if you ask), I'll just jump right in.

Quick facts:

- I am trying to learn what is necessary to build a series (at least two stages, anyway) of NAS/SAN-type devices, starting at the small end of a shoestring budget (I am a back-to-college student deep in debt and I expect things to be tight for years), which will do the job until I have the budget and can hire a professional to take care of it.

- The purpose of this is film projects and video game design work. As you may well be aware, data has grown to ginormous proportions - Star Wars 7, for instance, has over 1 PETABYTE of work data for it alone. 300TB is a figure I pulled out of my rump, but it's not unrealistic - certainly not for 'later' when shooting 4k/8k as opposed to 'now'; that level is not going to be needed for a few years. This is more about growing into it. It may even have to get larger.

- The budget is not zero, but it is not "fully adequate" either. It's more that there are always higher-priority claims on available income or new loans, which preferably get spent on things like camera equipment or lens rental.

- The people working on this are close to "nonprofit" status. What I mean by that is everyone is volunteering time in a creative collective, and we will be trying to pull each other up by each other's bootstraps instead of our own. Create YouTube shorts, indie games, and similar until something takes off. Once it does, try to kickstart everyone involved in the collective into productive taxpaying members of society no longer living in their parents' basements and eating ramen. I guess this is worth explaining because I'm sure it will be brought up how underfunded everything seems - it's just how it is. The point is that we can't afford to hire proper IT professionals when we can't even afford proper equipment and location shooting permits either.

- Therefore, I have to learn how to do this myself. If I can't learn, it doesn't get done. After five years of waiting for someone else to volunteer to be the IT professional, it Just Hasn't Happened. So I'm taking the initiative a second time, trying to brush off the dust, treat the bruises, and try again. Nobody else in our group has had any kind of breakthrough of cash input or met an IT guy who wants to put in this kind of time either - it's just artists struggling to help one another, and right now lacking proper data storage is holding everyone and everything back. Yet I seem to have the best (still inadequate) computer knowledge for setting this up, so I'm volunteering.

- I already tried to do this, and it's where my name came from. I've got a box of about two dozen hard drives, either half dead or in the process of dying, full of lost data we hope to recover someday from the last attempt, where the best I knew how to do ended up in a hopeless flustercluck.

- As soon as the money and income become available to migrate everything into "everything done right", that will be done immediately. There just has to be data around to migrate; it has to get that far. The problem is that may well take 10-15 years, and I don't want to repeat the optimistic judgements that led to total catastrophic data loss last time.

- A lot of this is a research project that's going to be "mapping the future". Some implementations are going to be impossible on X budget - that's okay - but after figuring out clever ways to do more with less, I'm hoping to know where the lowest possible entry point is so that I'm able to jump when it is feasible. In some cases the only answer will be "you won't be able to upgrade to a better system unless prices drop or more money comes in", and as long as I know I've lowered that figure as much as humanly possible, that's okay. It's more that if I haven't lowered that figure, sometimes I'll literally be taking food out of the mouth of someone who sacrificed to donate money for hardware used by everyone else in the group, so please understand if I get a little religious about saving money. Saving $50 on a computer case even when $1000 for drives has to be spent is the difference between one of my actors eating well vs. mac'n'cheese all month again.

- I don't expect full answers even in a year; I expect this to be a slow learning process that expands as I go. That's okay, because everyone previously involved is stuck with their own fulltime jobs for a while, and it's going to take time to roll back to where we're ready to start filling drives with footage needing editing and such anyway.

Wow, that was longer than I thought, but hopefully it brings you up to speed.

At this point I still feel I am in the condition of "I don't know what I don't know", so I'm asking for suggestions on how to bring myself up to speed. This can be books, Wikipedia links, or just terms, but please understand my goal is not to become a fulltime IT professional either. I'm trying to learn enough to do the job right, but the only job I have to do right is my own - I don't need to learn how to manage other people's servers, or familiarize myself with any old or cutting-edge tech I'm not currently using. I'm not trying to replace a normal multiyear education in the computer field.

My sole goal is getting up a highly reliable and expandable storage system on the lowest reasonable budget we can struggle through, so that everyone can finally get back to work without having to worry about LOST work from cutting the kind of corner that should never be cut. There's a difference between a money-saving strategy (not spending $400 on some nice 16-bay hot-swap hard drive rack, and just stuffing 8 drives apiece into two $30 midsize NZXT towers, or even two $5 used thrift store mid-towers) and something that fundamentally undermines what you are doing (stretching backup cycles to every two weeks instead of every day, buying "used" hard drives, using dodgy Chinese PSUs).

Examples of things I've read about or been interested in while pursuing this project:
- ZFS as a file system (it really seems like an ultimate answer, as silent data corruption turned out to be a BIG problem in the tens of TB I had stored, which I only learned of later; the only problem is either nobody builds really big systems or they cost astronomical amounts over some 'sweet spot' - see the checksum sketch after this list for why this matters so much to me)

- FreeNAS (or some other off-the-shelf, "low maintenance", not too demanding NAS system)

- used Fibre Channel HBAs, especially the 4Gb speed

- used InfiniBand HBAs, especially the 10Gb speed

- last-generation tape like LTO6, which is cheaper per TB than even modern cheap hard drives, won't be murdered by a drop to the floor the way a box of hard drives already was, is easier to mail through the post and have it survive, and is designed for 15-30 year archival life. If this project hasn't turned profitable in 15 years I'm pretty sure it never will, but I'd like to plan on 15 years as a worst-case retention time for data.

- SAN setups, especially as a way to possibly hook up a more modular form of storage block designed around 8-10 drive ATX cases and consumer motherboards, instead of huge expensive RAID cards and motherboards beyond the consumer maximum of RAM needed to run properly (the way ZFS under FreeNAS needs to, which seems to have seriously bottlenecked larger systems so far, since nobody builds them)
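
A concrete illustration of what I mean by the silent corruption problem in the ZFS bullet: the core idea is to record a checksum of every file while it's known-good and re-verify later, which is (very roughly) what ZFS scrubs do for you automatically. Here's a standalone toy version in Python, just to show the concept - the paths in it are made up, and it's a sketch of the idea, not something I'd rely on instead of a real checksumming filesystem:

```python
import hashlib
import json
import os
import sys

def sha256_of(path, chunk=1 << 20):
    """Hash a file in 1 MB chunks so huge video files never have to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def build_manifest(root):
    """Walk a directory while the data is known-good and record a checksum per file."""
    manifest = {}
    for d, _, files in os.walk(root):
        for name in files:
            full = os.path.join(d, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def verify(root, manifest):
    """Re-hash everything and report files that changed or vanished (possible bitrot)."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return bad

if __name__ == "__main__":
    # Usage: python bitrot_check.py /mnt/footage manifest.json  (example paths only)
    root, manifest_file = sys.argv[1], sys.argv[2]
    if os.path.exists(manifest_file):
        with open(manifest_file) as f:
            print("failed verification:", verify(root, json.load(f)) or "none")
    else:
        with open(manifest_file, "w") as f:
            json.dump(build_manifest(root), f, indent=2)
        print("manifest written for", root)
```

Run it once to write the manifest, then run it again later on a schedule to see which files have rotted - that is the check I never had on the old drives.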

There are multiple types of server that would have to be involved, and multiple boxes to be set up, which would change over time - i.e. a higher-performance setup for film editing, backed by a lower-performance backup system only turned on once per day for daily backups, which exports most data to LTO6 tape until we have the local HD space to keep it in working storage. Separate expansions of LTO6, HD, and SSD capacity based on what is being done by whom and where the bottleneck is. (Lots of video footage shooting might all go right to LTO6 until we have the money to upgrade to SSDs to better work on it, for instance.)

What's most important is OVERALL STRATEGIES and ASKING THE RIGHT QUESTIONS, which I don't even know yet, which is why I don't have that much decided about what I'm going to do. As I said, I "don't yet know what I don't know" and am open to ideas, suggestions, references, and sample builds others have done, to get inspired or get ideas. I realize that if I can better define the problem, people are better empowered to help me design a solution, but many of these concepts are totally new ways of thinking for me before I read about them. (Trying to wrap my head around all the ZFS concepts, for instance - I was used to just saving files to a fixed-size drive, and I could barely parse the ideas of storage pools and snapshots at first.)

I'll stop talking so that some people can read and make a few starting comments before I expand on everything else. This is already a pretty big post to read, I'm aware. :^)
 
Building reliable storage systems that don't put your data at risk is a job for professionals. So the best advice I can give you is to not try to reinvent the wheel and look at commercial systems.

The good news is that SAN technology has moved out of the realm of enterprise servers to be a major area of focus for NAS makers.

So I'd evaluate QNAP and Synology for their offerings before trying to put together your own system.
 
Building reliable storage systems that don't put your data at risk is a job for professionals. So the best advice I can give you is to not try to reinvent the wheel and look at commercial systems.

Thanks for the advice, but please reread "Therefore, I have to learn how to do this myself. If I can't learn, it doesn't get done." :) That's what I was told years ago, so nothing got done, and just saving data on normal drives led to a total loss.

Nothing in the economics of the commercial SAN systems I've looked into makes them viable for what needs to be done, if you read the rest. If I can't figure out how to do it, nothing happens, and that's about as bad as another catastrophic loss, which is just as bad as not even trying at all. Other people use things like FreeNAS all the time, and there are articles here on making your own Fibre Channel SAN for $1500 or whatever - this isn't brain surgery. But it's probably as complicated as changing the clutch in your car. Which is a learnable skill. (I started with no mechanical experience and successfully did that too - 4 years and 40k miles later it still works fine.)

I'd appreciate it if "leave it to the professionals" posts would pass on further suggestions, since they don't actually help me either do anything or afford anything. :-/ I don't want to sound ungrateful for any and all advice, but it's the only advice I can make absolutely no use of.

"Therefore, I have to learn how to do this myself. If I cant learn, it doesn't get done."
 
I would probably say that you are in the wrong forum. You would probably get more traffic and better guidance on the FreeNAS or OpenFiler forums. Not that there aren't smart people here, but the task you are describing is way beyond a small network.

It's possible to learn how to do this yourself, but when @thiggins talks about letting professionals do it, that is the best advice. You talk about changing the clutch yourself, which is fine, but would you want to be responsible for the clutch, the transmission, the steering, the brakes, the oil and everything else on a fleet of cars? Because when you talk about building something that can do 300TB, that is the task. Keeping that much online and reliable with low data loss is a 10-20 hour a week job, and that is with enterprise hardware and service contracts to help you along.

The first thing you have to do is get an idea of where your start is going to be. You mentioned that you pulled 300TB out of nowhere. The type of equipment needed to put 20TB online and make it reliable with good backup is much different than getting 300TB online. Start small and then start adding improvements with best practices as your needs and skills evolve. Here are some of the questions you should ask (a quick sizing sketch follows the list):

- What is the amount of storage I need online?
- What is my anticipated data growth?
- How often does the storage need to be backed up? (i.e., how many hours of work am I OK with losing)
- What is my backup window?
- How often do I need to restore from a backup?
- What equipment will I need for the management network?
- What equipment will I need for the storage network?
- How many people need to access the volumes?
- What is the sustained throughput I will need for my applications?
- By what method will the systems access the storage?
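
To put rough numbers behind the first two questions, here is a quick sizing sketch - every figure in it is a placeholder for you to swap out with your own answers, not a recommendation:

```python
def usable_tb(drive_tb, total_drives, parity_drives):
    """Usable capacity of one box once redundancy drives are taken out."""
    return drive_tb * (total_drives - parity_drives)

def years_until(target_tb, start_tb, growth_tb_per_year):
    """How long until your online capacity has to reach target_tb."""
    return max(0.0, (target_tb - start_tb) / growth_tb_per_year)

if __name__ == "__main__":
    # Placeholder answers: an 8-bay box with 6 TB drives and 2 parity drives,
    # starting around 32 TB of data and growing ~25 TB per year.
    print("one 8-bay box:", usable_tb(6, 8, 2), "TB usable")
    print("years until you need 100 TB online:", round(years_until(100, 32, 25), 1))
    print("years until you need 300 TB online:", round(years_until(300, 32, 25), 1))
```

Answering those questions with real numbers tells you how long a "starter" build actually has to last before the next stage.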
 
Stop by forums.nas4free.org and have a look at NAS4Free.

But some groundwork needs to be established: where will the equipment live? Who pays for equipment and power?

Off-lease data center equipment can be found on Craigslist or even eBay cheap, but the power bill can be steep.

Maybe have a look at the datahoarders subreddit for ideas of possibilities and what complexity awaits you.

If you can find a sponsor for equipment space, tech equipment and/or funding, that will be key.
 
I would probably say that you are in the wrong forum. You would probably get more traffic and better guidance on the FreeNAS or OpenFiler forums. Not that there aren't smart people here, but the task you are describing is way beyond a small network.

Thank you for asking good questions and trying to help me work through this. :)

Yes, it's possible I'm in the wrong forum, and I will look into both those forums. If there are any others I need to look into, other people can suggest those too. This is a SLOW project; it's not going to happen overnight. You might even see me wait 2 months before posting again at times - I just chip away at it when I can, and it sits until I get back to it.

At the same time, I'm trying to first ask, for instance, whether I'll be using FreeNAS or OpenFiler at all. I'm still asking my most basic questions of "what are even the OPTIONS that I might implement?" before deciding on a strategy. Or put another way, which buzzwords do I need to learn how to implement, versus which are just marketing BS?

Yes, I agree there is clearly A LOT to be learned to do anything right. Yet after I was forced to do the clutch on my car just reading a book (and being sure it would fail) I gained confidence, and I actually HAVE done the brakes, steering, etc. on multiple cars, and my first transmission pull is probably coming up this summer, so... ;)

"What's needed to put 300TB online is different than 20TB" - yes, I agree, but the reason for phrasing the question that way was so I do not paint myself into a corner with a solution that hits a wall. The "best" solution for 20TB may be different than the best for 60-120TB, which may be different again for 300TB to 10PB. It might be 50% harder to learn about the 'enterprise' solution but, knowing the long-term goals, worth it. Whereas it might also cost 500% more per gig for the 'enterprise' solution, making it a non-starter and having me just plan to build 32TB server cubes and put up ten of them as a workaround. Cases where a little more time gives me a more future-proof solution are things I'd like to learn about... EXCEPT when the price skyrockets.

The two biggest drivers are minimal overhead cost per TB stored (note I didn't say cheap - being maximally frugal doesn't mean cheap, like my example of using a couple of used thrift store ATX cases to store 8 hard drives apiece instead of some $400 16-drive server case) and flexibility. I don't want to have to change horses mid-race if I can help it. This is basically a "bridge" system - starting when budgets are tiny, and needing to work until something has the capability of being self-funding and I can hand the keys over to an IT pro, write the checks, and not have to stress anymore.

I have more time than money, but I also wish to minimize the total footprint of time. This means if something I learn now saves me time in the mid-term, it's worth it. My example was not having to worry about viruses and lost data anymore: "set it up once RIGHT and then freeze the drive image", so no matter what virus hits it, you just roll back. Set up a NAS with snapshots and a good backup so that if corrupt software mangles a file, it just rolls back. I DO want to do things right, I DON'T have the time to do things over from scratch, and I MOSTLY want to learn only what I need to do the job properly so I can get back to work. :)


So let's go through the questions one at a time...
- As mentioned, where am I starting vs. where will I end up? I'd probably like to start with something like 20-32TB online, since FreeNAS systems at that performance level are basically a fully known quantity (assuming I use FreeNAS, but I'm open to other options). Worst of the worst case, I clone that cube and throw more of them at the problem. Most important is that the data is reliably stored, even if there are ten volumes to look through to find the right footage. $$ trumps max convenience.

- Exploring SAN options is for when the NAS becomes a performance bottleneck. I am undecided how large it has to be, but it will start smaller than the NAS. Maybe newer high-performance NASes make it pointless - I don't know yet. Maybe two terabytes of budget SSDs in a RAID when we can afford it? Might be 1TB, might be 4TB depending on workloads, so that's just a starting figure - bigger later when prices are down or more people work on the problem.

- What is anticipated data growth? I'm not sure, but I know it will be progressive. If I'm starting at 32TB, I'm pretty sure I'm not ending there. Nothing less than 100TB within 3-4 years from now if we start shooting Red Weapon 8k footage until the cows come home? Part of it is that the data will come in opportunistic spurts, and if we don't have the hard drive space we can't shoot the footage, or we shoot lower-quality footage. As to how we can afford to use that nice of a camera but not the hard drives - we can't; it's kind of grey use on the side from the schools we are going to. :p But each camera stores 300MB/second at peak rate, or 135 gigabytes per hour of footage. Once you start running multiple processing passes and stages on footage the work files get big, and 'best practice' is to save everything so you don't have to do it over. Obviously "best" may have to be compromised. (I sketched that growth math out in the quick calculation after these answers.)

- How often to back up? At least daily - losing a day's work sucks, but at some point the cost of overprovisioning is a problem too. If something isn't processing overnight, backups can run. Realtime backups/mirroring would be great later, depending on cost. That's one thing that interests me in a SAN, which is apparently easier to do that with - just fail over to another box. I would not expect the cost to justify it until later, but it would be nice to know I could. Two SAN boxes might also allow two users to load-balance.

- What equipment for the management/storage network? I wasn't sure I understood the question, since I assume that's part of the next step. Other than that, I'd mostly prefer to use consumer-grade hardware, unless what is often a 10x price for server-grade stuff really is 10x better. I don't need six nines of uptime and availability here.

- How many people will need access? At the beginning 2-3, but it's possible long processing operations from separate computers could be up and running, depending what it is. For the actual video work - the most demanding use - it could be up to six people at once, but it's cheaper for us to stagger schedules most of the time if it's going to be serious hard-drive-thrashing load. Probably try to limit it to two or at most three heavy loads at a time normally - alternatively, running loads on separate servers (another reason for more than one SAN box) will be an option if we add boxes instead of expanding a monolithic system. Short version: if consolidating storage and disks improves performance and lowers costs, we'll do it; if it increases costs, we won't. Consolidating is mostly for convenience - less hassle with "updating" overheads, lost data, virus problems, and similar. I would LIKE to both save money and have it perform better, of course! But it's about finding the few places it works best and doing the work that way - in the end, depending on the work, it might go to local DAS, SAN, or NAS. It can always fall back to DAS if network capacity is too pricey and too intermittently used to justify it.

- Sustained throughput for the SAN is going to be "whatever the network can give me" from a single box, because Adobe CC processing of huge video files is usually hard-drive-bottlenecked when they don't fit in RAM anymore. For the NAS I don't know yet - I'm wondering whether virtualization (VMware, VirtualBox, or MultiPoint Server) type uses can work fine over gigabit Ethernet as totally diskless workstations, or whether it works fine with a local SSD caching drive (for changed data) to reduce network loads.

- What method will systems use to access storage? I wasn't sure I understood, unless you mean deciding when to use DAS vs. NAS vs. SAN. (And whether there are any other three-letter acronyms I should know. :) )
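
For what it's worth, here is the back-of-the-envelope math behind the growth answer above. The 135 GB/hour figure and the shooting hours are my own rough assumptions, not specs anyone should rely on:

```python
def tb_per_year(gb_per_hour, shoot_hours_per_month, copies=2):
    """Yearly footage growth, counting every copy we keep (working copy + LTO copy)."""
    return gb_per_hour * shoot_hours_per_month * 12 * copies / 1000.0

if __name__ == "__main__":
    # Assumed ~135 GB/hour of camera footage, keeping 2 copies of everything.
    for hours in (5, 10, 20):
        print(f"{hours:2d} shooting hours/month -> ~{tb_per_year(135, hours):.0f} TB/year")
```

Even at modest shooting schedules that lands in the tens of TB per year, which is why I keep assuming the growth is progressive rather than flat.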
 
Stop by forums.nas4free.org and have a look at NAS4Free.

I'll give it a look sooner or later, yes, thank you. For now this is what hot rodders call "bench racing". :) Playing around with some big figures on paper before anything is decided or committed. I want to think my way through the different options and be aware of all the big choices that might be made, and when to choose one way vs. another.

Who pays for equipment and power? Either me, or some people throw in for at least the extra drives and equipment, but I cover the power. Equipment would not necessarily be on 24/7 unless it's processing a job. Or we might have separate servers that are always on vs. intermittently on vs. task-specific (off for days and only used once in a while).

What kind of data center equipment would I be looking at, and how bad a power bill? I've occasionally seen things like quad Opteron motherboards for a pittance that look like they can take 16-core CPUs(!), but I don't know if the efficiency per task is any better than a newer quad-core. I strongly prefer only powering up what I need, when I need it - to be scalable.

I'll check out Reddit too, thanks for the suggestion.

Hopefully, if we can come up with a demo that impresses people, we might either officially start a business over it or do a Kickstarter/Indiegogo. The problem is you have to have something to show, and it's too early for that. Even the "small business" angle is premature, because the IRS wants to see a profit normally within 3 years, and looks at you real ugly if you're still not making a profit by 5. There is also little to save on 'taxes' when you're in a low bracket anyway... this is more like what I call a trickle business. We work at something we hope will eventually fund itself, but I don't know whether it will succeed in 3 years or not for 12, because I can't control the variables. It takes what it takes, and since it's out of pocket, it's mostly about minimizing costs wherever they're not essential.

Mostly I just want the money to go to hard drives and LTO tapes since that is the unavoidable minimum cost to store a TB no matter what you do.
 
Just as a heads up - there's a good community over on servethehome.com that has more experience with DIY and off-lease corporate gear (including deals, setup/config, etc...)

I don't refer folks off-site that often, but they seem like a pretty sharp group over there...

https://forums.servethehome.com/index.php
 
Who pays for equipment and power? Either me, or some people throw in for at least the extra drives and equipment, but I cover the power. Equipment would not necessarily be on 24/7 unless it's processing a job. Or we might have separate servers that are always on vs. intermittently on vs. task-specific (off for days and only used once in a while).

What kind of data center equipment would I be looking at, and how bad a power bill? I've occasionally seen things like quad Opteron motherboards for a pittance that look like they can take 16-core CPUs(!), but I don't know if the efficiency per task is any better than a newer quad-core. I strongly prefer only powering up what I need, when I need it - to be scalable.

Hardware is cheap - the challenge is the OPEX...

Good example - Monthly Operations:

Cooling - $400/month (this includes power and maint)
Power - $200/month (per rack)

If one has a standby generator and UPS, figure $500/month there, and if one has remote hands/management oversight (e.g. operations staff) at 20 hours/month as a shared resource with a 24/7 service level agreement, figure $1500/month there. Power is somewhat of a variable - PHX is cheaper than ORD, for example... so as you can see, costs can run up pretty quick.

Hardware gets pretty cheap at that point - prices are somewhat variable depending on the tier class of data center, and there are always hosting options in the cloud (which is part of the financial motivation towards folks like Rackspace, Softlayer, AWS, and others) - then one just leases the space in the rack at a fixed rate (they provide everything, one just pours in the application stack).

Then the other variable here is connectivity itself - whether from the hosted provider, or dropping in from Level3, ATT, Verizon, etc...
 
Hardware is cheap - the challenge is the OPEX...

Good example - Monthly Operations:

Cooling - $400/month (this includes power and maint)
Power - $200/month (per rack)

... Power is somewhat of a variable - PHX is cheaper than ORD, for example... so as you can see, costs can run up pretty quick.

Then the other variable here is connectivity itself - whether from the hosted provider, or dropping in from Level3, ATT, Verizon, etc...

Well, the bigger question is what I need this gear for and what job it's doing. :) I'm not against buying anything used if it has a purpose and makes financial sense, but if it costs $600/month to run an old server to do the same job a new one can do in 1/10th the time and ongoing cost, obviously I'll buy new. I'm looking for out-of-the-box thinking to come up with solutions, not just an excuse to have Big Iron in the basement. :) For example, I don't need 24-drive chassis stuff, and buying a bunch of old 72-gig 10k hard drives still won't do what an SSD does. It's about picking and choosing the specific parts that will make all the difference.

Some older parts could be very useful - I'm all for old InfiniBand or Fibre Channel HBAs, for instance, as a cheaper option than 10-gig Ethernet currently seems to cost, since I'm not sure when 10GigE is going to become ubiquitous or when switches/cards will drop to low enough prices to matter.

What are PHX and ORD? A Google search doesn't yield an obvious answer... Again, I'm not looking for "servers for the purpose of having servers", nor to learn to be a sysadmin for the purpose of being a sysadmin or running anybody else's network but my own - I'm trying to pick my useful acronyms carefully. Desktop virtualization yes, NAS and SAN yes; for other stuff it's more a question of what else I should be learning about and why.

Connectivity is just home broadband with no caps, but this isn't meant for remote virtual sessions much - mostly local usage. Each person will work on their individual issues, except for occasional sharing of data or videoconferencing; if a large amount of data has to be moved, it will be moved by sneakernet and a box of LTO6 tapes.

I'm looking for low-cost workarounds and even "hacks", if you want to call them that (in the crude way everything is referred to as a 'hack' nowadays, i.e. lifehacks and other terms I don't like but that are sort of accurate, lacking another word right now), in the sense of simply using a different workflow process. Looking at what an ideal costs to implement (let's say total bandwidth needed for multiple users) and figuring out a way to remove the most expensive bottlenecks (multiple SANs, so you can either slave one SAN per heavy user, or multiple SANs to one user, depending on what's going on). Figuring out ways to counter the tendency of "enterprise solutions" to randomly add zeroes on the right side of the equation with geometrically growing complexity, prices, and maintenance hassles.

Everything I'm doing just sort of starts at the "home power user" side of the spectrum while attempting to map out a future growth path that leads into the lower end of viable enterprise hardware, so that totally reinventing the wheel is not necessary and I am not painted into a corner. For example, if I set up a home SAN with 4-gig Fibre Channel (literally, the article that led me to create an account here was the one a few years back about exactly that), that's not the fastest, but it can easily be upgraded in the future on both sides - both faster drives/more SSDs and faster networking.

Or the workaround might be multiple NAS boxes with only 5 drives and 32TB usable storage apiece (or whatever the current FreeBSD "works well up to this" size is for ZFS), then just throwing more boxes at the problem if it has to be all online all the time. It doesn't have to be a single volume of 300TB or 1PB - it might get a little ungainly to have ten servers for 320TB, but so is ANY hardware for 320TB in the future, likely more so with all the redundancy, special RAID cards to spin up drives separately to avoid big power spikes, etc. Having ten consumer UPSes, one for each, might be either ungainly or smart, depending on how it works out. If the economics suggest buying one big ex-commercial UPS and plugging all ten servers into it, I'll do that instead. Nothing is fixed as a "have to" - just the overall best value, lowest total overhead cost per drive/TB, not too much inconvenience, and a workable "bridge" solution to take me from where I am to where the data can be ported back out to more appropriate, modernized hardware in the future.

Everything has a sweet spot - every bit of hardware, every strategy, etc. I'm just trying to find the sweet spot of everything and stick it all together with some degree of scalability and future-proofness, while avoiding the painting-myself-into-a-corner tendency of past solutions. I.e., I thought I could simply buy external USB drives and stick them on the shelf for the past video storage - until two drives got dropped (which mirrored each other, losing ALL data on both instantly - LTO6 tapes won't die like that), until I discovered silent corruption all over without ever seeing a warning from Windows (sometimes even on data just written and checked five minutes later - too late to correct, since it wasn't a verified/secure file move), and later a power surge on the PC wiped out multiple drives at once which I'd already RAID-1'ed internally, thinking that was the solution. In each case I was trying to do the right thing with backups, mirroring, and everything, but it still wasn't enough. I'm beyond the normal home use level, but not to the "I can hire a pro to know all this" level.

The biggest thing is what I should do, in what order, right now. This is all stuff I apparently have to learn anyway, even if just for expensive hobbies and demanding home projects. And since I haven't yet started ANYTHING except talking about "the new perfect system", anything is still on the table, for telling people more pro than me "I plan to do this! What are the problems with my strategy?" while I revise, reconsider, and look at alternatives. What I'm starting is probably still within the range of 'small net builder'; it's just that at some point it will expand beyond it, and it's mostly about knowing how that will work and planning for it. (Random example - someone else tried to sell me on Drobo, until I learned how slow their USB2 transfer rates would be for migrating files back out if I wanted to totally move data off a server, and how, if their proprietary solution ever fails, there's no way to restore data from the drives.) I too want the ideal of an 'appliance' I just set and forget and do little to maintain except occasionally throwing the money it needs at it as an offering, without it being an endless hole either. But I still see no off-the-shelf solutions for anything I want to do yet, not even other NAS boxes, for instance.
 
If one wants to set up a lab to tinker and learn - eBay is definitely a resource, and even Craigslist sometimes, depending on area (some cities have a lot of data centers/etc)...

Old data center gear is easy to come by, as tech improves, and lease cycles/support contracts usually go on a 36 month cycle.

One could do a single rack setup - 24 port top of rack switch, a couple of DL380G6's and a storage cluster, along with power conditioning/UPS... shop smart and one could do a single rack for under $10K...

The big thing then is the power and cooling - and power is the big one..

(btw - ORD/PHX are airport codes -- Chicago and Phoenix)
 
Going to be expensive, not shoestring. I think I read that you don't want failure, so redundancy = $$. To move that amount of data efficiently, a 10Gb/s switch. 300TB is really 600TB plus your tape backup, unless you wing it. If you use your basement, you might want a static IP from your ISP with a dedicated firewall appliance so your community buddies can access it remotely. Hey, free heat in the wintertime! I think your biggest expense will be the drives and the UPS(s), next your cabinet and racks (second hand). Rough numbers:

- 10TB disks are $400, so $24,000 in drives.
- 16 3.5" drives per 3U chassis means 4 of those - for example the SUPERMICRO SuperChassis CSE-936E26-R1200B - that's $6K.
- Then a small but fast switch, and a server with your flavor of NAS distro. ZFS.
- Maybe put your $2K tape backup system in another location like your garage, and split the rack every other unit, each mirroring the one above it, like 1a-2a; 1b-2b.
- Then a 9000-watt UPS, $4K.
- Small 10Gb/s switch, $1K.
- Then the server - maybe $400 (Asus or Gigabyte mainboard...).

So IMO I think you should budget for $40K+, and if you're using all the drives at the same time (platters spinning) you will need cooling too, I think. After the equipment it's just your monthly ISP and electric bill. I would like to know how you make out in a couple of years' time. Best wishes.
Edit: forgot - maybe 4x X99 Extreme11 motherboards for the 3U's ($3K)
 
Going to be expensive, not shoestring.

I'm not sure that fits my planning though it's okay if you didn't read all of my big wall of text. :) I'll try to be more concise...

Right now I'll be starting small and looking to grow into larger. 300TB doesn't happen overnight - it's years off. But it's better to plan with the outcome in mind.

Tape will be the storage of choice for data in the meanwhile. Load 'work data' onto the rack, process, output back to tape. Yes, I'd like to have everything on realtime drives, but it's not necessary and I can work around it "for now".

Starting with maybe 20-32TB or so of online NAS working storage, maybe 2TB of SSD SAN storage (a RAID array for speed), plus an LTO6 tape drive. Those aren't hard figures - I read somewhere about some commercial software that's free for home use but only up to 18-19TB - that's not off the table either.

I don't need rack gear for the purpose of having rack gear. I need specific solutions to specific problems.

My biggest issue is verifying what my "specific problems" should be though. :) As well as "big picture" planning and clever insights/workarounds to keep costs down.

Minor inconvenience to save money is worth it, since costs seem to increase geometrically over a certain level of complexity or monolithic integration. For example, ATX cases over 8 or maybe 10 drives rapidly skyrocket in price, and so do motherboards with more than the common maximum of RAM for the era, short of a server board. So instead of a 16-drive, 128GB RAM, 128TB FreeNAS server on pro hardware, I might do two 8-drive, 64GB RAM, 64TB-total servers on consumer hardware. Also, I expect 8/5 workdays, or at worst 12/6, not 24/7. (There's a rough cost-per-TB sketch of this comparison just below.)
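
Since overhead cost per TB is the number I keep making decisions on, here's a rough sketch of the comparison I'm running in my head. Every price in it is a placeholder I made up on the spot, not a quote - the point is only the arithmetic of spreading fixed (non-drive) cost over usable capacity:

```python
def overhead_per_tb(non_drive_cost, usable_tb):
    """Spread the fixed (non-drive) cost of a build over its usable capacity."""
    return non_drive_cost / usable_tb

if __name__ == "__main__":
    # Placeholder prices; drives are left out because they cost the same either way.
    pro_16bay = {"hot-swap chassis": 400, "server board + ECC RAM": 900, "HBA": 150, "PSU": 120}
    consumer_8bay = {"used ATX case": 30, "desktop board + RAM": 350, "HBA": 80, "PSU": 60}

    usable = 6 * 14  # assume 14 data drives of 6 TB plus 2 parity drives, either way
    print("one 16-bay 'pro' box:     $%.0f per usable TB of overhead"
          % overhead_per_tb(sum(pro_16bay.values()), usable))
    print("two 8-bay consumer boxes: $%.0f per usable TB of overhead"
          % overhead_per_tb(2 * sum(consumer_8bay.values()), usable))
```

If the "pro" overhead number ever drops below the consumer one at the capacities I actually need, I'll happily switch - that's the whole point of keeping the comparison explicit.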

I have an interest in 4-gig Fibre Channel and 10-gig InfiniBand links over 10-gig Ethernet, as there's still the TCP/IP overhead to worry about. Used cards were notably cheaper last I checked - a little savings per station adds up - and it's easy to upgrade to newer, faster gear in the future. Being non-routable may, I believe, be a security plus - however, the jury is out on this issue and I'm open to being swayed.

There are other little things like this - little savings here, little savings there. Another example: instead of one monolithic SAN, set up two separate ones for the planned two users who may max the pipe, but cross-link them too, so one user can use both SANs if the other is not in use. Pretty sure I can do that with FC anyway.

Starting with normal backups, no insta-mirroring at first - too expensive to have totally redundant everything out of the gate. But yes, later; I also assumed mirroring would be easier with SAN and FC, since I thought it was meant for uses like that.

I'm designing for progressive upgrades as money becomes available.
 
Good idea about growing it - the best way, in my opinion, too. You mentioned RAID; have a look here at another DIY idea for when you start to get a lot of hard disks. Sounds like something I would do if I wanted to build a big storage depot. Yes, I agree: if there are limitations in the FreeNAS-type distros, maybe go with FreeBSD or some other Linux flavor and set up a headless system. Cheers.
 
Well, THE article that brought me to these boards was this one: http://www.smallnetbuilder.com/nas/...n-fibre-channel-san-for-less-than-1000-part-1 - because it's a great example of doing what was previously an enterprise-class solution on what is, comparably, a shoestring budget.

Building "NAS cubes" of a given size (anything 19-64TB probably) using something like FreeNAS was another assumption. The cost overhead per terabyte is pretty low up to that size, and that's my main goal - "most money goes to LTO tapes and hard drives", not overhead. I'd like to saturate gigabit Ethernet while still being able to do some background duties (scrubbing etc.) if possible. The SAN is for whatever bottlenecks when trying to run from the NAS - that has to be much faster.

That MHDDFS thing seems interesting - I'm wondering if I could run it on top of a ZFS-type NAS and just throw new NAS cubes at the problem. Again, as the page says, it's not a backup solution, but that's okay. Backups will probably be LTO6 tape at first, and later a second online work NAS when I can afford it. Same with the SAN - to start, it's just a scratch drive; if it dies I lose some work, but I can't afford a full mirror. Apparently SAN is well set up for doing mirroring and failover, though. (I don't know offhand how this is accomplished - just two Fibre Channel links to each box, maybe?) So at some point we upgrade to realtime mirroring for no lost work.

Virtual desktop headless stuff I'm talking about in a separate thread (which doesn't have as much response yet). I've been wanting to do something like that anyway - I've had people wanting to LAN-party at my house who are willing to leave spare hardware here that I can throw at 3D rendering tasks in a render farm at other times. If I don't have to administer or load up a bunch more hard drives and can just boot one of two configurations (render mode and LAN-party mode), it's an example of one solution serving two purposes. They have the spare mobos and such to lend me if I can set them up and they can game on them.

I'm pretty buzzword-heavy or acronym-heavy here, but I hope I'm using everything accurately, and I'm always curious what else I need to know about. At one time I didn't even know what a SAN was, or that it was different from a NAS. I didn't know Fibre Channel stuff was cheap (I was all interested in InfiniBand though: http://www.davidhunt.ie/infiniband-at-home-10gb-networking-on-the-cheap/ ) or that both systems could run without the expensive switches, just direct-patched. I'm hoping other experts in enterprise-level solutions can give me articles, acronyms, and terms to look up and look into, or tell me why I'd want to consider X for longer-term solutions, or why I should learn about Y as a way to grow into that solution.

Mostly, though, everything has to scale up gracefully. I can't "afford" to do any of this instantly or even "the right way"; it's just a series of progressive solutions where I fix the most important problems one by one in a series of rolling upgrades. I just want to be sure that whatever I started with still belongs there by the end, or at least was the best way to move in that direction, and that I didn't artificially limit myself with a shortsighted early decision.


Btw, can that MHDDFS thing span multiple physical boxes as well do you know?

EDIT: Actually, here's another link http://blog.brianmoses.net/2016/06/building-a-cost-conscious-faster-than-gigabit-network.html and an example of why I'm publicly brainstorming all this out. There he says InfiniBand doesn't play very nicely with FreeNAS, so that alone might force me to choose between IB and FreeNAS. I'm not sure it works well with Fibre Channel either, so that might drive me to a NAS that does. Since both increase in speed and older hardware will always be available, I'm pretty sure that if I find NAS software that works well with either or both, I can continue to upgrade that hardware in the future as we get faster and faster SSDs or even hard drives. I also assume (but could be wrong) that 10-gig Ethernet will always have bottlenecks IB/FC do not, though someone in the know could talk me back to 10gig with the right sales pitch. :)
 
Hello Twice_Shy,

I responded in one of your other posts earlier today and then stumbled upon this one this evening. I was an enterprise SAN engineer for almost 5 years. I hear you have a dream and a goal to create movies, perform digital video editing and do game development. I can also see you're very determined to reach your goal, and I wish to commend you.

I too was very determined when I decided to enter the world of storage. Let me share with you how I entered the storage world. When I started as a SAN engineer, the only thing I knew about storage was how to spell the word SAN. In the 3 days before my interview I familiarized myself with storage: I learned as much about fabric topologies and storage concepts as I could by reading storage white papers. Going into my interview it was known by everyone that I had zero SAN experience, however I was able to explain the architectural designs of core-edge, edge-core-edge and mesh fabrics, as well as the advantages and disadvantages of each. I also explained the purpose of a BCV (Business Continuity Volume) and how it functioned in the DMX. The rest of the interview was me charming them, because I had a job offer a few days later. I then found myself supporting what was, at that time, the largest Brocade fabric in the world. In actuality it was 4 fabrics: 2 in data center A for local redundancy and 2 in data center B for local redundancy, with some replication between the two. 18 months later it became 8 fabrics, as we built out the original fabric to capacity and started building out another set of fabrics at each data center. A couple of years later I was taking care of all the SAN equipment installation (from the loading dock to having it production-ready) for a handful of data centers in the Northeast and Midwest. I have put into production storage frames that were just shy of 300TB in raw capacity.

I share this with you for 2 reasons.

Reason 1: Chase your dreams, be bold, be daring, take chances, don't give up on yourself or your goals. You hinted that you had something big going and it all fell apart, yet here you are going at it again. That is the determination you will need to succeed and I commend you.

Reason 2: If you want to learn what you need to know to support the project you are so determined to achieve, you will need the experience and certifications that I once held in order to be successful. At that point you may have the funds that your $100k+ storage job is paying you to devote to your dream. Please don't read this as being snarky in any way. I am serious in what I am saying.

In my opinion, the equivalent of being able to change the clutch in a car is being able to replace a failed drive in a RAID group. Sticking with the automobile analogy, what you need to learn is not how to replace a clutch, it is how to design a functional, reliable transmission.
 
Hi Twice Shy,

Pity you didn't start college a few years earlier, and pity you don't study at a college in the Netherlands. Until a few years ago I taught IT sciences at a community college in the Netherlands, and I would have loved to turn your questions on this forum into an assignment for my students. As an assignment this has a lot to offer: a real client wanting a real solution, less-than-optimal budgetary constraints forcing students to find creative solutions, and so on....

Over here it is not strange for a nonprofit or small company (lacking the funds to hire a dozen consultants) to turn to a vocational college with questions such as yours. I do not know if this is customary over there, but fortune favors the bold, so why don't you hop over to the IT faculty of your college?

Good Luck! b.
 
In reverse order: THANKS, bernard, for your suggestion. I had not considered that at all, and I forget sometimes that tech colleges CAN be looking for "worthwhile projects" like that, since they need something to learn about and try to solve anyway. I will definitely be following up on that at some point. :)

To Deadeye: in the month-plus since my original postings, I both stumbled across SnapRAID and was directed to ServeTheHome, and have been picking brains there. I'll share a few things I've learned or am trying to apply in the meanwhile:

I'm pretty sure I'm going with SnapRAID, at least to get started. Not necessarily as the only NAS solution, either now or later, but it solves my most mission-critical problems: preventing silent data corruption, handling individual drive failures more elegantly than simple mirroring, being utterly scalable for "upgrading as I go", and requiring no migration of data into or out of any proprietary format (both hardware RAID arrays - at least all striping, i.e. 0/5 - and systems like ZFS mean taking the risk of putting data into a format for which you can no longer use the normal drive recovery tools that exist for NTFS/Ext3/Ext4). Plus, other people are already building fairly robustly sized arrays out of it (24 drives and 72TB-plus being not uncommon), whereas when I previously investigated FreeNAS with ZFS, it seemed rare that anyone talked of going over 32TB.

SnapRAID doesn't like lots of files (millions get a bit clunky) in a single array, and is not well suited to rapidly changing files (large files that are different every day, like a saved virtual machine state), but it will let me throw hard drives at my problem with minimum overhead, without having them corrupt the way a bunch of my drives already did with past footage. Furthermore, because it operates over the file system instead of under it at the block level, it's possible for me to export the files to Ultrium tape, because the cheapest way for me to store 300TB - especially if it doesn't have to be immediately worked on - is to prepare it to shuffle straight to LTO6, which is what I'm planning to do. SnapRAID gives me the option to write a "parity tape" if I set it up right (it's not made for this, but it can be made to do it), so I get a "Redundant Array of Inexpensive Tape" that will survive the breakage of one or more tapes upon recovery, giving me another alternative besides mirror sets. (Yes, I also hope to have mirror sets, but when money is tight I may be able to write only 2 copies of data; if any tape breaks I have only one more chance to not lose data - a parity tape or two gives me a way around this, just like RAID6 lets me survive two drive failures.)
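
To convince myself I actually understand the parity idea before trusting tapes to it, I wrote a toy example. This is NOT SnapRAID and has none of its real bookkeeping; it just shows why one parity member lets you rebuild one lost member, the same XOR math RAID5 uses for its single parity:

```python
from functools import reduce

def xor_parity(members):
    """Byte-wise XOR of equal-length data members gives the parity member."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*members))

def rebuild_missing(surviving_members, parity):
    """XOR of the survivors plus parity reconstructs the single missing member."""
    return xor_parity(surviving_members + [parity])

if __name__ == "__main__":
    # Three equal-sized 'tapes' (a real scheme would pad the shorter members).
    tapes = [b"footage-reel-A", b"footage-reel-B", b"footage-reel-C"]
    parity_tape = xor_parity(tapes)

    lost = tapes.pop(1)  # pretend tape B snapped
    assert rebuild_missing(tapes, parity_tape) == lost
    print("rebuilt the lost tape:", rebuild_missing(tapes, parity_tape))
```

That's the whole reason a parity tape buys me an extra failure: any single missing member, data or parity, can be recomputed from the rest.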

I'm planning on using a SAS card and SAS expander chassis. I do not yet know enough about SAS for my liking, and my asking of questions on other boards is wearing a few people out - I need a SAS expert whose brain I can pick a little to get up to speed. I'm wanting to use older SAS expanders, possibly even first-gen stuff that often plugged into a PCI or PCIe slot for power, as they are available for a song. No fancy hot-swap racks or anything, just a bunch of thrift store ATX cases to hold the drives for now - a mild inconvenience. Every dollar saved goes to buying Ultrium tapes and drives right now, after all. Planned maximum build-out will be 48 drives - SnapRAID tolerates up to 6 drive failures with a recommended single parity per 7 data drives. I can already fill half that now with my old drives if they all still work, but the plan is to migrate data to newer, bigger drives and keep the old ones around as work drives until they die.

Other parts of the network I'm still trying to figure out, but I'm hoping that SnapRAID, Ultrium and SAS expanders will at least let me fulfill my original minimum goal: to have a big cheap bit bucket with the minimum cost overhead possible (over each tape and drive bought), where I can separately expand offline capacity (Ultrium) and online capacity (SnapRAID, used both to prepare the tapes and to be a NAS full of video to edit). That lets me take a little more time learning the next steps, because at least I won't be wasting or losing opportunities anymore. (No more "hey, we can use the mocap stage this weekend" but I have no place to put the data - since it all ended up bit-rotted on the old drives anyway and I didn't know how to prevent the whole shoot from just being a loss again.)

As an aside, I don't know that I ever planned to do this to get a storage job or even get certifications. :) I mean, I suppose it could end up there... but I mostly planned to learn what I needed for my primary objective.


I will still want to pick brains and learn what the next step (the performance NAS/SAN) might look like. That will be an issue when the first-stage SnapRAID server becomes a bottleneck. I'm hoping that by the time that happens, I'll know what to build and implement to augment that part of the editing, VFX rendering, 3D rendering and color grading passes if they end up too time-consuming.
 
Hi Twice_Shy,

You're welcome ;-) Off topic (I know), but as your collective is in the business of designing video games (which is a software-based product), did you check whether you are eligible for MS BizSpark? See: https://bizspark.microsoft.com/

Good luck!
 
