
How can I tell if I/O issues I'm having when interacting with a SAS array are the drives or the HBA?

SolidSonicTH

Occasional Visitor
Basically, I've been running into an issue where the process log for file operations reports "The request could not be performed because of an I/O device error." whenever I interact with my JBOD array. I've seen it both when writing to the array with FastCopy and when reading from it to make backups with Kopia. In either case it affects the data being handled: FastCopy gives up writing that file, or Kopia leaves that file out of the snapshot.
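One way to see whether these failures are transient (which tends to point at a flaky link, cable, or overheating controller rather than a bad sector, since bad sectors usually fail the same way every time) is to re-read an affected file in chunks and log every failed attempt. A minimal sketch, assuming Python on the Windows box: on Windows the error quoted above is code 1117 (ERROR_IO_DEVICE), exposed as `OSError.winerror`. The function names and the chunk/retry settings are hypothetical choices, not part of FastCopy, Kopia, or any other tool mentioned here.

```python
import hashlib
import os
import time

# Windows error code for "The request could not be performed because of an
# I/O device error." (ERROR_IO_DEVICE).
ERROR_IO_DEVICE = 1117


def read_with_retries(path, chunk_size=1 << 20, max_retries=3, delay=1.0):
    """Read `path` chunk by chunk, retrying any chunk whose read raises OSError.

    Returns (sha256_hex, failures), where failures lists
    (offset, attempt, winerror) for every failed attempt. Re-raises the last
    OSError if a chunk still fails after max_retries attempts.
    """
    digest = hashlib.sha256()
    failures = []
    size = os.path.getsize(path)
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while offset < size:
            chunk = b""
            for attempt in range(1, max_retries + 1):
                try:
                    f.seek(offset)
                    chunk = f.read(min(chunk_size, size - offset))
                    break
                except OSError as exc:
                    # winerror only exists on Windows; record None elsewhere.
                    failures.append((offset, attempt, getattr(exc, "winerror", None)))
                    if attempt == max_retries:
                        raise
                    time.sleep(delay)
            if not chunk:  # file got shorter while we were reading it
                break
            digest.update(chunk)
            offset += len(chunk)
    return digest.hexdigest(), failures


def count_device_errors(failures):
    """How many failed attempts were specifically ERROR_IO_DEVICE."""
    return sum(1 for _, _, code in failures if code == ERROR_IO_DEVICE)
```

If a file fails once, then reads cleanly on a retry and its hash matches a known-good copy, the data on the platters is probably fine and the fault is more likely somewhere in the path between the drive and the OS.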

The server itself runs Windows 11, and while perusing Event Viewer I've noticed many warnings in the System log saying that access to a location was retried on various disks. However, all the data on the array seems fine. If I write to the storage pool with Windows Explorer, the files copy over without stopping, and when I read them back they aren't corrupted (this includes files that Kopia claimed it couldn't read during the backup). To me this suggests I should be able to perform tasks in either direction without it crapping itself.
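Those retry warnings are probably the most useful clue for the drives-vs-HBA question. The reasoning: a single failing drive should show up on one disk, while a cable, power, or HBA problem tends to hit several disks that share the same path. A rough sketch for tallying them, assuming the warnings have been exported from Event Viewer to a text or CSV file and that each message names the disk as "Disk N" (as the disk driver's retry warnings, event ID 153, typically do). The "half the array" threshold is an arbitrary assumption.

```python
import re
from collections import Counter

# Assumption: each exported retry warning names the disk as "Disk N".
DISK_RE = re.compile(r"\bDisk (\d+)\b")


def retries_per_disk(messages):
    """Count retry warnings per disk number from an iterable of message strings."""
    counts = Counter()
    for msg in messages:
        match = DISK_RE.search(msg)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def likely_culprit(counts, array_disks):
    """Very rough triage heuristic (thresholds are arbitrary assumptions)."""
    affected = [d for d, n in counts.items() if n > 0]
    if not affected:
        return "no retries logged"
    if len(affected) == 1:
        return f"single disk: Disk {affected[0]}"
    if len(affected) >= array_disks / 2:
        return "shared path: HBA, cables, or power"
    return "several disks: check which cable/port they share"
```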

The array is aggregated using StableBit DrivePool with file redundancy enabled, and the physical media comprises eight 6TB 12 Gb/s HGST SAS drives, connected four per port (4x2) to a Dell PERC H310 HBA running in IT passthrough mode that I got from a garage sale. The drives all read as perfectly healthy in HD Sentinel, with all of them reporting around 2500 days of uptime. This isn't a 24/7 setup: I shut the server down when I'm not using it to preserve the HDDs' longevity and save power, since I consider it a "tier 2" cold storage host, with the Kopia backup acting as "tier 3" in case the array ever fails and has to be reconstructed.

I've ordered a new HBA (I was planning on doing some work inside the box anyway now that I'm finally using it the way I intended, and the current HBA only runs at half the data rate the drives are capable of), but am I barking up the wrong tree here? I've also noticed that during long read/write processes the array goes through cycles where activity drops to nothing and then resumes. I thought that might be causing the problem, but I'm not sure whether that's just normal behavior for JBODs behind a discrete HBA.
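On the "half the data rate" point, a quick back-of-the-envelope check suggests link speed is unlikely to matter for spinning drives. SAS2 (the H310) runs 6 Gb/s per lane and PCIe 2.0 runs 5 GT/s per lane, both with 8b/10b encoding, so roughly 600 MB/s and 500 MB/s of usable bandwidth per lane. The H310 is a PCIe 2.0 x8 card, and with direct-attached drives each drive gets its own SAS lane. The 250 MB/s per-drive figure is an assumed, optimistic sequential rate for a large 3.5" HDD, not a measurement of these drives.

```python
# Back-of-the-envelope: does a 6 Gb/s SAS2 HBA throttle eight HDDs?
# Drive throughput below is an assumption, not a measurement.

SAS2_GBPS_PER_LANE = 6.0   # SAS2 line rate per lane (PERC H310)
PCIE2_GTPS_PER_LANE = 5.0  # PCIe 2.0 line rate per lane
PCIE_LANES = 8             # H310 is a PCIe 2.0 x8 card


def usable_mb_per_s(line_rate_gbps):
    """Line rate -> usable MB/s after 8b/10b encoding (10 line bits per byte)."""
    return line_rate_gbps * 1000 / 10


def headroom(drives=8, drive_mb_per_s=250.0):
    """Ratio of available bandwidth to what the drives can demand.

    Values above 1.0 mean that link is not the bottleneck.
    """
    per_drive_lane = usable_mb_per_s(SAS2_GBPS_PER_LANE)  # one lane per drive
    host_link = usable_mb_per_s(PCIE2_GTPS_PER_LANE) * PCIE_LANES
    return {
        "sas_lane": per_drive_lane / drive_mb_per_s,
        "pcie_link": host_link / (drives * drive_mb_per_s),
    }


print(headroom())  # {'sas_lane': 2.4, 'pcie_link': 2.0}
```

So even at SAS2 speeds each drive has roughly twice the bandwidth it can use. The new HBA may still help if the old one is failing, but probably not because of link speed.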
 
I am basing my comments on another HBA that can be cross-flashed to LSI firmware.
I am assuming that your 'IT passthrough' mode is the result of a similar crossflash.

The first things that come to mind are 'cables' and/or 'cooling' of the card itself.
You will get 'I/O Errors' if the cables are faulty/failing or the card is running too hot.
These cards tend to run hot so it is a good idea to improve the ventilation of the card and/or check the thermal paste under the heatsink.

Mass-produced cards of all types are often built with thermal pads, and these don't always cover the whole area under the heatsink.
Over time the thermal paste/pad can dry out. (Particularly if the card is running too hot anyway !!!)

Note:
Lifting the heatsink will 'break' the contact of the paste/pad.
It WILL require replacing ... don't try reusing the old paste/pad, or you WILL kill the card through bad thermal contact (Overheating !!!)

Ideas:
1. Is there a fan header on the card? If so, you can add a small fan over the heatsink.
2. Check the thermal interface under the heatsink. (DO THIS *ONLY* IF YOU HAVE THERMAL PASTE/PAD TO *REPLACE* WHAT IS THERE)
3. If after adding additional cooling and/or new thermal paste/pad the same issues are present ... check cables.
4. Cables can degrade if they are overheated, particularly the card-based connections when the card is running very hot.
5. Use proper cables that are verified to be suitable for the LSI card you are emulating. Often cheaper than you think !!!
6. The last idea: check whether there is a newer version of the LSI firmware for your card that has been VERIFIED to work.
(Some of the older firmware may have bugs.)
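Since the server is shut down between uses, a heat problem should follow a pattern: a cold card behaves for a while, and errors appear only after a long transfer has warmed it up. A minimal sketch for checking that, assuming you have some way to run one full read pass and count its I/O errors; `pass_fn` below is a hypothetical stand-in for that (for example, a loop over files that counts failed reads). The 2x ratio is an arbitrary threshold.

```python
import time


def run_soak(pass_fn, passes):
    """Run `passes` back-to-back passes from a cold start.

    pass_fn is a hypothetical callable that does one full read (or write)
    pass over the array and returns how many I/O errors it hit.
    Returns a list of (minutes_since_start, errors) samples.
    """
    start = time.monotonic()
    samples = []
    for _ in range(passes):
        errors = pass_fn()
        samples.append(((time.monotonic() - start) / 60.0, errors))
    return samples


def errors_rise_with_runtime(samples, ratio=2.0):
    """True if the later half of the run logs at least `ratio` times as many
    errors as the earlier half, or any errors at all after a clean start."""
    if len(samples) < 2:
        return False
    ordered = sorted(samples)
    mid = len(ordered) // 2
    early = sum(e for _, e in ordered[:mid])
    late = sum(e for _, e in ordered[mid:])
    if early == 0:
        return late > 0
    return late / early >= ratio
```

If errors show up at a flat rate from the very first pass instead, that tends to point more toward a cable or a marginal drive than toward heat.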

Google is your friend ... look up known issues with re-flashed PERC cards.
 
The crossflash thing sounds familiar. I did do something while setting up the card that involved finagling it to work in that mode but I can't recall the exact process I went through with it.

Also can't remember if that piece of tape I put over one of the PCIe fingers was for power delivery or data.

Also also, if the cables could be the problem, then changing the HBA has another benefit: I accidentally bought the wrong mini-SAS cables when I went to replace them initially (the PERC H310 has SFF-8087 ports and I bought SFF-8643 cables). So replacing the HBA will force me to change the cables as well (the new 12 Gb/s HBA I bought has SFF-8643 ports).
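The 4x2 layout also offers a cheap way to separate cable problems from HBA problems before the new card arrives: each SFF-8087 cable carries four drives, so if the errors cluster on the four drives behind one cable, that cable (or that port) is the prime suspect, while errors spread over both ports point back at the card, its cooling, or power. A sketch, using a made-up disk-to-port mapping (the real Windows disk numbers for the array will differ, and the boot drive usually takes one) and per-disk error counts such as the number of Event Viewer retry warnings for each disk.

```python
from collections import Counter

# Hypothetical mapping of Windows disk numbers to HBA ports.
# Replace with your own; these numbers are made up for illustration.
PORT_OF_DISK = {1: "A", 2: "A", 3: "A", 4: "A",
                5: "B", 6: "B", 7: "B", 8: "B"}


def errors_per_port(disk_errors, port_of_disk=PORT_OF_DISK):
    """Sum per-disk error counts by the HBA port each disk hangs off."""
    per_port = Counter()
    for disk, count in disk_errors.items():
        per_port[port_of_disk.get(disk, "unknown")] += count
    return per_port


def suspect_port(disk_errors, port_of_disk=PORT_OF_DISK):
    """Return the single port whose disks account for every error.

    Returns None when there are no errors, or when errors appear behind more
    than one port (which points back toward the HBA, its cooling, or power).
    """
    with_errors = [p for p, n in errors_per_port(disk_errors, port_of_disk).items() if n > 0]
    if len(with_errors) == 1:
        return with_errors[0]
    return None
```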
 
What you did was something like this ====> https://techmattr.wordpress.com/201...-flashing-to-it-mode-dell-perc-h200-and-h310/

Note the fan on the heatsink !!!
Try cleaning out any dust that may be clogging up the fins of the heatsink.
 
