USB drive failure...or something worse?

bengalih

Senior Member
I run entware off my USB with several applications on my RT-AC68U.

In the past 5-6 years I have had several failures of my USB drives. On one or two occurrences, these were due to disk corruption likely due to too many hard power offs and no fsck running, etc. In these cases I re-formatted/partitioned/built the drive and everything was fine.

On two of these occurrences the drives actually failed completely. I moved them over to my PC to test and they were inaccessible and needed to be replaced.
I am, however, now having a strange issue and don't know if it is the USB drive or perhaps something worse.

Essentially, over the past 10 days I have had 3-4 failures of the drive. My entware install (/opt) goes offline and the mount is inaccessible.
The first time it happened, I wasn't sure what went on, and to save time I simply rebooted the router. It came back up and everything worked fine for about 24 hours.
The next time it happened all I had to do was unplug the USB drive and plug it back in. It remounted automatically and everything was running again...for about 24 hours.
After the 4th time I put the drive into my PC (using WSL on Windows) and checked it. e2fsck reported some errors, but I had some initial issues mounting the drive and am not sure whether I caused them.

I ran some USB drive testing tools in Windows to test the drive - basically writing/verifying to the entire drive multiple times and didn't see a single error.
I went ahead and rebuilt the drive entirely. I restored a tar of my backed up configuration (similar process I have done in the past when I had complete drive failure).
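
For readers unfamiliar with the write/verify style of testing described above, here is a minimal sketch of the idea in Python. It is not the tool used in this thread; the block size, the `pattern` helper and the file name are assumptions made for illustration. Note that the read-back may be served from the operating system's cache rather than from the flash itself, which dedicated testers work around.

```python
import hashlib
import os
import tempfile

BLOCK = 1024 * 1024  # 1 MiB per block (arbitrary choice for this sketch)


def pattern(seed, index):
    """Deterministic pseudo-random block derived from a seed and the block index."""
    return hashlib.shake_128(f"{seed}:{index}".encode()).digest(BLOCK)


def write_verify(path, blocks, seed=0):
    """Write patterned blocks to path, read them back, return mismatching block indexes."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern(seed, i))
        f.flush()
        os.fsync(f.fileno())  # push data out of Python and OS write buffers
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK) != pattern(seed, i):
                bad.append(i)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: point the path at a file on the mounted USB drive instead.
    with tempfile.TemporaryDirectory() as d:
        print("bad blocks:", write_verify(os.path.join(d, "test.bin"), blocks=4))
```

An empty list means every block read back intact. A test like this exercises the flash cells and the controller, but it says nothing about how the drive behaves on the router's USB port.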

The rebuilt drive ran fine for about 3 days and then fell offline again.

I don't have the entire syslog as it got overrun with other errors due to the drive being offline, but here is the bulk of it:

Code:
Apr  3 00:24:49 kernel: usb 2-1: device descriptor read/64, error -71
Apr  3 00:24:50 kernel: usb 2-1: device not accepting address 5, error -71
Apr  3 00:24:51 kernel: usb 2-1: device not accepting address 6, error -71
Apr  3 00:25:12 kernel: sd 0:0:0:0: Device offlined - not ready after error recovery
Apr  3 00:25:12 kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr  3 00:25:12 kernel: end_request: I/O error, dev sda, sector 4989176
Apr  3 00:25:12 kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr  3 00:25:12 kernel: end_request: I/O error, dev sda, sector 4989200
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145:
Apr  3 00:25:12 kernel: JBD2: I/O error detected when updating journal superblock for sda1-8.
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detectedcomm conn_diag:
Apr  3 00:25:12 kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Apr  3 00:25:12 kernel: reading directory lblock 0previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_journal_start_sb:252: Detected aborted journal
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm udhcpc: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm watchdog: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_journal_start_sb:252: Detected aborted journal
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #131073: comm nginx: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm sed: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm amas_lib: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm preinit: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm cp: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm cp: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm touch: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm grep: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm sh: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm dhcpc_lease: reading directory lblock 0
Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm networkmap: reading directory lblock 0
Apr  3 00:25:12 kernel: usb 2-1: device descriptor read/64, error -71
Apr  3 00:25:13 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:13 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm chmod: reading directory lblock 0
Apr  3 00:25:13 kernel: usb 2-1: device descriptor read/64, error -71
Apr  3 00:25:13 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:13 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #2: comm [: reading directory lblock 0
...
Apr  3 00:25:19 kernel: usb 2-1: device not accepting address 17, error -71
Apr  3 00:25:20 kernel: usb 3-1: device descriptor read/64, error -62
...
Apr  3 00:25:27 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr  3 00:25:27 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm preinit: reading directory lblock 0
Apr  3 00:25:27 ovpn-server1[2457]: Options error: --dh fails with 'dh.pem': No such file or directory (errno=2)
...

It certainly looks like a failure/corruption of the EXT4 file system, and it could be the drive itself is failing and causing this corruption.
My concern is whether there is something more insidious, like the USB port on the router being flaky, etc.

This was a 3.0 drive, and I have another 2.0 drive in there at the moment with another copy of the system. I'm hoping that after a week or so of running fine on that I can concede that the other drive is just bad.

However, I'm a little concerned because all testing on my Windows system reports no issues with this drive.

Has anyone seen anything like this before?
 
The drive went offline. All the subsequent error messages are a consequence of that and can be ignored as they are false. Try running the device in USB2 mode.
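
To illustrate the split between the initial bus-level errors and the filesystem errors that follow from them, here is a rough Python sketch that sorts syslog lines into the two groups. The regular expressions are assumptions based only on the lines posted in this thread and may miss other message formats.

```python
import re

# Kernel lines about the USB bus itself, e.g. "usb 2-1: device descriptor read/64, error -71".
USB_BUS = re.compile(r"kernel: usb \d+-[\d.]+: .*error (-\d+)")
# Block-device and filesystem messages that appear once the device is gone.
DOWNSTREAM = re.compile(r"kernel: (EXT4-fs|JBD2|end_request|sd \d+:\d+:\d+:\d+)")


def triage(lines):
    """Split syslog lines into bus-level errors and downstream errors."""
    bus, downstream = [], []
    for line in lines:
        if USB_BUS.search(line):
            bus.append(line)
        elif DOWNSTREAM.search(line):
            downstream.append(line)
    return bus, downstream


# Lines copied from the log posted earlier in the thread.
sample = [
    "Apr  3 00:24:49 kernel: usb 2-1: device descriptor read/64, error -71",
    "Apr  3 00:24:50 kernel: usb 2-1: device not accepting address 5, error -71",
    "Apr  3 00:25:12 kernel: sd 0:0:0:0: Device offlined - not ready after error recovery",
    "Apr  3 00:25:12 kernel: end_request: I/O error, dev sda, sector 4989176",
    "Apr  3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected",
]

bus, downstream = triage(sample)
print("first bus-level error:", bus[0])
print(len(bus), "bus-level lines,", len(downstream), "downstream lines")
```

The earliest bus-level line, and whatever precedes it in the full log, is the part worth looking at; the downstream group mostly repeats the same fact.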
 
Seen failing USB drives? Yes.

Use an external m.2 enclosure and an SSD instead.
 
The drive went offline. All the subsequent error messages are a consequence of that and can be ignored as they are false. Try running the device in USB2 mode.
Right, the question is why did the drive go offline. Is it a bad drive, or something else?

This drive has been functioning fine for about a year now, so I don't think there is anything wrong with my configuration - it would seem to be a hardware issue.

At least one, if not both of the previous drives that failed were USB 2.0 drives. This was the first USB 3.0 drive I had in (and again it ran fine for a year).

I'm currently doing some more burn in on this drive on my Windows box to see if I can get it to fail.

I have a USB 2.0 drive currently in, but if it works that might only indicate it is a 2.0/3.0 issue and not that particular drive.

I assume you mean the GUI toggle option of "USB 2.0/USB 3.0"?
Does this option stick across all reboots, mounting/unmounting of the drive?

I was going to put this misbehaving drive into the USB 2.0 port instead, but again, if it works there, is that an indicator that it only works in 2.0 "mode" or that something is wrong with the 3.0 port?

I can definitely tell certain operations are quicker on the 3.0 port/drive, so would prefer to operate like that.
 
Seen failing USB drives? Yes.

Use an external m.2 enclosure and an SSD instead.
I've considered that...right now was just trying to avoid the cost as this has worked fine for me for well over 5 years (bar the few failures that were ultimately going to happen due to the nature of the flash memory).

My main issue is that in the past, all failures were easily recognizable. The drive failed in the router and could not be recovered, could not be reformatted or used in any way...totally dead drive. This drive, apart from falling offline like it has done multiple times in the last week, does not appear to be a bad drive in any other respect. I have not seen this type of failure before.

I am currently running an endless stress test on my Windows machine on this drive. I'm hoping that I get a failure on it within 24 hours as that would at least make sense. Without that, there is seemingly no way to truly verify a drive is good or bad before using it :/
 
A flash drive is supposed to be 'perfect' if we accept the marketing lore.

One failure means total failure to me. There is no other perspective if you want a reliable/stable network.
 
Right, the question is why did the drive go offline. Is it a bad drive, or something else?
Unfortunately it's impossible to say. The error message is pretty generic. I believe -71 means "Protocol error" but if the router can't communicate with the device that doesn't mean much. That error message is often preceded by other error messages.
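
As a small aid for reading these codes, the sketch below maps the two values that appear in the posted log to their errno names. It assumes the generic Linux errno numbering and hardcodes the table so the result does not depend on the machine running it; the `decode` helper is made up for illustration.

```python
import re

# Generic Linux errno numbering (an assumption of this sketch).
# The kernel logs these values negated.
ERRNO_NAMES = {
    62: ("ETIME", "Timer expired"),
    71: ("EPROTO", "Protocol error"),
}

ERROR_CODE = re.compile(r"error (-\d+)\s*$")


def decode(line):
    """Return (code, name, text) for a kernel line ending in 'error -N', else None."""
    match = ERROR_CODE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, text = ERRNO_NAMES.get(-code, ("unknown", "not in table"))
    return code, name, text


for line in (
    "kernel: usb 2-1: device descriptor read/64, error -71",
    "kernel: usb 3-1: device descriptor read/64, error -62",
):
    print(decode(line))
```

Neither name says which side of the link is at fault, which matches the point above: the code confirms that communication failed, not why.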
 
If I put this drive back in, I will try to keep a running syslog over night capturing errors (or otherwise script some way to copy out syslogs) so that I can hopefully get the root error.
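
One way to script the copying of syslog entries is sketched below in Python: each call appends any new USB, disk or filesystem lines from the live log to a second file and returns the position to resume from. The keyword list and the paths in the usage note are assumptions, and since an Entware Python lives on /opt (the very drive that drops out), this logic would have to run somewhere that survives the failure, for example on another machine receiving the router's log.

```python
import os

# Substrings that mark lines of interest; chosen from the log posted in this thread.
KEYWORDS = (b"kernel: usb ", b"kernel: sd ", b"EXT4-fs", b"JBD2", b"end_request")


def copy_new_lines(src, dst, offset=0):
    """Append matching lines added to src since offset to dst; return the new offset."""
    if os.path.getsize(src) < offset:
        offset = 0  # source was rotated or truncated, so start from the top
    pos = offset
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)
        for raw in fin:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up on the next call
            pos += len(raw)
            if any(keyword in raw for keyword in KEYWORDS):
                fout.write(raw)
    return pos
```

A hypothetical polling loop would call `offset = copy_new_lines("/tmp/syslog.log", "/jffs/usb-errors.log", offset)` every 30 seconds or so; both paths are placeholders and should be adjusted to wherever the log actually lives and to storage that is not on the USB drive.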

I'll also hunt down another 3.0 drive (think I have one other) and try with that.

It's just frustrating because according (currently) to any tests I throw at this drive it appears to be fully functional, yet it is falling offline when connected to the router. If I knew for sure it was the drive I would just accept it and move on, as I know that USB drives have a tendency to burn out after a couple of years off the ASUS. I was just looking for any indicator that it might be something more to either prepare myself for an unsavory workaround (using the 2.0 port), or look to possibly having to replace the router.

thanks.
 
Maybe try the drive in a computer and run a surface disk check.
 
as stated ....
I ran some USB drive testing tools in Windows to test the drive - basically writing/verifying to the entire drive multiple times and didn't see a single error.
This is why I was concerned because the drive appears fine when plugged into my PC (and it also appears fine in the router...until it's not).

I am currently running an endless stress test on my Windows machine on this drive. I'm hoping that I get a failure on it within 24 hours as that would at least make sense. Without that, there is seemingly no way to truly verify a drive is good or bad before using it :/
^^^ I'm doing this again right now and plan to keep it running until I see an error. If it runs without error for a few days, then for all intents and purposes this drive is good - so the fact that it is disconnecting from the router worries me that something on the router is malfunctioning.
 
Maybe the USB drive is thermal throttling, maybe it isn't; I know some of the solid state flash drives get hot, like this one. A stress test is a good idea.

SanDisk 256GB Extreme PRO USB 3.2 Solid State Flash Drive - SDCZ880-256G-GAM46 https://a.co/d/iPn2NMX
 
A flash drive is supposed to be 'perfect' if we accept the marketing lore.

One failure means total failure to me. There is no other perspective if you want a reliable/stable network.

No flash ever goes wasted, and thumb drives are usually at the end of the salvage train... a 32GB thumb drive might have a 128GB chip that has so many dead cells that it only passes QA at 32GB, so they hard write that capacity into the controller with the dead cell map contained within...

If I have a thumb drive that ever has an error, I bin it and move on to another one, because it's past the point of enough dead cells that error correction can't work properly any more.

I've had the best luck with SanDisk, the worst with Lexar and various no-name brands (freebies from vendors, tradeshows, etc).

Problem with most thumb drives is that they tend to fail hard with loss of data - they're not really intended to be constantly mounted and used as a read/write storage medium.

As others have mentioned - SSD in a USB case, or just pick up a spinning rust drive, those likely will last a very long time.
 
I have the same experience too with SanDisk and Lexar. Good point about 128GB chips in 32GB finished products.
 
If I have a thumb drive that ever has an error, I bin it and move on to another one, because it's past the point of enough dead cells that error correction can't work properly any more.
...
Problem with most thumb drives is that they tend to fail hard with loss of data - they're not really intended to be constantly mounted and used as a read/write storage medium.
Well that's the issue. I have no proof that the thumb drive has an error. Just because the drive is disconnecting from the router does not mean the drive is bad (although that is the most likely cause). If I handed you this drive and told you I just bought it and you put it in your machine and ran 5 hours of stress testing on it, it would come up fine and you would have no reason to believe it bad.

I understand the overall issues with thumb drives, but they have tended to serve me well (albeit having to replace one every 2-3 years). I store no data of any value other than the entware installation/configuration and that is backed up nightly. So, if I ever have a failure I can simply restore to a new drive.

I'm having concerns that it is not the USB drive and instead the USB interface in the router that is going flaky. I have a UPS hooked up to the other USB port (use apcupsd on entware to monitor) and I just saw an incident logged where the UPS lost communication. This could be due to that USB port failing as well. I have some more logging going now, to try and capture errors and hopefully I will (or will not) see another issue in the next 48 hours.

If I do, I plan to reset the entire box and rebuild from scratch and test once more. It may be I have to replace it.
 
Your testing will never give you the proof you want because the testing is on something other than the router.

A reasonable person only needs one example of unreliability, to conclude that the USB-Router combo doesn't work anymore.

Whether that is the USB key or not misses the point.

Try another USB key or better yet, an SSD in an external enclosure, to verify that the router USB port is functioning correctly (at least with that new drive).

That other port may also be dropping connections because of the USB key too.
 
It could be failing only when it gets warm/hot.

Try using SD Card Formatter to do a complete overwrite format and see if it succeeds (it usually accepts a USB drive even though it isn't an SD card). That will stress the drive and should replicate the issue if it is heat related. Worst case, it does a good wipe of the drive, which may mark cells as bad and exclude them, reset the drive size to the correct value, etc. You can also use the Windows formatter and uncheck quick format, but that one isn't as good with flash. You could also just transfer a large file to it or run some disk benchmark utilities on it, anything to warm it up.

But it seems like there's a very good chance the drive is just starting to fail. Their lifespan in this scenario is going to be short. You can get a small used SSD off eBay for $10 or so, and an enclosure cheaply as well.
 
You can literally get a new oversized 500GB Samsung SSD for $39... and a decent SSD enclosure for $12... all for the same price as a decent higher-end thumbdrive... I would definitely take the leap and hopefully enjoy a longer-lasting stable experience than with what you're dealing with now. ;)
 
If they're concerned with cost it can be done for $20 using decent brand used stuff or under $50 with all new stuff.
 
Me likey shiny new stuff... especially stuff that is of good quality, tested and long-lasting. ;)
 
Normally yes, but for example, I have an old laptop whose SSD finally hit its limit and died. The laptop was not worth putting more than $10 to $20 into, and I got an 850 EVO 250GB off eBay for like $9 shipped. It had 7 or 8 TBW out of the 150 or so it is rated for. For something like this, that's a perfect use case for a used but not abused SSD. The laptop is still useful to me as it has a serial port and I use it for hooking up to cars and also in my lab for router console connections, so I didn't want to trash it, but also didn't want to spend $50 on a new huge drive that would be of no benefit over an older one. I think the OP's use case fits that too: why get a brand new, super fast and big SSD when you need a tiny, slow, older one (since the router can't keep up with even several-year-old SSDs anyway)?

Or to put it another way, if cost is a factor and it comes down to spending $20 on a new USB drive or $20 on a used SSD and enclosure, I'd go for option B.
 