
There are programs with which you can add any desired amount of redundancy to your backup archives, so that they can survive any corruption that affects no more data than the added redundancy.

For instance, on Linux there is par2cmdline. For all my backups, I create pax archives, which are then compressed, then encrypted, then expanded with par2create, and finally aggregated again into a single pax file. (The legacy tar file formats are not good for faithfully storing all the metadata of modern file systems, and each kind of tar program may have its own proprietary, non-portable extensions to handle this, which is why I use only the pax file format.)
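To make the pipeline concrete, here is a minimal sketch of what such a script can look like; the file names, the xz/gpg tool choices and the 10% redundancy figure are my own illustrative assumptions, not necessarily what the parent uses:

  # Archive the directory as pax, then compress and encrypt it
  # (tool choices here are illustrative)
  bsdtar --create --format=pax --file=backup.pax "${DIRECTORY}" || exit
  xz --keep backup.pax                                # -> backup.pax.xz
  gpg --symmetric --output backup.pax.xz.gpg backup.pax.xz

  # Add ~10% redundancy with par2; the resulting *.par2 volumes plus the
  # encrypted archive are then wrapped into one outer pax file (second bsdtar pass)
  par2create -r10 backup.pax.xz.gpg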

Besides that, important data should be replicated and stored on 2 or even 3 SSDs/HDDs/tapes, which should preferably be stored themselves in different locations.



Unfortunately, some SSD controllers flatly refuse to return data they consider corrupted. Even if you have extra parity that could potentially restore the corrupted data, the entire drive might refuse to read.


Huh?

The issue being discussed is random blocks, yes?

If your entire drive is bricked, that is an entirely different issue.


Here’s the thing. That SSD controller is the interface between you and those blocks.

If it decides, by some arbitrary measurement defined by logic inside its black-box firmware, that it should stop returning blocks altogether, then it will do so, and you have almost no recourse.

This is a very common failure mode of SSDs. As a consequence of some failed blocks (likely after exceeding some threshold of failed blocks, or perhaps because the controller’s own storage failed), drives will commonly brick themselves.

Perhaps you haven’t seen it happen, or your SSD doesn’t do this, or perhaps certain models or firmwares don’t, but some certainly do, both in my own experience and in countless accounts I’ve read elsewhere, so this is more common than you might realise.


This is correct: you still have to go through the firmware to gain access to the block/page on “disk”, and if the firmware decides the block is invalid, then it fails.

You can sidestep this by bypassing the controller on a test bench though. Pinning wires to the chips. At that point it’s no longer an SSD.


The mechanism is usually that the SSD controller requires that some work be done before your read - for example rewriting some access tables to record 'hot' data.

That work can't be done because there are no free blocks. However, no space can be freed up, because every spare writable block is bad or is in some other unusable state.

The drive is therefore dead - it will enumerate, but neither read nor write anything.


I don't think this is correct; it could read the flash block containing the [part of the] table in question, update it in memory, erase that block, then rewrite it into the same block.


I really wish this responsibility was something hoisted up into the FS and not a responsibility of the drive itself.

It's ridiculous (IMO) that SSD firmware is doing so much transparent work just to maintain the illusion that the drive is actually spinning metal with similar sector write performance.


Linux supports raw flash, called an MTD device (memory technology device). It's often used in embedded systems, and it has MTD-native filesystems such as ubifs. But it's only really used in embedded systems because... PC SSDs don't expose that kind of interface. (Nor would you necessarily want them to; a faulty driver would quietly brick your hardware in a matter of minutes to hours.)
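For reference, the embedded workflow for raw flash looks roughly like this; the MTD partition number, volume name and size are illustrative, and the tools come from mtd-utils:

  # Attach MTD partition 0 to the UBI layer, create a volume, mount it as ubifs
  ubiattach /dev/ubi_ctrl -m 0
  ubimkvol /dev/ubi0 -N data -s 64MiB
  mount -t ubifs ubi0:data /mnt/flash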


A buggy firmware will brick an SSD and block every option for recovering at least part of the data.


Seems like the approach Apple is taking by soldering storage directly on the mainboard or using proprietary modules like in the Mac mini.


When only some 4 kB blocks cannot be read, the archive file can still be repaired, as long as the amount of affected data is less than the amount of added redundancy.

For instance, if you have a 40 GB backup archive with 10% redundancy, 4 GB of data, i.e. one million 4 kB data blocks, can be unreadable, and you can still repair the archive and recover the complete content.
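For anyone unfamiliar with par2cmdline, checking and repairing such an archive is a single command each; a sketch with an illustrative file name:

  # With 10% recovery data, up to ~4 GB of a 40 GB archive can be unreadable
  par2verify backup.pax.xz.gpg.par2    # reports whether repair is possible
  par2repair backup.pax.xz.gpg.par2    # reconstructs the damaged data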

It is true that the entire SSD or HDD can become bricked. The solution for this, as I have already written in my previous comment, is to duplicate any SSD/HDD used for archival purposes, which I always do.


Yes, and? HDD controllers dying and head crashes are a thing too.

At least in the ‘bricked’ case it’s a trivial RMA - corrupt blocks tend to be a harder fight. And since ‘bricked’ is such a trivial RMA, manufacturers have more of an incentive to fix it or go broke, or avoid it in the first place.

This is why backups are important now; and always have been.


We're not talking about the SSD controller dying. The SSD controller in the hypothetical situation that's being described is working as intended.


Not as far as I can tell, where intended is ‘as any user would reasonably expect’. Bricking the drive (can’t even read) because of too many errors is not what most users would ever want.

Some would (enterprise maybe), but even then they’d want deterministic data deletes too, which don’t sound like they’re happening.


You can argue that controllers shouldn't behave that way. But they do, it's not a bug, and it's not a dead controller. It's a perfectly functional controller's response to dead blocks.


Cite? It appears not to meet the definition of ‘functional’.


The definition of functional, in the context of this discussion, is that it works in the way the manufacturer explicitly designed it to work, in line with standard industry practice, not as an unforeseen bug or malfunction.

Not some abstract notion.


So not enumerating as a drive, and not allowing you to read even valid blocks is ‘working’?


Yes, same as a facility self-destructing, if it was programmed to do so, is working as per its spec.


And what spec requires that? I have yet to see one.

The manufacturers'.

Cite? I have yet to see that actually documented anywhere, and you keep avoiding actually referring to one.

RE "....This is why backups are important now; and always have been..."

Still a big problem if backup is to the "..same technology..."


That’s why 3-2-1 is not just a good idea.


Thank you for this.

I had no knowledge of pax, or that par was an open standard, and I care about what they help with. Going to switch over to using both in my backups.


For handling pax archives, I recommend the "libarchive" package, which is available in many Linux distributions, even if it originally comes from FreeBSD.

Among other utilities, it installs the "bsdtar" program, which you can use in your scripts like this:

  bsdtar --create --verbose --format=pax --file="${DIRECTORY}".pax "${DIRECTORY}" || exit

And for extraction:

  bsdtar --extract --preserve-permissions --verbose --file="${DIRECTORY}".pax

The bsdtar program has options for compressing and/or encrypting the archives, for the case when you do not want to use other external programs directly.
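For example, compression can be requested in the same invocation; the --xz filter below is just one of several that bsdtar supports:

  bsdtar --create --verbose --format=pax --xz --file="${DIRECTORY}".pax.xz "${DIRECTORY}" || exit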

"par2create" creates multiple files from the (normally compressed and encrypted) archive file, for storing the added redundancy. I make a directory where I move those files, then I use a second time bsdtar (obviously without any compression or encryption) to aggregate those files in a single archive with redundancy.

The libarchive package can also be taken directly from:

https://github.com/libarchive/libarchive

"libarchive" handles correctly all kinds of file metadata, e.g. extended file attributes and high-resolution file timestamps, which not all archiving utilities do. Many Linux utilities, with the default command-line options or when they have not been compiled from their source with adequate compilation options, which happens in some Linux distributions, may silently lose some of the file metadata, when copying, moving or archiving.


There's no reason that you have to create multiple files for par2 if you are storing the recovery data with the protected data. It was only split into files of varying size because of its origins in protecting Usenet-posted binaries, so that users did not have to download the entire recovery data when they only needed a portion.
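With par2cmdline, the -n option controls how many recovery files are produced, so local use can keep everything in one volume; a sketch with an illustrative file name:

  # All recovery blocks go into a single volume file (plus the small index .par2)
  par2create -r10 -n1 backup.pax.xz.gpg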


This is fine, but I'd prefer an option to transparently add parity bits to the drive, even if it means losing some capacity.

Personally, I keep backups of critical data on a platter-disk NAS, so I'm not concerned about losing critical data off an SSD. However, I did recently have to reinstall Windows on a computer because of a randomly corrupted system file, which is something this feature would have prevented.



