Message-ID: <20241216193104.GB78919@mit.edu>
Date: Mon, 16 Dec 2024 14:31:04 -0500
From: "Theodore Ts'o" <tytso@....edu>
To: David Laight <David.Laight@...lab.com>
Cc: "'Nikolai Zhubr'" <zhubr.2@...il.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jack@...e.cz" <jack@...e.cz>
Subject: Re: ext4 damage suspected in between 5.15.167 - 5.15.170

On Mon, Dec 16, 2024 at 03:16:00PM +0000, David Laight wrote:
> ....
> > > The location of block allocation bitmaps never gets changed, so this
> > > sort of thing only happens due to hardware-induced corruption.
> > 
> > Well, unless e.g. some modified sectors start being flushed to random
> > wrong offsets, like in [1] above, or something similar.

Well, in the bug that you referenced in [1], what was happening was
that data could get written to the wrong offset in the file under
certain race conditions.  That is not the same as a data block
getting written over some metadata block, such as the block group
descriptors.

Sectors getting written to the wrong LBAs do happen; there's a reason
why enterprise databases include a checksum in every 4k database
block.  But the root cause of that generally tends to be a bit getting
flipped in the LBA number when it is being sent from the CPU to the
controller to the storage device.  It's rare, but when it does happen,
it is more often than not hardware-induced, and again, it's one of
those things where RAID won't necessarily save you.
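
To illustrate why a per-block checksum helps with misdirected writes,
here is a minimal sketch.  It is not the on-disk format of any
particular database; the 12-byte trailer layout and the names
seal_block() and verify_block() are made up for this example.  The
idea is that each 4k block records the block number it was intended
for, and the CRC covers both the payload and that number, so a block
that lands at the wrong LBA is detectable when it is read back.

```python
import struct
import zlib

BLOCK_SIZE = 4096
# Hypothetical trailer layout: 64-bit intended block number + 32-bit CRC.
TRAILER = struct.Struct("<QI")
PAYLOAD_SIZE = BLOCK_SIZE - TRAILER.size


def seal_block(block_no: int, payload: bytes) -> bytes:
    """Pad the payload and append the intended block number and a CRC32."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload too large for one block")
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    # The CRC covers the payload *and* the block number it was meant for.
    crc = zlib.crc32(body + struct.pack("<Q", block_no))
    return body + TRAILER.pack(block_no, crc)


def verify_block(expected_block_no: int, block: bytes) -> str:
    """Classify a block read from expected_block_no."""
    if len(block) != BLOCK_SIZE:
        return "bad-size"
    body, trailer = block[:PAYLOAD_SIZE], block[PAYLOAD_SIZE:]
    stored_no, stored_crc = TRAILER.unpack(trailer)
    if zlib.crc32(body + struct.pack("<Q", stored_no)) != stored_crc:
        return "corrupt"        # contents damaged in flight or at rest
    if stored_no != expected_block_no:
        return "misdirected"    # intact block, but at the wrong address
    return "ok"


# Simulate a single bit flipping in the LBA on its way to the device.
disk = {}
intended_lba = 8
actual_lba = intended_lba ^ (1 << 3)   # bit 3 flipped: 8 becomes 0
disk[actual_lba] = seal_block(intended_lba, b"row data")
```

Reading disk[0] back through verify_block(0, ...) reports the block as
misdirected rather than silently returning the wrong data.  A plain
checksum over the payload alone would not catch this case, since the
block itself is intact.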

> Or cutting the power in the middle of SSD 'wear levelling'.
> 
> I've seen a completely trashed disk (sectors in completely the
> wrong places) after an unexpected power cut.

Sure, but that falls in the category of hardware-induced corruption.
There have been SSDs without power-fail certification which have had
their flash translation metadata so badly corrupted that you lose
everything (there's a reason why professional photographers use dual
SD card slots, and some may use duct tape to make sure the battery
access door won't fly open if their camera gets dropped).

					- Ted
