Message-Id: <20070216154246.7b9a643c.akpm@linux-foundation.org>
Date: Fri, 16 Feb 2007 15:42:46 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Andreas Dilger <adilger@...sterfs.com>
Cc: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: data=journal busted
On Fri, 16 Feb 2007 16:31:09 -0700
Andreas Dilger <adilger@...sterfs.com> wrote:
> > I suspect we should resurrect and formalise my old
> > make-the-disk-stop-accepting-writes-when-a-timer-goes-off thing. It was
> > very useful for stress-testing recovery.
>
> We have a patch that we use for Lustre testing which allows you to set a
> block device readonly (silently discarding all writes), without the
> filesystem immediately keeling over dead like set_disk_ro. The readonly
> state persists until the the last reference on the block device is dropped,
> so there are no races w.r.t. VFS cleanup of inodes and flushing buffers
> after the filesystem is unmounted.
Not sure I understand all that.
For this application, we *want* to expose VFS races, errors in handling
EIO, errors in handling lost writes, etc. It's another form of fault
injection for developers, not something for production.
The reason I prefer doing it from the timer interrupt is to toss more
randomness in there and to avoid the possibility of getting synchronised
with application or kernel activity in some fashion.
I don't know if there's much value in that, but it provides peace-of-mind.
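For reference, a minimal user-space sketch of the timer-driven cut-off idea
(an illustration only, not Andrew's original patch nor the Lustre one): it
waits a pseudo-random interval, then forces the block device read-only with
the stock BLKROSET ioctl. This gives the set_disk_ro-style behaviour Andreas
mentions, i.e. writes are refused outright rather than silently discarded,
and the device path and delay range are arbitrary choices.

/* timer_ro.c: crude approximation of "stop accepting writes when a
 * timer goes off".  Needs CAP_SYS_ADMIN.  Not the silent-discard
 * semantics of the Lustre patch. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKROSET */

int main(int argc, char **argv)
{
	int fd, ro = 1;
	unsigned int delay;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <block device>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Sleep a pseudo-random interval (5..59s here, arbitrary) so the
	 * cut-off doesn't synchronise with application or kernel activity. */
	srand(getpid() ^ time(NULL));
	delay = 5 + rand() % 55;
	sleep(delay);

	/* Flip the device read-only; subsequent writes are rejected, so the
	 * filesystem keels over immediately instead of losing writes quietly. */
	if (ioctl(fd, BLKROSET, &ro) < 0) {
		perror("BLKROSET");
		return 1;
	}

	printf("%s forced read-only after %u seconds\n", argv[1], delay);
	close(fd);
	return 0;
}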
I'm now seeing reports that data=ordered is corrupting data and metadata,
as is data=writeback. I'm hoping that the blame lies with the
allegedly-battery-backed RAID controller, but it could be ext3. Has anyone
actually done any decent recovery testing in the past half decade?