linux-kernel - Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes

Message-Id: <200805172048.34455.chris.mason@oracle.com>
Date:	Sat, 17 May 2008 20:48:33 -0400
From:	Chris Mason <chris.mason@...cle.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Theodore Tso <tytso@....edu>, Eric Sandeen <sandeen@...hat.com>,
	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes

On Friday 16 May 2008, Andrew Morton wrote:
> On Fri, 16 May 2008 20:20:30 -0400 Theodore Tso <tytso@....edu> wrote:
> > On Fri, May 16, 2008 at 11:53:04PM +0100, Jamie Lokier wrote:
> > > > > If you just want to test the block I/O layer and drive itself,
> > > > > > don't use the filesystem, but write a program which just accesses the
> > > > > block device, continuously writing with/without barriers every so
> > > > > often, and after power cycle read back to see what was and wasn't
> > > > > written.
> > > >
> > > > Well, I think it is worth testing through the filesystem, different
> > > > journaling mechanisms will probably react^wcorrupt in different ways.
> > >
> > > I agree, but intentional tests on the block device will show the
> > > > drive's characteristics on power failure much sooner and more
> > > consistently.  Then you can concentrate on the worst drivers :-)
> >
> > I suspect the real reason why we get away with it so much with ext3 is
> > that the journal is usually contiguous on disk, hence, when you write
> > to the journal, it's highly unlikely that commit block will be written
> > and the blocks before the commit block have not.
>
> yup.  Plus with a commit only happening once per few seconds, the time
> window for a corrupting power outage is really really small, in
> relative terms.  All these improbabilities multiply.

Well, the barriers happen like so (even if we actually only do one barrier in
submit_bh, it turns into two flushes):

write log blocks
flush #1
write commit block
flush #2
write metadata blocks

I'd agree with Ted: there's a fairly small chance of things getting reordered
around flush #1.  Flush #2 is likely to see lots of reordering, though.  It
should be easy to create situations where the metadata for a transaction is
written before the log blocks ever see the disk.
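
To make the ordering argument concrete, here is a toy model (not kernel code,
and not anything from the patch set).  It assumes a drive with a volatile
write cache that may destage any subset of its cached writes, in any order,
before power is lost, and that a flush makes everything cached so far durable.
The names crash_states, WITH_FLUSHES, NO_FLUSHES and landed_without are made
up for this sketch.

```python
from itertools import combinations

def crash_states(ops):
    """Return every set of blocks that could be on stable media at power loss.

    ops is a list of block names (a write into the drive cache) and the
    string "flush" (cache flush).  Power may fail after any step.
    """
    durable = set()   # blocks guaranteed to be on stable media
    cached = []       # blocks still in the volatile write cache
    states = set()

    def record():
        # Assumption: the drive may have destaged any subset of its cache.
        for r in range(len(cached) + 1):
            for subset in combinations(cached, r):
                states.add(frozenset(durable | set(subset)))

    record()
    for op in ops:
        if op == "flush":
            durable.update(cached)
            cached.clear()
        else:
            cached.append(op)
        record()
    return states

def landed_without(states, later, earlier):
    """True if some crash state has `later` on disk but not `earlier`."""
    return any(later in s and earlier not in s for s in states)

# The sequence from the message, with and without the two flushes.
WITH_FLUSHES = ["log", "flush", "commit", "flush", "metadata"]
NO_FLUSHES = ["log", "commit", "metadata"]

if __name__ == "__main__":
    for name, ops in (("with flushes", WITH_FLUSHES), ("no flushes", NO_FLUSHES)):
        states = crash_states(ops)
        print(name,
              "| commit without log:", landed_without(states, "commit", "log"),
              "| metadata without log:", landed_without(states, "metadata", "log"))
```

In this model the flushed sequence never leaves the commit block or the
metadata on disk without the log blocks, while the unflushed one can.  It says
nothing about how likely each outcome is on real hardware, which is the part
the contiguous-journal argument is about.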

EMC did a ton of automated testing around this when Jens and I did the initial
barrier implementations, and they were able to trigger corruptions in
fsync-heavy workloads with randomized power-offs.  I'll dig up the workload
they used.

-chris