Date:	Tue, 20 May 2008 12:02:53 -0400
From:	Chris Mason <chris.mason@...cle.com>
To:	Jamie Lokier <jamie@...reable.org>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Eric Sandeen <sandeen@...hat.com>, linux-ext4@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes

On Tuesday 20 May 2008, Jamie Lokier wrote:
> Chris Mason wrote:
> > On Sunday 18 May 2008, Andi Kleen wrote:
> > > Andrew Morton <akpm@...ux-foundation.org> writes:
> > > > On Fri, 16 May 2008 14:02:46 -0500
> > > >
> > > > Eric Sandeen <sandeen@...hat.com> wrote:
> > > >> A collection of patches to make ext3 & 4 use barriers by
> > > >> default, and to call blkdev_issue_flush on fsync if they
> > > >> are enabled.
> > > >
> > > > Last time this came up lots of workloads slowed down by 30% so I
> > > > dropped the patches in horror.
> > >
> > > Didn't ext4 have some new checksum trick to avoid them?
> >
> > I didn't think checksumming avoided barriers completely.  Just the
> > barrier before the commit block, not the barrier after.
>
> A little optimisation note.
>
> You don't need the barrier after in some cases, or it can be deferred
> until a better time.  E.g. when the disk write cache is probably empty
> (some time after write-idle), barrier flushes may take the same time
> as NOPs.

I hesitate to get too fancy here; if the disk is idle we probably won't notice
the performance gain.

>
> This sequence:
>
>     #1 write metadata to journal
>     #1 write commit block (checksummed)
>   BARRIER
>     #1 write metadata in place
>   ... time passes ...
>     #2 write metadata to journal
>     #2 write commit block (checksummed)
>   BARRIER
>     #2 write metadata in place
>   ... time passes ...
>     #3 write metadata to journal
>     #3 write commit block (checksummed)
>   BARRIER
>     #3 write metadata in place
>
> Can be rewritten as:
>
>     #1 write metadata to journal
>     #1 write commit block (checksummed)
>   ... time passes ...
>     #2 write metadata to journal
>     #2 write commit block (checksummed)
>   ... time passes ...
>     #3 write metadata to journal
>     #3 write commit block (checksummed)
>   ... time passes ...
>   BARRIER (probably instant).
>     #1 write metadata in place
>     #2 write metadata in place
>     #3 write metadata in place
>
> Provided some conditions hold.  All the metadata and all the journal
> writes being non-overlapping I/O ranges would be sufficient.

This is true, and would be a fairly good performance boost.  It fits nicely 
with the jbd trick of avoiding writes of a metadata block if a later 
transaction has logged it.
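
To make the comparison concrete, here is a toy model of the two sequences
quoted above. It is only a sketch: the function names and the tuple format are
made up for illustration, nothing here is jbd code, and it assumes the
non-overlapping condition Jamie mentions holds. It plans the I/O for a list of
transactions (each one a set of metadata block numbers), once with a barrier
per commit and once with a single deferred barrier, where a block logged by
several transactions is written in place only once.

```python
def plan_per_commit_barriers(transactions):
    """One barrier after every commit block, as in the first quoted sequence."""
    ops = []
    for tid, blocks in enumerate(transactions, start=1):
        ops.append(("journal", tid, tuple(sorted(blocks))))
        ops.append(("commit", tid))
        ops.append(("barrier",))
        for block in sorted(blocks):
            ops.append(("in_place", tid, block))
    return ops


def plan_deferred_barrier(transactions):
    """All journal and commit writes first, one barrier, then in-place writes.

    A block logged by several transactions is written in place once, using
    the newest transaction that logged it (hypothetical simplification).
    """
    ops = []
    newest = {}  # block number -> id of the newest transaction that logged it
    for tid, blocks in enumerate(transactions, start=1):
        ops.append(("journal", tid, tuple(sorted(blocks))))
        ops.append(("commit", tid))
        for block in blocks:
            newest[block] = tid
    ops.append(("barrier",))
    for block in sorted(newest):
        ops.append(("in_place", newest[block], block))
    return ops


def count(ops, kind):
    """Count planned operations of one kind, e.g. "barrier"."""
    return sum(1 for op in ops if op[0] == kind)


# Three transactions whose metadata blocks partly overlap.
txns = [{10, 11}, {11, 12}, {12, 13}]
eager = plan_per_commit_barriers(txns)
deferred = plan_deferred_barrier(txns)
```

With these three transactions the per-commit plan has three barriers and six
in-place writes, while the deferred plan has one barrier and four in-place
writes.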

But it complicates the decision about when you're allowed to dirty a metadata
block for writeback.  It used to be dirty-after-commit and it would change to
dirty-after-barrier.  I suspect that means some significant surgery in jbd.

Also, since a commit isn't really done until the barrier is done, you can't 
reuse blocks freed by the committing transaction until after the barrier, 
which means changes in the deletion handling code.  
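
A rough sketch of the two rules, as a toy state tracker rather than anything
resembling the real jbd structures (the class and method names are invented
for illustration). Writing the commit block only moves a transaction to
"committed"; it takes a completed barrier to make it "durable", and both
dirtying for writeback and reuse of freed blocks are gated on "durable".

```python
class ToyJournal:
    """Hypothetical model of dirty-after-barrier and reuse-after-barrier."""

    def __init__(self):
        self.committed = set()  # commit block written, barrier not yet done
        self.durable = set()    # commit block written and barrier completed
        self.freed_by = {}      # block number -> transaction that freed it

    def free_block(self, tid, block):
        """Record that transaction tid freed this block."""
        self.freed_by[block] = tid

    def write_commit_block(self, tid):
        """The commit block has been submitted, but may sit in the disk cache."""
        self.committed.add(tid)

    def barrier(self):
        """A completed barrier makes every committed transaction durable."""
        self.durable |= self.committed
        self.committed.clear()

    def may_dirty_for_writeback(self, tid):
        """Metadata logged by tid may go to its final location only now."""
        return tid in self.durable

    def may_reuse_freed(self, block):
        """A block freed via free_block() is reusable once its freer is durable."""
        return self.freed_by[block] in self.durable


journal = ToyJournal()
journal.free_block(1, 42)
journal.write_commit_block(1)
before = (journal.may_dirty_for_writeback(1), journal.may_reuse_freed(42))
journal.barrier()
after = (journal.may_dirty_for_writeback(1), journal.may_reuse_freed(42))
```

Here `before` is `(False, False)` and `after` is `(True, True)`: under the
old dirty-after-commit rule the first pair would already have been true.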

Maybe I'm a wimp, but these are the two parts of write-ahead logging I always
found the most difficult.

>
> What's more, barriers can be deferred past data=ordered in-place data
> writes, although that's not always an optimisation.
>

It might be really interesting to have an
i'm-about-to-barrier-find-some-io-to-run call.  Something along the lines of
draining the dirty pages when the drive is woken up in laptop mode.  There's
lots of fun with page lock vs journal lock ordering, but Jan has a handle on
that, I think.
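
Very roughly, the shape I have in mind is something like the following. This
is a sketch of the idea only: no such interface exists, every name in it is
made up, and it ignores the lock ordering problem entirely. Interested parties
register a hook, and whoever is about to issue a barrier runs the hooks first
so that any I/O they hand back goes out ahead of the flush.

```python
class BarrierHooks:
    """Hypothetical registry for about-to-barrier callbacks."""

    def __init__(self):
        self._hooks = []

    def register(self, hook):
        """hook() returns an iterable of I/O requests worth sending now."""
        self._hooks.append(hook)

    def issue_barrier(self, submit_io, flush):
        """Submit whatever the hooks offer, then issue the cache flush."""
        for hook in self._hooks:
            for request in hook():
                submit_io(request)
        flush()


log = []
hooks = BarrierHooks()
hooks.register(lambda: ["dirty page 7", "dirty page 9"])
hooks.issue_barrier(log.append, lambda: log.append("BARRIER"))
```

After this runs, `log` holds the two dirty pages followed by the barrier, so
the extra writes ride along with a flush that was going to happen anyway.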

-chris