linux-ext4 - Re: semi-stable page writes

Message-ID: <20121030234331.GH29378@dastard>
Date:	Wed, 31 Oct 2012 10:43:31 +1100
From:	Dave Chinner <david@...morbit.com>
To:	"Darrick J. Wong" <darrick.wong@...cle.com>
Cc:	Theodore Ts'o <tytso@....edu>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: semi-stable page writes

On Tue, Oct 30, 2012 at 01:40:37PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 30, 2012 at 09:01:22AM +1100, Dave Chinner wrote:
> > On Fri, Oct 26, 2012 at 03:19:09AM -0700, Darrick J. Wong wrote:
> > > Hi everyone,
> > > 
> > > Are people still annoyed about writes taking unexpectedly long amounts of time
> > > due to the stable page write patchset?  I'm guessing yes...
> > 
> > I haven't heard anyone except the lunatic fringe complain
> > recently...
> > 
> > > I'm close to posting a patchset that (a) gates the wait_on_page_writeback calls
> > > on a flag that you can set in the bdi to indicate that you need stable writes
> > > (which blk_integrity_register will set);
> > 
> > I'd prefer stable pages by default (e.g. btrfs needs it for sane
> > data crc calculations), with an option to turn it off.
> > 
> > > (b) (ab)uses a page flag bit (PG_slab)
> > > to indicate that a page is actually being sent out to disk hardware; and (c)
> > 
> > I don't think you can do that. You can send slab allocated memory to
> > disk (e.g. kmalloc()d memory) and XFS definitely does that for
> > sub-page sized metadata. I'm pretty sure that means the PG_slab
> > flag is not available for (ab)use in the IO path....
> 
> I gave up on PG_slab and declared my own PG_ bit.  Unfortunately, atm I can't
> remember which bit of code marks the page ptes so that they have to go back
> through page_mkwrite, where we can trap the write.  Hopefully for a shorter
> duration.

clear_page_dirty_for_io(), IIRC.

> Also, I was wondering -- is it possible to pursue a dual strategy?  If we can
> obtain a memory page without sleeping or causing any writeback, then use the
> page as a bounce buffer.  Otherwise, just wait like we do now.

Using bounce buffers for all IO is not a feasible solution. Way too
much overhead copying data, not to mention we are already suffering
from the problem of flusher threads going CPU bound trying to issue
enough IO to keep high bandwidth storage fully utilised...
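
For illustration only, the dual strategy asked about above can be modelled
in user space roughly as follows. This is a sketch, not kernel code:
struct model_page, struct model_io, submit_writeback() and
writer_must_wait() are hypothetical names invented here, the needs_stable
field merely stands in for the proposed per-bdi flag, and the "spare"
argument stands in for a page obtained without sleeping or triggering
writeback (NULL when no such page is available).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical user-space stand-in for a page cache page. */
struct model_page {
    unsigned char data[MODEL_PAGE_SIZE];
    bool needs_stable;     /* models the proposed per-bdi "stable writes" flag */
    bool under_writeback;  /* the device may still be reading data[] */
};

/* Hypothetical record of which buffer was handed to the device. */
struct model_io {
    const unsigned char *buf;
    bool bounced;
};

/*
 * Submit a page for writeback.  "spare" is a buffer that was obtained
 * without sleeping, or NULL if none could be had.
 */
void submit_writeback(struct model_page *page, unsigned char *spare,
                      struct model_io *io)
{
    assert(page != NULL && io != NULL);

    if (spare != NULL) {
        /* Bounce path: the device reads a private copy, so later
         * writers may keep modifying the original page. */
        memcpy(spare, page->data, MODEL_PAGE_SIZE);
        io->buf = spare;
        io->bounced = true;
        page->under_writeback = false;
    } else {
        /* Fallback path: the device reads the page itself, so the
         * page has to stay stable until the IO completes. */
        io->buf = page->data;
        io->bounced = false;
        page->under_writeback = true;
    }
}

/* A writer only has to block when the device needs stable pages and
 * is still reading the original page. */
bool writer_must_wait(const struct model_page *page)
{
    return page->needs_stable && page->under_writeback;
}
```

The memcpy() in the bounce path is the per-page copying cost objected to
above: in this model it is paid on every writeback that finds a spare
buffer, whether or not anyone tries to modify the page in the meantime.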

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com