linux-ext4 - Re: semi-stable page writes

Message-ID: <20121030204037.GE19559@blackbox.djwong.org>
Date:	Tue, 30 Oct 2012 13:40:37 -0700
From:	"Darrick J. Wong" <darrick.wong@...cle.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	"Theodore Ts'o" <tytso@....edu>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: semi-stable page writes

On Tue, Oct 30, 2012 at 09:01:22AM +1100, Dave Chinner wrote:
> On Fri, Oct 26, 2012 at 03:19:09AM -0700, Darrick J. Wong wrote:
> > Hi everyone,
> > 
> > Are people still annoyed about writes taking unexpectedly long amounts of time
> > due to the stable page write patchset?  I'm guessing yes...
> 
> I haven't heard anyone except the lunatic fringe complain
> recently...
> 
> > I'm close to posting a patchset that (a) gates the wait_on_page_writeback calls
> > on a flag that you can set in the bdi to indicate that you need stable writes
> > (which blk_integrity_register will set);
> 
> I'd prefer stable pages by default (e.g. btrfs needs it for sane
> data crc calculations), with an option to turn it off.
> 
> > (b) (ab)uses a page flag bit (PG_slab)
> > to indicate that a page is actually being sent out to disk hardware; and (c)
> 
> I don't think you can do that. You can send slab allocated memory to
> disk (e.g. kmalloc()d memory) and XFS definitely does that for
> sub-page sized metadata. I'm pretty sure that means the PG_slab
> flag is not available for (ab)use in the IO path....

I gave up on PG_slab and declared my own PG_ bit.  Unfortunately, I can't
remember at the moment which bit of code marks the page PTEs so that writes
have to go back through page_mkwrite, where we can trap the write and,
hopefully, make it wait for a shorter time.

Also, I was wondering -- is it possible to pursue a dual strategy?  If we can
obtain a memory page without sleeping or causing any writeback, then use that
page as a bounce buffer.  Otherwise, just wait like we do now.  It looks as
though one could use __GFP_NORETRY | __GFP_NOMEMALLOC to see if the allocator
can give out a page without having to run reclaim...?

--D
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@...morbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html