linux-kernel - Re: [patch 10/21] buffer heads: Support slab defrag


Message-Id: <20080520232444.8bff5ccf.akpm@linux-foundation.org>
Date:	Tue, 20 May 2008 23:24:44 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Evgeniy Polyakov <johnpol@....mipt.ru>
Cc:	David Chinner <dgc@....com>, Christoph Lameter <clameter@....com>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	Mel Gorman <mel@...net.ie>, andi@...stfloor.org,
	Rik van Riel <riel@...hat.com>,
	Pekka Enberg <penberg@...helsinki.fi>, mpm@...enic.com
Subject: Re: [patch 10/21] buffer heads: Support slab defrag

On Wed, 21 May 2008 10:15:32 +0400 Evgeniy Polyakov <johnpol@....mipt.ru> wrote:

> On Tue, May 20, 2008 at 04:28:16PM -0700, Andrew Morton (akpm@...ux-foundation.org) wrote:
> > It's more than efficiency.  There are lots and lots of things we cannot
> > do in direct-reclaim context.
> > 
> > a) Can't lock pages (well we kinda sorta could, but generally code
> >    will just trylock)
> > 
> > b) Cannot rely on the inode or the address_space being present in
> >    memory after we have unlocked the page.
> > 
> > c) Cannot run iput().  Or at least, we couldn't five or six years
> >    ago.  afaik nobody has investigated whether the situation is now
> >    better or worse.
> > 
> > d) lots of deadlock scenarios - need to test __GFP_FS basically everywhere
> >    in which you share code with normal writeback paths.
> > 
> > Plus e), f), g) and h).  Direct-reclaim is a hostile environment. 
> > Things like b) are a real killer - nasty, subtle, rare,
> > memory-pressure-dependent crashes.
> 
> Which basically means we can not do direct writeback at reclaim time?..
> 

Well, we _can_, but doing so within the present constraints is delicate.

An implementation which locked all the to-be-written pages up front and
then wrote them out and which was careful not to touch the inode or
address_space after the last page is unlocked could work.
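A rough userspace model of that first idea, for illustration only.  It
assumes pthread mutexes stand in for page locks, and every name in it
(batch_page, batch_inode, batch_writeback) is hypothetical rather than a
kernel interface.  The points it tries to show are the all-or-nothing
trylock pass and that the inode is last touched while a page is still
locked:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel structures. */
struct batch_inode {
	size_t pages_written;
};

struct batch_page {
	pthread_mutex_t lock;	/* models the page lock */
	bool dirty;
};

/*
 * Lock every page up front (trylock only, as reclaim generally must),
 * write the dirty ones, then unlock.  Returns the number of pages
 * written, or 0 if the whole batch could not be locked.
 */
size_t batch_writeback(struct batch_inode *inode,
		       struct batch_page *pages, size_t n)
{
	size_t locked, i, written = 0;

	for (locked = 0; locked < n; locked++) {
		if (pthread_mutex_trylock(&pages[locked].lock) != 0)
			break;
	}
	if (locked < n) {
		/* Could not get them all: back out and give up. */
		while (locked > 0)
			pthread_mutex_unlock(&pages[--locked].lock);
		return 0;
	}

	for (i = 0; i < n; i++) {
		if (pages[i].dirty) {
			pages[i].dirty = false;	/* "write it out" */
			written++;
		}
	}

	/* Last access to the inode: at least one page is still locked. */
	inode->pages_written += written;

	for (i = 0; i < n; i++)
		pthread_mutex_unlock(&pages[i].lock);

	/* From here on the inode must be treated as possibly gone. */
	return written;
}
```

The model ignores everything that makes the real thing hard (I/O
completion, GFP_FS, lock ordering against normal writeback); it only
captures the ordering constraint.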

Or perhaps add a new lock to the inode and then, in reclaim:

a) lock a page on the LRU, thus pinning the address_space and inode.

b) take some new sleeping lock in the inode

c) unlock that page and now proceed to do writeback.  But still
   honouring !GFP_FS.

and teach the unmount code to take the per-inode locks too, to ensure
that reclaim has got out of there before zapping the inodes.  Perhaps a
per-superblock lock rather than per-inode, dunno.
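The a)/b)/c) handoff and the unmount side might be modelled in userspace
roughly as follows.  This is a sketch under stated assumptions, not kernel
code: pthread mutexes stand in for the page lock and for the proposed
sleeping lock, and all names (model_inode, model_page, reclaim_lock,
model_reclaim_writeback, model_unmount_zap) are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel structures. */
struct model_inode {
	pthread_mutex_t reclaim_lock;	/* the proposed new sleeping lock */
	bool alive;			/* false once unmount has zapped it */
	int dirty_pages;
	int pages_written;
};

struct model_page {
	pthread_mutex_t lock;		/* models the page lock */
	struct model_inode *inode;	/* only stable while page is locked */
};

/* Returns true if any writeback was done. */
bool model_reclaim_writeback(struct model_page *page, bool may_enter_fs)
{
	struct model_inode *inode;
	bool wrote = false;

	if (!may_enter_fs)		/* still honouring !GFP_FS */
		return false;

	/* a) lock a page (trylock only), pinning the inode */
	if (pthread_mutex_trylock(&page->lock) != 0)
		return false;
	inode = page->inode;
	if (inode == NULL) {		/* already detached from its inode */
		pthread_mutex_unlock(&page->lock);
		return false;
	}

	/* b) take the per-inode lock while the page lock still pins it */
	pthread_mutex_lock(&inode->reclaim_lock);

	/* c) unlock the page; the inode lock now keeps unmount out */
	pthread_mutex_unlock(&page->lock);

	assert(inode->alive);
	if (inode->dirty_pages > 0) {
		inode->pages_written += inode->dirty_pages;
		inode->dirty_pages = 0;
		wrote = true;
	}
	pthread_mutex_unlock(&inode->reclaim_lock);
	return wrote;
}

/* Unmount side: detach the page, then wait for reclaim to get out. */
void model_unmount_zap(struct model_inode *inode, struct model_page *page)
{
	pthread_mutex_lock(&page->lock);
	page->inode = NULL;
	pthread_mutex_unlock(&page->lock);

	pthread_mutex_lock(&inode->reclaim_lock);
	inode->alive = false;
	pthread_mutex_unlock(&inode->reclaim_lock);
}
```

In this model the unmount path cannot mark the inode dead until any
reclaimer that got past step b) has dropped reclaim_lock, which is the
property the scheme is after.  Whether a per-inode or per-superblock lock
is the better fit is left open, as above.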

But we won't be able to just dive in there and call the existing
writeback functions from within reclaim, because:

a) callers can hold all sorts of locks, including implicit ones such
   as journal_start() and

b) reclaim doesn't have a reference on the page's inode, and the
   inode and address_space can vanish if reclaim isn't holding a lock
   on one of the address_space's pages.
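For point a), the way a caller tells reclaim "I may be holding filesystem
locks" is by leaving __GFP_FS out of its allocation mask (GFP_NOFS), and
reclaim has to test for that before entering filesystem code.  A minimal
userspace sketch of that test; the MODEL_* names and bit values here are
hypothetical and are not the kernel's actual definitions:

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this model. */
#define MODEL_GFP_IO	0x1u	/* caller allows starting block I/O */
#define MODEL_GFP_FS	0x2u	/* caller allows calling into the fs */

/* Masks mirroring the usual allocation contexts. */
#define MODEL_GFP_KERNEL	(MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS		(MODEL_GFP_IO)	/* e.g. inside a journal handle */
#define MODEL_GFP_NOIO		(0u)

/* Reclaim may only call into filesystem code if the caller said so. */
bool model_may_enter_fs(unsigned int gfp_mask)
{
	return (gfp_mask & MODEL_GFP_FS) != 0;
}
```

A caller allocating with the NOFS-style mask gets false here, so any
writeback path shared with reclaim has to bail out rather than take
filesystem locks.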
