Message-ID: <20100204033911.GE5332@discord.disaster>
Date:	Thu, 4 Feb 2010 14:39:11 +1100
From:	Dave Chinner <david@...morbit.com>
To:	tytso@....edu, Christoph Lameter <cl@...ux-foundation.org>,
	Andi Kleen <andi@...stfloor.org>,
	Miklos Szeredi <miklos@...redi.hu>,
	Alexander Viro <viro@....linux.org.uk>,
	Christoph Hellwig <hch@...radead.org>,
	Christoph Lameter <clameter@....com>,
	Rik van Riel <riel@...hat.com>,
	Pekka Enberg <penberg@...helsinki.fi>,
	akpm@...ux-foundation.org, Nick Piggin <nickpiggin@...oo.com.au>,
	Hugh Dickins <hugh@...itas.com>, linux-kernel@...r.kernel.org
Subject: Re: inodes: Support generic defragmentation

On Wed, Feb 03, 2010 at 10:07:36PM -0500, tytso@....edu wrote:
> On Thu, Feb 04, 2010 at 11:34:10AM +1100, Dave Chinner wrote:
> > What it comes down to is that the slab has two states for objects -
> > allocated and free - but what we really need here is 3 states -
> > allocated, unused and freed. We currently track unused objects
> > outside the slab in LRU lists and, IMO, that is the source of our
> > fragmentation problems because it has no knowledge of the spatial
> > layout of the slabs and the state of other objects in the page.
> > 
> > What I'm suggesting is that we ditch the external LRUs and track the
> > "unused" state inside the slab and then use that knowledge to decide
> > which pages to reclaim.
> 
> Or maybe we need to have a way to track the LRU of the slab page as
> a whole?  Any time we touch an object on the slab page, we update the
> last-touched time of the slab page as a whole.

Yes, that's pretty much what I have been trying to describe. ;)
(And, IIUC, what I think Nick has been trying to describe as well
when he's been saying we should "turn reclaim upside down".)

It seems pretty simple to track, too, if we define the pages eligible
for reclaim as only those that are full of unused objects, i.e.
pages have two states:

	- Active: at least one allocated object on the page is referenced
		=> no need for LRU tracking of these
	- Unused: none of the allocated objects on the page is in use
		=> these pages are LRU tracked within the slab

A single referenced object is enough to change the state of the
page from Unused to Active, and when a page transitions from
Active to Unused it goes on the MRU end of the LRU queue.
Reclaim would then start with the oldest pages on the LRU....
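To make the two-state scheme concrete, here is a minimal Python model of it (a simulation, not kernel code). The names SlabPage, Slab, get, put and reclaim are hypothetical, and an OrderedDict stands in for the per-slab LRU list: a page is appended at the MRU end when its last referenced object is released, dropped from the list as soon as any object on it is referenced again, and reclaim frees whole pages starting from the oldest end.

```python
from collections import OrderedDict


class SlabPage:
    """A slab page with a fixed number of object slots (hypothetical model)."""

    def __init__(self, page_id, nr_objects):
        self.page_id = page_id
        self.nr_objects = nr_objects
        self.allocated = set()   # slots holding a cached object
        self.referenced = set()  # subset of allocated slots currently in use

    def is_active(self):
        # A single referenced object makes the whole page Active.
        return bool(self.referenced)


class Slab:
    """Keeps the LRU of Unused pages inside the slab itself."""

    def __init__(self):
        self.pages = {}
        # Oldest Unused page first (LRU end), newest last (MRU end).
        self.unused_lru = OrderedDict()

    def add_page(self, page):
        self.pages[page.page_id] = page

    def get(self, page_id, slot):
        """Reference an object; its page becomes Active."""
        page = self.pages[page_id]
        if not 0 <= slot < page.nr_objects:
            raise ValueError("slot out of range")
        page.allocated.add(slot)
        page.referenced.add(slot)
        # Active pages need no LRU tracking.
        self.unused_lru.pop(page_id, None)

    def put(self, page_id, slot):
        """Drop a reference; the object stays allocated (cached) but unused."""
        page = self.pages[page_id]
        was_active = page.is_active()
        page.referenced.discard(slot)
        if was_active and not page.is_active():
            # Active -> Unused: a new key is appended at the MRU end.
            self.unused_lru[page_id] = page

    def reclaim(self, nr_pages):
        """Free up to nr_pages whole pages, oldest Unused page first."""
        freed = []
        while self.unused_lru and len(freed) < nr_pages:
            page_id, page = self.unused_lru.popitem(last=False)
            page.allocated.clear()   # every object on it was unused
            del self.pages[page_id]  # page goes back to the page allocator
            freed.append(page_id)
        return freed
```

In this model a page full of cached but unreferenced objects is freed in one go, which is the point of aging pages rather than individual objects: reclaim never has to pick objects off a page that still holds a live one.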

> It's actually more complicated than that, though.  Even if no one has
> touched a particular inode, if one of the inodes in the slab page is
> pinned down because it is in use,

A single active object like this would make the slab page Active and
therefore not a candidate for reclaim. Also, we already reclaim
dentries before inodes because dentries pin inodes, so our reclaim
algorithms already deal with these ordering issues for us.
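The pinning argument can be sketched the same way. In this hypothetical Python model (CachedObject, Inode, Dentry, page_state and prune_unused_dentries are illustrative names, not the VFS API), each cached dentry holds a reference on its inode, so an inode page cannot become Unused until the dentries pinning it are pruned, and a dentry that is still in use keeps its inode's whole page Active.

```python
class CachedObject:
    """An object cached on a slab page, with a reference count."""

    def __init__(self, page):
        self.page = page   # label of the slab page holding the object
        self.refcount = 0


class Inode(CachedObject):
    pass


class Dentry(CachedObject):
    def __init__(self, page, inode):
        super().__init__(page)
        self.inode = inode
        inode.refcount += 1   # a cached dentry pins its inode


def page_state(objects, page):
    """'Active' if any object on the page is referenced, else 'Unused'."""
    on_page = [o for o in objects if o.page == page]
    return "Active" if any(o.refcount > 0 for o in on_page) else "Unused"


def prune_unused_dentries(dentries):
    """Free unreferenced dentries, dropping the inode references they hold.

    Returns the dentries that are still in use.
    """
    survivors = []
    for d in dentries:
        if d.refcount == 0:
            d.inode.refcount -= 1
        else:
            survivors.append(d)
    return survivors
```

For example, with two inodes on one page and one of their dentries held in use, pruning frees the other dentry, yet page_state still reports the inode page as Active; only once the last dentry is released and pruned does the page become Unused.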

...

> And of course, if the inode is pinned down because it is opened and/or
> mmaped, then its associated dcache entry can't be freed either, so
> there's no point trying to trash all of its sibling dentries on the
> same page as that dcache entry.

Agreed - that's why I think preventing fragmentation caused by LRU
reclaim is best dealt with inside the slab, where both object age
and locality can be taken into account.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com