lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100130192623.GE788@thunk.org>
Date:	Sat, 30 Jan 2010 14:26:23 -0500
From:	tytso@....edu
To:	Christoph Lameter <cl@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Dave Chinner <david@...morbit.com>,
	Miklos Szeredi <miklos@...redi.hu>,
	Alexander Viro <viro@....linux.org.uk>,
	Christoph Hellwig <hch@...radead.org>,
	Christoph Lameter <clameter@....com>,
	Rik van Riel <riel@...hat.com>,
	Pekka Enberg <penberg@...helsinki.fi>,
	akpm@...ux-foundation.org, Nick Piggin <nickpiggin@...oo.com.au>,
	Hugh Dickins <hugh@...itas.com>, linux-kernel@...r.kernel.org
Subject: Re: inodes: Support generic defragmentation

On Fri, Jan 29, 2010 at 02:49:42PM -0600, Christoph Lameter wrote:
> This implements the ability to remove inodes in a particular slab
> from inode caches. In order to remove an inode we may have to write out
> the pages of an inode, the inode itself and remove the dentries referring
> to the node.

How often is this going to happen?  Removing an inode is an incredibly
expensive operation.  We have to eject all of the pages from the page
cache, and if those pages are getting a huge amount of use --- say,
those corresponding to some shared library like libc --- or which have
a huge number of pages that are actively getting used, the thrashing
that is going to result is going to enormous.

There needs to be some kind of cost/benefit analysis done about
whether or not this is worth it.  Does it make sense to potentially
force hundreds and hundreds of megaytes of pages to get thrashed in
and out just to recover a single 4k page?  In some cases, maybe yes.
But in other cases, the results could be disastrous.

> +/*
> + * Generic callback function slab defrag ->kick methods. Takes the
> + * array with inodes where we obtained refcounts using fs_get_inodes()
> + * or get_inodes() and tries to free them.
> + */
> +void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
> +{
> +	struct inode *inode;
> +	int i;
> +	int abort = 0;
> +	LIST_HEAD(freeable);
> +	int active;
> +
> +	for (i = 0; i < nr; i++) {
> +		inode = v[i];
> +		if (!inode)
> +			continue;

In some cases, it's going to be impossible to empty a particular slab
cache page.  For example, there may be one inode which has pages
locked into memory, or which we may decide (once we add some
intelligence into this function) is really not worth ejecting.  In
that case, there's no point dumping the rest of the inodes on that
particular slab page.   

> +		if (inode_has_buffers(inode) || inode->i_data.nrpages) {
> +			if (remove_inode_buffers(inode))
> +				/*
> +				 * Should we really be doing this? Or
> +				 * limit the writeback here to only a few pages?
> +				 *
> +				 * Possibly an expensive operation but we
> +				 * cannot reclaim the inode if the pages
> +				 * are still present.
> +				 */
> +				invalidate_mapping_pages(&inode->i_data,
> +								0, -1);

> +		}

I do not thing this function does what you think it does....

"invalidate_mapping_pages() will not block on I/O activity, and it
will refuse to invalidate pages which are dirty, locked, under
writeback, or mapped into page tables."

So you need to force the data to be written *first*, then get the
pages removed from the page table, and only then, call
invalidate_mapping_pages().  Otherwise, this is just going to
pointlessly drop pages from the page cache and trashing the page
cache's effectiveness, without actually making it possible to drop a
particular inode if it is being used at all by any process.

Presumably then the defrag code, since it was unable to free a
particular page, will go on pillaging and raping other inodes in the
inode cache, until it can actually create a hugepage.  This is why you
really shouldn't start trying to trash an inode until you're
**really** sure it's possible completely evict a 4k slab page of all
of its inodes.

	   	       	  	     - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ