linux-kernel - Re: [PATCH 7/8] zswap: add to mm/

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <50E4588E.6080001@linux.vnet.ibm.com>
Date:	Wed, 02 Jan 2013 07:55:58 -0800
From:	Dave Hansen <dave@...ux.vnet.ibm.com>
To:	Seth Jennings <sjenning@...ux.vnet.ibm.com>
CC:	Dan Magenheimer <dan.magenheimer@...cle.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Nitin Gupta <ngupta@...are.org>,
	Minchan Kim <minchan@...nel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Robert Jennings <rcj@...ux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@...ibm.com>,
	Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <jweiner@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org
Subject: Re: [PATCH 7/8] zswap: add to mm/

On 01/01/2013 09:52 AM, Seth Jennings wrote:
> On 12/31/2012 05:06 PM, Dan Magenheimer wrote:
>> A second related issue that concerns me is that, although you
>> are now, like zcache2, using an LRU queue for compressed pages
>> (aka "zpages"), there is no relationship between that queue and
>> physical pageframes.  In other words, you may free up 100 zpages
>> out of zswap via zswap_flush_entries, but not free up a single
>> pageframe.  This seems like a significant design issue.  Or am
>> I misunderstanding the code?
> 
> You understand correctly.  There is room for optimization here and it
> is something I'm working on right now.

It's the same "design issue" that the slab shrinkers have, and they are
likely to have some substantially consistently smaller object sizes.

>> A third concern is about scalability... the locking seems very
>> coarse-grained.  In zcache, you personally observed and fixed
>> hashbucket contention (see https://lkml.org/lkml/2011/9/29/215).
>> Doesn't zswap's tree_lock essentially use a single tree (per
>> swaptype), i.e. no scalability?
> 
> The reason the coarse lock isn't a problem for zswap like the hash
> bucket locks where in zcache is that the lock is not held for long
> periods time as it is in zcache.  It is only held while operating on
> the tree, not during compression/decompression and larger memory
> operations.

Lock hold times don't often dominate lock cost these days.  The limiting
factor tends to be the cost of atomic operations to bring the cacheline
over to the CPUs acquiring the lock.

> Also, I've done some lockstat checks and the zswap tree lock is way
> down on the list contributing <1% of the lock contention wait time on
> a 4-core system.  The anon_vma lock is the primary bottleneck.

4 cores these days is awfully small.  Some of our fellow colleagues at
IBM might be a _bit_ concerned if we told them that we were using a
4-core non-NUMA system and extrapolating lock contention from there. :)

It's curious that you chose the anon_vma lock, though.  It can only
possibly show _contention_ when you've got a bunch of CPUs beating on
the related VMAs.  That contention disappears in workloads that aren't
threaded, so it seems at least a bit imprecise to say anon_vma lock is
the primary bottleneck.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/