Message-ID: <26bb76b3-308e-404f-b2bf-3d19b28b393a@default>
Date: Wed, 2 Jan 2013 09:08:04 -0800 (PST)
From: Dan Magenheimer <dan.magenheimer@...cle.com>
To: Seth Jennings <sjenning@...ux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Nitin Gupta <ngupta@...are.org>,
Minchan Kim <minchan@...nel.org>,
Konrad Wilk <konrad.wilk@...cle.com>,
Robert Jennings <rcj@...ux.vnet.ibm.com>,
Jenifer Hopper <jhopper@...ibm.com>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <jweiner@...hat.com>,
Rik van Riel <riel@...hat.com>,
Larry Woodman <lwoodman@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org,
Dave Hansen <dave@...ux.vnet.ibm.com>
Subject: RE: [PATCH 7/8] zswap: add to mm/
> From: Seth Jennings [mailto:sjenning@...ux.vnet.ibm.com]
> Subject: Re: [PATCH 7/8] zswap: add to mm/
>
> > I am eagerly studying one of the details of your zswap "flush"
> > code in this patch to see how you solved a problem or two that
> > I was struggling with for the similar mechanism RFC'ed for zcache
> > (see https://lkml.org/lkml/2012/10/3/558). I like the way
> > that you force the newly-uncompressed to-be-flushed page immediately
> > into a swap bio in zswap_flush_entry via the call to swap_writepage,
> > though I'm not entirely convinced that there aren't some race
> > conditions there. However, won't swap_writepage simply call
> > frontswap_store instead and re-compress the page back into zswap?
>
> I break part of swap_writepage() into a bottom half called
> __swap_writepage() that doesn't include the call to frontswap_store().
> __swap_writepage() is what is called from zswap_flush_entry(). That
> is how I avoid flushed pages recycling back into zswap and the
> potential recursion mentioned.
OK, I missed that. Nice. I will see if I can use the
same approach with zcache and, if so, would be happy to support
the change to swap_writepage.
In your next version, maybe you could break that chunk out
into a separate patch so it can be pulled independently
into Andrew's tree?
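For anyone else following along, here is the shape of the split as
I understand it -- a sketch of the idea, not the actual patch, so
the exact signatures may differ:

/* mm/page_io.c -- sketch only */
int swap_writepage(struct page *page, struct writeback_control *wbc)
{
        if (try_to_free_swap(page)) {
                unlock_page(page);
                return 0;
        }
        if (frontswap_store(page) == 0) {
                /* captured by a frontswap backend such as zswap */
                set_page_writeback(page);
                unlock_page(page);
                end_page_writeback(page);
                return 0;
        }
        /* bio setup and submission live here, below frontswap */
        return __swap_writepage(page, wbc);
}

Since zswap_flush_entry() calls __swap_writepage() directly, the
decompressed page goes straight to the swap device and can never
re-enter frontswap_store().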
> > A second related issue that concerns me is that, although you
> > are now, like zcache2, using an LRU queue for compressed pages
> > (aka "zpages"), there is no relationship between that queue and
> > physical pageframes. In other words, you may free up 100 zpages
> > out of zswap via zswap_flush_entries, but not free up a single
> > pageframe. This seems like a significant design issue. Or am
> > I misunderstanding the code?
>
> You understand correctly. There is room for optimization here and it
> is something I'm working on right now.
>
> What I'm looking to do is give zswap a little insight into zsmalloc
> internals,
Not to be at all snide, but had you been as eager to break
the zsmalloc abstraction last spring, a lot of unpleasantness
and extra work might have been avoided. :v(
> namely the ability to figure out what class size a particular
> allocation is in and, in the event the store can't be satisfied, flush
> an entry from that exact class size so that we can be assured the
> store will succeed with minimal flushing work. In this solution,
> there would be an LRU list per zsmalloc class size tracked in zswap.
> The result is LRU-ish flushing overall with class size being the first
> flush selection criterion and LRU as the second.
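If I follow, the bookkeeping would be shaped something like this
(my own sketch with guessed names -- NR_SIZE_CLASSES, zs_size_class()
and zswap_make_room() are inventions for illustration, not from your
patch; locking elided for brevity):

struct zswap_entry {
        struct list_head lru;
        /* ... rb_node, swap offset, zsmalloc handle ... */
};

/* one LRU of zswap entries per zsmalloc class size */
static struct list_head class_lru[NR_SIZE_CLASSES];

static int zswap_make_room(size_t zpage_len)
{
        int class = zs_size_class(zpage_len);   /* hypothetical hook */
        struct zswap_entry *entry;

        if (list_empty(&class_lru[class]))
                return -ENOMEM;
        /* oldest entry of the *same* class, so the freed slot is
         * guaranteed to fit the incoming zpage */
        entry = list_entry(class_lru[class].prev,
                           struct zswap_entry, lru);
        zswap_flush_entry(entry);       /* decompress + __swap_writepage() */
        return 0;
}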
Clever and definitely useful, though I think there are two related
problems and IIUC this solves only one of them. The problem it _does_
solve is (A) where to put a new zpage: Move a zpage from the same
class to real-swap-disk and then fill its slot with the new zpage.
The problem it _doesn't_ solve is (B) how to shrink the total number
of pageframes used by zswap, even by a single page. I believe
(though cannot prove right now) that this latter problem will
need to be solved to implement any suitable MM policy for balancing
pages-used-for-compression vs pages-not-used-for-compression.
I fear that problem (B) is the fundamental concern with
using a high-density storage allocator such as zsmalloc, which
is why I abandoned zsmalloc in favor of a more-predictable-but-
less-dense allocator (zbud). However, if you have a solution
for (B) as well, I would gladly abandon zbud in zcache (for _both_
cleancache and frontswap pages) and our respective in-kernel
compression efforts would be easier to merge into one solution
in the future.
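To make (B) concrete with made-up numbers: if each pageframe in some
class packs four zpages, and the 100 LRU-oldest zpages happen to sit
one per frame, flushing all 100 frees nothing. A throwaway userspace
toy that walks through that worst case:

#include <stdio.h>

int main(void)
{
        enum { FRAMES = 100, ZPAGES_PER_FRAME = 4 };    /* invented */
        int live[FRAMES], f, flushed = 0, freed = 0;

        for (f = 0; f < FRAMES; f++)
                live[f] = ZPAGES_PER_FRAME;
        /* worst case: the LRU-oldest zpages are spread one per frame */
        for (f = 0; f < FRAMES; f++) {
                live[f]--;
                flushed++;
                if (live[f] == 0)       /* frame freed only when empty */
                        freed++;
        }
        printf("flushed %d zpages, freed %d pageframes\n",
               flushed, freed);
        /* prints: flushed 100 zpages, freed 0 pageframes */
        return 0;
}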
> > A third concern is about scalability... the locking seems very
> > coarse-grained. In zcache, you personally observed and fixed
> > hashbucket contention (see https://lkml.org/lkml/2011/9/29/215).
> > Doesn't zswap's tree_lock essentially use a single tree (per
> > swaptype), i.e. no scalability?
>
> The reason the coarse lock isn't a problem for zswap, the way the
> hash bucket locks were in zcache, is that the lock is not held for
> long periods of time as it is in zcache. It is only held while
> operating on the tree, not during compression/decompression and
> larger memory operations.
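So, if I read you right, the store path is shaped roughly like this
(my sketch with guessed names, not your code):

struct zswap_tree {
        struct rb_root rbroot;
        spinlock_t lock;
};

static int zswap_store_sketch(struct zswap_tree *tree,
                              struct zswap_entry *entry)
{
        /*
         * compression and the zsmalloc allocation have already
         * happened by this point, with no zswap lock held;
         * zsmalloc takes its own internal locks
         */
        spin_lock(&tree->lock);
        zswap_rb_insert(&tree->rbroot, entry);  /* metadata update only */
        spin_unlock(&tree->lock);
        return 0;
}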
Hmmm... IIRC, to avoid races in zcache, it was necessary to
update both the data (zpage) and meta-data ("tree" in zswap,
and tmem-data-structure in zcache) atomically. I will need
to study your code more to understand how zswap avoids this
requirement. Or if it is obvious to you, I would be grateful
if you would point it out to me.
Thanks,
Dan