linux-kernel - RE: [PATCH 7/8] zswap: add to mm/

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <0e91c1e5-7a62-4b89-9473-09fff384a334@default>
Date:	Mon, 31 Dec 2012 15:06:09 -0800 (PST)
From:	Dan Magenheimer <dan.magenheimer@...cle.com>
To:	Seth Jennings <sjenning@...ux.vnet.ibm.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Nitin Gupta <ngupta@...are.org>, Minchan Kim <minchan@...nel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Dan Magenheimer <dan.magenheimer@...cle.com>,
	Robert Jennings <rcj@...ux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@...ibm.com>,
	Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <jweiner@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org
Subject: RE: [PATCH 7/8] zswap: add to mm/

> From: Seth Jennings [mailto:sjenning@...ux.vnet.ibm.com]
> Subject: [PATCH 7/8] zswap: add to mm/
> 
> zswap is a thin compression backend for frontswap. It receives
> pages from frontswap and attempts to store them in a compressed
> memory pool, resulting in an effective partial memory reclaim and
> dramatically reduced swap device I/O.

Hi Seth --

Happy (almost) New Year!

I am eagerly studying one of the details of your zswap "flush"
code in this patch to see how you solved a problem or two that
I was struggling with for the similar mechanism RFC'ed for zcache
(see https://lkml.org/lkml/2012/10/3/558).  I like the way
that you force the newly-uncompressed to-be-flushed page immediately
into a swap bio in zswap_flush_entry via the call to swap_writepage,
though I'm not entirely convinced that there aren't some race
conditions there.  However, won't swap_writepage simply call
frontswap_store instead and re-compress the page back into zswap?
The (very ugly) solution I used for this was to flag the page in a
frontswap_denial_map (see https://lkml.org/lkml/2012/10/3/560).
Don't you require something like that also, or am I missing some
magic in your code?

I'm also a bit concerned about the consequent recursion:

frontswap_store->
  zswap_fs_store->
    zswap_flush_entries->
      zswap_flush_entry->
        __swap_writepage->
          swap_writepage->
            frontswap_store->
              zswap_fs_store-> etc

It's not obvious to me how deeply this might recurse and/or
how the recursion is terminated.  The RFC'ed zcache code
calls its equivalence of your "flush" code only from the
shrinker thread to avoid this, though there may be a third,
better, way.

A second related issue that concerns me is that, although you
are now, like zcache2, using an LRU queue for compressed pages
(aka "zpages"), there is no relationship between that queue and
physical pageframes.  In other words, you may free up 100 zpages
out of zswap via zswap_flush_entries, but not free up a single
pageframe.  This seems like a significant design issue.  Or am
I misunderstanding the code?

A third concern is about scalability... the locking seems very
coarse-grained.  In zcache, you personally observed and fixed
hashbucket contention (see https://lkml.org/lkml/2011/9/29/215).
Doesn't zswap's tree_lock essentially use a single tree (per
swaptype), i.e. no scalability?

Thanks,
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/