lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 21 May 2013 09:10:20 +0100
From:	Mel Gorman <mgorman@...e.de>
To:	Seth Jennings <sjenning@...ux.vnet.ibm.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Nitin Gupta <ngupta@...are.org>,
	Minchan Kim <minchan@...nel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Dan Magenheimer <dan.magenheimer@...cle.com>,
	Robert Jennings <rcj@...ux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@...ibm.com>,
	Johannes Weiner <jweiner@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Dave Hansen <dave@...1.net>, Joe Perches <joe@...ches.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Cody P Schafer <cody@...ux.vnet.ibm.com>,
	Hugh Dickens <hughd@...gle.com>,
	Paul Mackerras <paulus@...ba.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org
Subject: Re: [PATCHv11 2/4] zbud: add to mm/

On Mon, May 20, 2013 at 10:42:25AM -0500, Seth Jennings wrote:
> On Mon, May 20, 2013 at 02:54:39PM +0100, Mel Gorman wrote:
> > On Sun, May 19, 2013 at 03:52:19PM -0500, Seth Jennings wrote:
> > > My first guess is that the external fragmentation situation you are referring to
> > > is a workload in which all pages compress to greater than half a page.  If so,
> > > then it doesn't matter what NCHUCNKS_ORDER is, there won't be any pages the
> > > compress enough to fit in the < PAGE_SIZE/2 free space that remains in the
> > > unbuddied zbud pages.
> > > 
> > 
> > There are numerous aspects to this, too many to write them all down.
> > Modelling the external fragmentation one and how it affects swap IO
> > would be a complete pain in the ass so lets consider the following
> > example instead as it's a bit clearer.
> > 
> > Three processes. Process A compresses by 75%, Process B compresses to 15%,
> > Process C pages compress to 15%. They are all adding to zswap in lockstep.
> > Lets say that zswap can hold 100 physical pages.
> > 
> > NCHUNKS == 2
> > 	All Process A pages get rejected.
> 
> Ah, I think this is our disconnect.  Process A pages will not be rejected.
> They will be stored in a zbud page, and that zbud page will be added
> to the 0th unbuddied list.  This list maintains a list of zbud pages
> that will never be buddied because there are no free chunks.
> 

D'oh, good point. Unfortunately, the problem then still exists at the
writeback end which I didn't bring up in the previous mail. Take three
processes writing in lockstep. Process A pages compress to 15%, B compresses
to 15%, C compresses to 60%. Each physical page packing will look like this

nchunks=6	nchunks = 2

Page 0	A B	A B
Page 1	C A	C
Page 2	B C	A B
Page 3	A B	C
Pattern repeats..........

This continues until zswap is full. Now all three process stop and process
D starts writing to zswap, each of its pages compresses to 80%. These will
be freed in LRU order which is effectively FIFO.

With nchunks=6, to store 2 process D pages, it must write out 4 pages
to swap.  With nchunks=2, to store two process D pages, it must write 3
pages so process D stalls for less time with nchunks==2.

This is a variation of the zsmalloc packing problem where greater packing
leads to worse performance when zswap is full. The user bug report will
look something like "performance goes to hell when zswap is full although
swap IO rates look normal". If it was a kernel parameter, setting nchunk=2
as a kernel boot parameter will at least be a workaround.

Of course, this is all hypothetical. It is certainly possible to create
a reference string where nchunks=2 generates more IO but it tends to
generate the IO sooner and more closely resemble existing swap behaviour
that users might be willing to accept as a workaround until their problem
can be resolved. This is why I think the parameter should be available at
boot-time as a debugging/workaround option for developers to recommend to
a user.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ