Date:	Mon,  2 Jul 2012 16:15:48 -0500
From:	Seth Jennings <sjenning@...ux.vnet.ibm.com>
To:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:	Seth Jennings <sjenning@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Dan Magenheimer <dan.magenheimer@...cle.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Nitin Gupta <ngupta@...are.org>,
	Minchan Kim <minchan@...nel.org>,
	Robert Jennings <rcj@...ux.vnet.ibm.com>, linux-mm@...ck.org,
	devel@...verdev.osuosl.org, linux-kernel@...r.kernel.org
Subject: [PATCH 0/4] zsmalloc improvements

This patchset removes the current x86 dependency for zsmalloc
and introduces some performance improvements in the object
mapping paths.

It was meant to be a follow-on to my previous patchset:

https://lkml.org/lkml/2012/6/26/540

However, in light of new performance information, this patchset differs
so much from that one that I mostly started over.

In the past, I attempted to compare different mapping methods
via the use of zcache and frontswap.  However, the nature of those
two features makes comparing mapping method efficiency difficult
since the mapping is a very small part of the overall code path.

In an effort to get more useful statistics on the mapping speed,
I wrote a microbenchmark module named zsmapbench, designed to
measure mapping speed by calling straight into the zsmalloc
paths.

https://github.com/spartacus06/zsmapbench

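For illustration, a stripped-down loop in the same spirit looks roughly
like the following.  This is not the actual zsmapbench module; NR_OPS,
OBJ_SIZE and the init function are placeholders, and it uses the current
two-argument zs_map_object() interface:

#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/timex.h>	/* get_cycles() */
#include "zsmalloc.h"

#define NR_OPS		100000
#define OBJ_SIZE	3072	/* large enough that objects span pages */

static int __init zsmapbench_init(void)
{
	struct zs_pool *pool;
	unsigned long handle;
	cycles_t start, end;
	volatile char sink = 0;
	char *obj;
	int i;

	pool = zs_create_pool("zsmapbench", GFP_KERNEL);
	if (!pool)
		return -ENOMEM;

	handle = zs_malloc(pool, OBJ_SIZE);
	if (!handle) {
		zs_destroy_pool(pool);
		return -ENOMEM;
	}

	start = get_cycles();
	for (i = 0; i < NR_OPS; i++) {
		obj = zs_map_object(pool, handle);
		sink += *obj;			/* touch the mapping */
		zs_unmap_object(pool, handle);
	}
	end = get_cycles();

	pr_info("zsmapbench: ~%llu cycles per map/unmap\n",
		(unsigned long long)(end - start) / NR_OPS);

	zs_free(pool, handle);
	zs_destroy_pool(pool);
	return -EAGAIN;		/* benchmark only, don't stay loaded */
}
module_init(zsmapbench_init);
MODULE_LICENSE("GPL");
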
This exposed an interesting and unexpected result: in every case I
tried, copying the objects that span pages was _always_ faster than
mapping them through the page tables.  I could not find a single case
in which the page table mapping method was faster.

zsmapbench measures the copy-based mapping at ~560 cycles for a
map/unmap operation on an object that spans pages, for both a KVM guest
and bare metal, while the page table mapping took ~1500 cycles in a VM
and ~760 cycles on bare metal.  The cycle count for the copy method
varies with allocation size; however, it is still faster even for the
largest allocation size that zsmalloc supports.

The result is convenient though, as memcpy is very portable :)

This patchset replaces the x86-only page table mapping code with
copy-based mapping code. It also makes changes to optimize this
new method further.

No changes in arch/x86 are required.

The patchset is based on Greg's staging-next tree.

Seth Jennings (4):
  zsmalloc: remove x86 dependency
  zsmalloc: add single-page object fastpath in unmap
  zsmalloc: add details to zs_map_object boiler plate
  zsmalloc: add mapping modes

 drivers/staging/zcache/zcache-main.c     |    6 +-
 drivers/staging/zram/zram_drv.c          |    7 +-
 drivers/staging/zsmalloc/Kconfig         |    4 -
 drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
 drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
 drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
 6 files changed, 114 insertions(+), 47 deletions(-)

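For callers, the mapping-modes patch (patch 4) means the access intent
can be passed down so zsmalloc can skip the unneeded copy direction for
objects that span pages.  Roughly, from a caller such as zram (this is
illustrative only: the surrounding variables are made up, and the
read-only/write-only/read-write identifiers shown are my shorthand; see
patch 4 for the actual interface):

	/* read path: the object is only read, so zsmalloc may skip the
	 * copy back into the zspage on unmap */
	src = zs_map_object(pool, handle, ZS_MM_RO);
	memcpy(read_buf, src, obj_size);
	zs_unmap_object(pool, handle);

	/* write path: the old contents don't matter, so zsmalloc may
	 * skip the copy into the per-cpu buffer on map */
	dst = zs_map_object(pool, handle, ZS_MM_WO);
	memcpy(dst, compressed_data, obj_size);
	zs_unmap_object(pool, handle);
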
-- 
1.7.9.5

