Message-ID: <4FFD2524.2050300@kernel.org>
Date: Wed, 11 Jul 2012 16:03:00 +0900
From: Minchan Kim <minchan@...nel.org>
To: Seth Jennings <sjenning@...ux.vnet.ibm.com>
CC: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Dan Magenheimer <dan.magenheimer@...cle.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Nitin Gupta <ngupta@...are.org>,
Robert Jennings <rcj@...ux.vnet.ibm.com>, linux-mm@...ck.org,
devel@...verdev.osuosl.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] zsmalloc improvements
Hi everybody,

I learned from Seth yesterday that Greg has already merged this series.
I should have hurried, but I had no time last week. :(

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> This patchset removes the current x86 dependency for zsmalloc
> and introduces some performance improvements in the object
> mapping paths.
>
> It was meant to be a follow-on to my previous patchset
>
> https://lkml.org/lkml/2012/6/26/540
>
> However, this patchset differed so much in light of new performance
> information that I mostly started over.
>
> In the past, I attempted to compare different mapping methods
> via the use of zcache and frontswap. However, the nature of those
> two features makes comparing mapping method efficiency difficult
> since the mapping is a very small part of the overall code path.
>
> In an effort to get more useful statistics on the mapping speed,
> I wrote a microbenchmark module named zsmapbench, designed to
> measure mapping speed by calling straight into the zsmalloc
> paths.
>
> https://github.com/spartacus06/zsmapbench
>
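(Not the actual zsmapbench source -- see the URL above for that. The
core idea is just a tight loop around the map/unmap pair, timed with
get_cycles(); a minimal sketch follows. The zsmalloc signatures here
are approximate for the staging tree at this point -- patch 4 below
adds a mapping-mode argument to zs_map_object() -- and the
spanning-object trick depends on size-class rounding.)

#include <linux/module.h>
#include <linux/timex.h>	/* get_cycles() */
#include "zsmalloc.h"		/* drivers/staging/zsmalloc */

static int __init zsmapbench_sketch_init(void)
{
	struct zs_pool *pool;
	unsigned long h1, h2;
	cycles_t start, end;
	int i, iters = 10000;

	pool = zs_create_pool("zsmapbench", GFP_NOIO);
	if (!pool)
		return -ENOMEM;

	/*
	 * zsmalloc objects never exceed PAGE_SIZE, but with a ~3/4-page
	 * size class the second object in a zspage starts mid-page and
	 * spans into the following page.
	 */
	h1 = zs_malloc(pool, 3 * PAGE_SIZE / 4);
	h2 = zs_malloc(pool, 3 * PAGE_SIZE / 4);
	if (!h1 || !h2)
		goto out;

	start = get_cycles();
	for (i = 0; i < iters; i++) {
		void *obj = zs_map_object(pool, h2);

		(void)obj;	/* a real benchmark would touch the object */
		zs_unmap_object(pool, h2);
	}
	end = get_cycles();
	pr_info("map/unmap: ~%llu cycles each\n",
		(unsigned long long)(end - start) / iters);
out:
	if (h2)
		zs_free(pool, h2);
	if (h1)
		zs_free(pool, h1);
	zs_destroy_pool(pool);
	return 0;
}
module_init(zsmapbench_sketch_init);
MODULE_LICENSE("GPL");
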
> This exposed an interesting and unexpected result: in all
> cases that I tried, copying objects that span pages, rather than
> mapping them via the page table, was _always_ faster. I could
> not find a case in which the page table mapping method was faster.
>
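For reference, the two approaches being compared look roughly like
this for an object that starts at offset 'off' near the end of the
first page and spills into the second. The helper names and details
below are illustrative only, not the actual zsmalloc-main.c code:

#include <linux/highmem.h>	/* kmap_atomic() */
#include <linux/string.h>	/* memcpy() */
#include <linux/vmalloc.h>	/* struct vm_struct */
#include <asm/pgtable.h>	/* set_pte(), mk_pte() */

/* Copy-based: assemble the object in a per-cpu buffer (fully portable). */
static void *zs_copy_map(struct page *pages[2], int off, int size, char *buf)
{
	int first = PAGE_SIZE - off;	/* bytes living in the first page */
	void *addr;

	addr = kmap_atomic(pages[0]);
	memcpy(buf, addr + off, first);
	kunmap_atomic(addr);

	addr = kmap_atomic(pages[1]);
	memcpy(buf + first, addr, size - first);
	kunmap_atomic(addr);

	return buf;	/* a read-write unmap must copy the buffer back */
}

/* Page-table based: make the two pages virtually contiguous. */
static void *zs_pt_map(struct vm_struct *vm, pte_t *ptep,
		       struct page *pages[2], int off)
{
	set_pte(ptep, mk_pte(pages[0], PAGE_KERNEL));
	set_pte(ptep + 1, mk_pte(pages[1], PAGE_KERNEL));
	/* the matching unmap clears both PTEs and flushes the local TLB */
	return vm->addr + off;
}
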
> zsmapbench measures the copy-based mapping at ~560 cycles for a
> map/unmap operation on a spanned object, for both a KVM guest and
> bare-metal, while the page table mapping took ~1500 cycles in a VM
> and ~760 cycles bare-metal. The cycles for the copy method will vary
> with allocation size; however, it is still faster even for the
> largest allocation that zsmalloc supports.
>
> The result is convenient, though, as memcpy is very portable :)

Today I tested zsmapbench on my embedded (ARM) board.
There, the tlb-flush method is 30% faster than the copy-based one, so
copying does not always win. I think it depends on CPU speed and
cache size.

zram is already very popular on embedded systems, and I want to keep
using it without a 30% hit, so I would like to keep our old approach,
which uses a local tlb flush.

Of course, for a KVM guest, copy-based would always be a big win.
So shouldn't we support both approaches? It could make the code very
ugly, but I think it has enough value.

Any thoughts?
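
To make that concrete, here is one possible shape it could take,
reusing the helpers sketched above. CONFIG_ZSMALLOC_PGTABLE_MAPPING
is a made-up option name here, and the mapping_area layout below is
illustrative only:

/* per-cpu mapping state, one instance per CPU */
struct mapping_area {
#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
	struct vm_struct *vm;	/* per-cpu virtual window */
	pte_t *vm_ptes;		/* its two PTE slots */
#else
	char *vm_buf;		/* per-cpu copy buffer */
#endif
};

static void *__zs_map(struct mapping_area *area,
		      struct page *pages[2], int off, int size)
{
#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
	/* cheap local TLB flush on unmap: wins on my ARM board */
	return zs_pt_map(area->vm, area->vm_ptes, pages, off);
#else
	/* portable memcpy path: wins on x86 bare-metal and under KVM */
	return zs_copy_map(pages, off, size, area->vm_buf);
#endif
}

Each architecture (or defconfig) could then pick whichever default
suits it.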
>
> This patchset replaces the x86-only page table mapping code with
> copy-based mapping code. It also makes changes to optimize this
> new method further.
>
> There are no changes in arch/x86 required.
>
> Patchset is based on Greg's staging-next.
>
> Seth Jennings (4):
> zsmalloc: remove x86 dependency
> zsmalloc: add single-page object fastpath in unmap
> zsmalloc: add details to zs_map_object boiler plate
> zsmalloc: add mapping modes
>
> drivers/staging/zcache/zcache-main.c | 6 +-
> drivers/staging/zram/zram_drv.c | 7 +-
> drivers/staging/zsmalloc/Kconfig | 4 -
> drivers/staging/zsmalloc/zsmalloc-main.c | 124 ++++++++++++++++++++++--------
> drivers/staging/zsmalloc/zsmalloc.h | 14 +++-
> drivers/staging/zsmalloc/zsmalloc_int.h | 6 +-
> 6 files changed, 114 insertions(+), 47 deletions(-)
>
--
Kind regards,
Minchan Kim