[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1359682784.3574.2.camel@kernel>
Date: Thu, 31 Jan 2013 19:39:44 -0600
From: Simon Jeons <simon.jeons@...il.com>
To: Seth Jennings <sjenning@...ux.vnet.ibm.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Nitin Gupta <ngupta@...are.org>,
Minchan Kim <minchan@...nel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Dan Magenheimer <dan.magenheimer@...cle.com>,
Robert Jennings <rcj@...ux.vnet.ibm.com>,
Jenifer Hopper <jhopper@...ibm.com>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <jweiner@...hat.com>,
Rik van Riel <riel@...hat.com>,
Larry Woodman <lwoodman@...hat.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org
Subject: Re: [PATCHv4 0/7] zswap: compressed swap caching
Hi Seth,
On Tue, 2013-01-29 at 15:40 -0600, Seth Jennings wrote:
> Sorry for the churn but just this set might be easier to review.
> The code required for the flushing is in a separate patch now
> as requested.
>
> Changelog:
>
> v4:
> * Added Acks (Minchan)
> * Separated flushing functionality into standalone patch
> for easier review (Minchan)
> * fix comment on zswap enabled attribute (Minchan)
> * add TODO for dynamic mempool size (Minchan)
> * and check for NULL in zswap_free_page() (Minchan)
> * add missing zs_free() in error path (Minchan)
> * TODO: add comments for flushing/refcounting (Minchan)
>
> NOTE: To build, read this:
> http://lkml.org/lkml/2013/1/28/586
>
> v3:
> * Dropped the zsmalloc patches from the set, except the promotion patch
> which has be converted to a rename patch (vs full diff). The dropped
> patches have been Acked and are going into Greg's staging tree soon.
> * Separated [PATCHv2 7/9] into two patches since it makes changes for two
> different reasons (Minchan)
> * Moved ZSWAP_MAX_OUTSTANDING_FLUSHES near the top in zswap.c (Rik)
> * Rebase to v3.8-rc5. linux-next is a little volatile with the
> swapper_space per type changes which will effect this patchset.
> * TODO: Move some stats from debugfs to sysfs. Which ones? (Rik)
>
> v2:
> * Rename zswap_fs_* functions to zswap_frontswap_* to avoid
> confusion with "filesystem"
> * Add comment about what the tree lock protects
> * Remove "#if 0" code (should have been done before)
> * Break out changes to existing swap code into separate patch
> * Fix blank line EOF warning on documentation file
> * Rebase to next-20130107
>
> Zswap Overview:
>
> Zswap is a lightweight compressed cache for swap pages. It takes
> pages that are in the process of being swapped out and attempts to
> compress them into a dynamically allocated RAM-based memory pool.
> If this process is successful, the writeback to the swap device is
> deferred and, in many cases, avoided completely. This results in
> a significant I/O reduction and performance gains for systems that
> are swapping.
>
> The results of a kernel building benchmark indicate a
> runtime reduction of 53% and an I/O reduction 76% with zswap vs normal
> swapping with a kernel build under heavy memory pressure (see
> Performance section for more).
>
> Some addition performance metrics regarding the performance
> improvements and I/O reductions that can be achieved using zswap as
> measured by SPECjbb are provided here:
>
> http://ibm.co/VCgHvM
>
> These results include runs on x86 and new results on Power7+ with
> hardware compression acceleration.
>
> Of particular note is that zswap is able to evict pages from the compressed
> cache, on an LRU basis, to the backing swap device when the compressed pool
> reaches it size limit or the pool is unable to obtain additional pages
> from the buddy allocator. This eviction functionality had been identified
> as a requirement in prior community discussions.
>
> Patchset Structure:
> 1: add atomic_t get/set to debugfs
> 2: promote zsmalloc to /lib
> 3,4: changes to existing swap code for zswap
> 5,6: add zswap and documentation
>
> Rationale:
>
> Zswap provides compressed swap caching that basically trades CPU cycles
> for reduced swap I/O. This trade-off can result in a significant
> performance improvement as reads to/writes from to the compressed
> cache almost always faster that reading from a swap device
> which incurs the latency of an asynchronous block I/O read.
>
> Some potential benefits:
> * Desktop/laptop users with limited RAM capacities can mitigate the
> performance impact of swapping.
> * Overcommitted guests that share a common I/O resource can
> dramatically reduce their swap I/O pressure, avoiding heavy
> handed I/O throttling by the hypervisor. This allows more work
> to get done with less impact to the guest workload and guests
> sharing the I/O subsystem
> * Users with SSDs as swap devices can extend the life of the device by
> drastically reducing life-shortening writes.
>
> Compressed swap is also provided in zcache, along with page cache
> compression and RAM clustering through RAMSter. Zswap seeks to deliver
> the benefit of swap compression to users in a discrete function.
> This design decision is akin to Unix design philosophy of doing one
> thing well, it leaves file cache compression and other features
> for separate code.
>
> Design:
>
> Zswap receives pages for compression through the Frontswap API and
> is able to evict pages from its own compressed pool on an LRU basis
> and write them back to the backing swap device in the case that the
> compressed pool is full or unable to secure additional pages from
> the buddy allocator.
>
> Zswap makes use of zsmalloc for the managing the compressed memory
> pool. This is because zsmalloc is specifically designed to minimize
> fragmentation on large (> PAGE_SIZE/2) allocation sizes. Each
> allocation in zsmalloc is not directly accessible by address.
> Rather, a handle is return by the allocation routine and that handle
> must be mapped before being accessed. The compressed memory pool grows
> on demand and shrinks as compressed pages are freed. The pool is
> not preallocated.
>
> When a swap page is passed from frontswap to zswap, zswap maintains
> a mapping of the swap entry, a combination of the swap type and swap
> offset, to the zsmalloc handle that references that compressed swap
> page. This mapping is achieved with a red-black tree per swap type.
> The swap offset is the search key for the tree nodes.
>
> Zswap seeks to be simple in its policies. Sysfs attributes allow for
> two user controlled policies:
> * max_compression_ratio - Maximum compression ratio, as as percentage,
> for an acceptable compressed page. Any page that does not compress
> by at least this ratio will be rejected.
> * max_pool_percent - The maximum percentage of memory that the compressed
> pool can occupy.
>
> To enabled zswap, the "enabled" attribute must be set to 1 at boot time.
>
> Zswap allows the compressor to be selected at kernel boot time by
> setting the “compressor” attribute. The default compressor is lzo.
>
> A debugfs interface is provided for various statistic about pool size,
> number of pages stored, and various counters for the reasons pages
> are rejected.
>
> Performance, Kernel Building:
>
> Setup
> ========
> Gentoo w/ kernel v3.7-rc7
> Quad-core i5-2500 @ 3.3GHz
> 512MB DDR3 1600MHz (limited with mem=512m on boot)
> Filesystem and swap on 80GB HDD (about 58MB/s with hdparm -t)
> majflt are major page faults reported by the time command
> pswpin/out is the delta of pswpin/out from /proc/vmstat before and after
> the make -jN
>
> Summary
> ========
> * Zswap reduces I/O and improves performance at all swap pressure levels.
>
> * Under heavy swaping at 24 threads, zswap reduced I/O by 76%, saving
> over 1.5GB of I/O, and cut runtime in half.
How to get your benchmark?
>
> Details
> ========
> I/O (in pages)
> base zswap change change
> N pswpin pswpout majflt I/O sum pswpin pswpout majflt I/O sum %I/O MB
> 8 1 335 291 627 0 0 249 249 -60% 1
> 12 3688 14315 5290 23293 123 860 5954 6937 -70% 64
> 16 12711 46179 16803 75693 2936 7390 46092 56418 -25% 75
> 20 42178 133781 49898 225857 9460 28382 92951 130793 -42% 371
> 24 96079 357280 105242 558601 7719 18484 109309 135512 -76% 1653
>
> Runtime (in seconds)
> N base zswap %change
> 8 107 107 0%
> 12 128 110 -14%
> 16 191 179 -6%
> 20 371 240 -35%
> 24 570 267 -53%
>
> %CPU utilization (out of 400% on 4 cpus)
> N base zswap %change
> 8 317 319 1%
> 12 267 311 16%
> 16 179 191 7%
> 20 94 143 52%
> 24 60 128 113%
>
>
> Seth Jennings (7):
> debugfs: add get/set for atomic types
> zsmalloc: promote to lib/
> zswap: add to mm/
> mm: break up swap_writepage() for frontswap backends
> mm: allow for outstanding swap writeback accounting
> zswap: add flushing support
> zswap: add documentation
>
> Documentation/vm/zswap.txt | 73 ++
> drivers/staging/Kconfig | 2 -
> drivers/staging/Makefile | 1 -
> drivers/staging/zcache/zcache-main.c | 3 +-
> drivers/staging/zram/zram_drv.h | 3 +-
> drivers/staging/zsmalloc/Kconfig | 10 -
> drivers/staging/zsmalloc/Makefile | 3 -
> fs/debugfs/file.c | 42 +
> include/linux/debugfs.h | 2 +
> include/linux/swap.h | 4 +
> .../staging/zsmalloc => include/linux}/zsmalloc.h | 0
> lib/Kconfig | 18 +
> lib/Makefile | 1 +
> .../zsmalloc/zsmalloc-main.c => lib/zsmalloc.c | 3 +-
> mm/Kconfig | 15 +
> mm/Makefile | 1 +
> mm/page_io.c | 22 +-
> mm/swap_state.c | 2 +-
> mm/zswap.c | 1073 ++++++++++++++++++++
> 19 files changed, 1250 insertions(+), 28 deletions(-)
> create mode 100644 Documentation/vm/zswap.txt
> delete mode 100644 drivers/staging/zsmalloc/Kconfig
> delete mode 100644 drivers/staging/zsmalloc/Makefile
> rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
> rename drivers/staging/zsmalloc/zsmalloc-main.c => lib/zsmalloc.c (99%)
> create mode 100644 mm/zswap.c
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists