Message-ID: <51102BE8.3090301@linux.vnet.ibm.com>
Date: Mon, 04 Feb 2013 15:45:12 -0600
From: Seth Jennings <sjenning@...ux.vnet.ibm.com>
To: Seth Jennings <sjenning@...ux.vnet.ibm.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Nitin Gupta <ngupta@...are.org>,
Minchan Kim <minchan@...nel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Dan Magenheimer <dan.magenheimer@...cle.com>,
Robert Jennings <rcj@...ux.vnet.ibm.com>,
Jenifer Hopper <jhopper@...ibm.com>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <jweiner@...hat.com>,
Rik van Riel <riel@...hat.com>,
Larry Woodman <lwoodman@...hat.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org,
Joe Perches <joe@...ches.com>
Subject: Re: [PATCHv4 0/7] zswap: compressed swap caching
On 02/04/2013 03:30 PM, Seth Jennings wrote:
> On 01/29/2013 03:40 PM, Seth Jennings wrote:
>> Sorry for the churn but just this set might be easier to review.
>> The code required for the flushing is in a separate patch now
>> as requested.
I've got a large and valuable body of feedback to integrate for v5.
Thanks to all that reviewed/commented!
It will take a little time to compile it all and coordinate with
Minchan and Nitin on the additional documentation and rationale for
zsmalloc.
I just wanted to acknowledge the feedback and state that I'm working
on v5 and I'll get it out as soon as I can.
Thanks,
Seth
>
>>
>> Changelog:
>>
>> v4:
>> * Added Acks (Minchan)
>> * Separated flushing functionality into standalone patch
>> for easier review (Minchan)
>> * fix comment on zswap enabled attribute (Minchan)
>> * add TODO for dynamic mempool size (Minchan)
>> * add check for NULL in zswap_free_page() (Minchan)
>> * add missing zs_free() in error path (Minchan)
>> * TODO: add comments for flushing/refcounting (Minchan)
>>
>> NOTE: To build, read this:
>> http://lkml.org/lkml/2013/1/28/586
>>
>> v3:
>> * Dropped the zsmalloc patches from the set, except the promotion patch
>> which has been converted to a rename patch (vs full diff). The dropped
>> patches have been Acked and are going into Greg's staging tree soon.
>> * Separated [PATCHv2 7/9] into two patches since it makes changes for two
>> different reasons (Minchan)
>> * Moved ZSWAP_MAX_OUTSTANDING_FLUSHES near the top in zswap.c (Rik)
>> * Rebase to v3.8-rc5. linux-next is a little volatile with the
>> swapper_space per type changes which will affect this patchset.
>> * TODO: Move some stats from debugfs to sysfs. Which ones? (Rik)
>>
>> v2:
>> * Rename zswap_fs_* functions to zswap_frontswap_* to avoid
>> confusion with "filesystem"
>> * Add comment about what the tree lock protects
>> * Remove "#if 0" code (should have been done before)
>> * Break out changes to existing swap code into separate patch
>> * Fix blank line EOF warning on documentation file
>> * Rebase to next-20130107
>>
>> Zswap Overview:
>>
>> Zswap is a lightweight compressed cache for swap pages. It takes
>> pages that are in the process of being swapped out and attempts to
>> compress them into a dynamically allocated RAM-based memory pool.
>> If this process is successful, the writeback to the swap device is
>> deferred and, in many cases, avoided completely. This results in
>> a significant I/O reduction and performance gains for systems that
>> are swapping.
>>
>> The results of a kernel build benchmark under heavy memory pressure
>> indicate a runtime reduction of 53% and an I/O reduction of 76% with
>> zswap vs normal swapping (see Performance section for more).
>>
>> Some additional metrics on the performance improvements and I/O
>> reductions that can be achieved using zswap, as measured by SPECjbb,
>> are provided here:
>>
>> http://ibm.co/VCgHvM
>>
>> These results include runs on x86 and new results on Power7+ with
>> hardware compression acceleration.
>>
>> Of particular note is that zswap is able to evict pages from the compressed
>> cache, on an LRU basis, to the backing swap device when the compressed pool
>> reaches its size limit or the pool is unable to obtain additional pages
>> from the buddy allocator. This eviction functionality had been identified
>> as a requirement in prior community discussions.
>>
>> Patchset Structure:
>> 1: add atomic_t get/set to debugfs
>> 2: promote zsmalloc to /lib
>> 3,4: changes to existing swap code for zswap
>> 5,6: add zswap and documentation
>>
>> Rationale:
>>
>> Zswap provides compressed swap caching that basically trades CPU cycles
>> for reduced swap I/O. This trade-off can result in a significant
>> performance improvement, as reads from and writes to the compressed
>> cache are almost always faster than reading from a swap device,
>> which incurs the latency of an asynchronous block I/O read.
>>
>> Some potential benefits:
>> * Desktop/laptop users with limited RAM capacities can mitigate the
>> performance impact of swapping.
>> * Overcommitted guests that share a common I/O resource can
>> dramatically reduce their swap I/O pressure, avoiding heavy
>> handed I/O throttling by the hypervisor. This allows more work
>> to get done with less impact to the guest workload and guests
>> sharing the I/O subsystem.
>> * Users with SSDs as swap devices can extend the life of the device by
>> drastically reducing life-shortening writes.
>>
>> Compressed swap is also provided in zcache, along with page cache
>> compression and RAM clustering through RAMster. Zswap seeks to deliver
>> the benefit of swap compression to users in a discrete function.
>> This design decision is akin to the Unix design philosophy of doing one
>> thing well; it leaves file cache compression and other features
>> for separate code.
>>
>> Design:
>>
>> Zswap receives pages for compression through the Frontswap API and
>> is able to evict pages from its own compressed pool on an LRU basis
>> and write them back to the backing swap device in the case that the
>> compressed pool is full or unable to secure additional pages from
>> the buddy allocator.
>>
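To make the eviction behaviour described above concrete, here is a rough user-space model in Python. It is only a sketch, not code from the patchset: the class name, the entry-count limit (the real limit is a percentage of memory), the `writeback` callback, and the choice to refresh an entry's position on load are all assumptions made for illustration.

```python
from collections import OrderedDict


class CompressedPool:
    """Toy model of a size-limited pool that evicts the oldest entry."""

    def __init__(self, max_entries, writeback):
        self.max_entries = max_entries  # hypothetical limit, counted in entries
        self.writeback = writeback      # hypothetical callable(key, data)
        self.entries = OrderedDict()    # least recently used entry first

    def store(self, key, data):
        # Replacing an existing entry must not count against the limit twice.
        if key in self.entries:
            del self.entries[key]
        # Pool full: push the least recently used entries to the backing device.
        while len(self.entries) >= self.max_entries:
            old_key, old_data = self.entries.popitem(last=False)
            self.writeback(old_key, old_data)
        self.entries[key] = data

    def load(self, key):
        data = self.entries.get(key)
        if data is not None:
            # Assumption: a load counts as a "use" for LRU ordering.
            self.entries.move_to_end(key)
        return data


if __name__ == "__main__":
    flushed = []
    pool = CompressedPool(2, lambda key, data: flushed.append(key))
    pool.store("a", b"1")
    pool.store("b", b"2")
    pool.load("a")
    pool.store("c", b"3")
    print(flushed)
```

The model ignores the second eviction trigger mentioned above (the buddy allocator failing to supply pages); that would simply be another condition under which the eviction loop runs.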
>> Zswap makes use of zsmalloc for managing the compressed memory
>> pool. This is because zsmalloc is specifically designed to minimize
>> fragmentation on large (> PAGE_SIZE/2) allocation sizes. Each
>> allocation in zsmalloc is not directly accessible by address.
>> Rather, a handle is returned by the allocation routine and that handle
>> must be mapped before being accessed. The compressed memory pool grows
>> on demand and shrinks as compressed pages are freed. The pool is
>> not preallocated.
>>
>> When a swap page is passed from frontswap to zswap, zswap maintains
>> a mapping of the swap entry, a combination of the swap type and swap
>> offset, to the zsmalloc handle that references that compressed swap
>> page. This mapping is achieved with a red-black tree per swap type.
>> The swap offset is the search key for the tree nodes.
>>
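The mapping described above can be modelled in a few lines of Python. This is a hedged sketch rather than the kernel code: a plain dict stands in for each red-black tree, and the names `ZswapIndex`, `store`, `load` and `invalidate` are hypothetical. The handles are opaque values, as they would be coming from zsmalloc.

```python
class ZswapIndex:
    """Toy model: one lookup structure per swap type, keyed by swap offset."""

    def __init__(self):
        # swap type -> {swap offset -> handle}
        # (a dict stands in for the per-type red-black tree)
        self._trees = {}

    def store(self, swap_type, offset, handle):
        """Record a handle; return the handle it replaced, if any."""
        tree = self._trees.setdefault(swap_type, {})
        old = tree.get(offset)
        tree[offset] = handle
        return old

    def load(self, swap_type, offset):
        """Return the handle for this swap entry, or None if not cached."""
        return self._trees.get(swap_type, {}).get(offset)

    def invalidate(self, swap_type, offset):
        """Drop the mapping and return the handle so it can be freed."""
        return self._trees.get(swap_type, {}).pop(offset, None)


if __name__ == "__main__":
    index = ZswapIndex()
    index.store(0, 42, "handle-A")
    print(index.load(0, 42))
    print(index.load(1, 42))
```

Keying each structure by offset alone works because the swap type has already selected which structure to search.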
>> Zswap seeks to be simple in its policies. Sysfs attributes allow for
>> two user controlled policies:
>> * max_compression_ratio - Maximum compression ratio, as a percentage,
>> for an acceptable compressed page. Any page that does not compress
>> by at least this ratio will be rejected.
>> * max_pool_percent - The maximum percentage of memory that the compressed
>> pool can occupy.
>>
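A rough sketch of how the two policies above could be evaluated, in Python. Several things here are assumptions and not taken from the patch: the 4 KiB page size, reading "compression ratio" as compressed size divided by page size, the integer rounding, and the function names.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def compresses_well_enough(compressed_len, max_compression_ratio):
    """Accept a page only if its compressed size, as a percentage of
    PAGE_SIZE, does not exceed max_compression_ratio (assumed meaning)."""
    return compressed_len * 100 // PAGE_SIZE <= max_compression_ratio


def pool_has_room(pool_pages, total_ram_pages, max_pool_percent):
    """Allow the pool to grow only while it occupies less than
    max_pool_percent of total memory."""
    return pool_pages < total_ram_pages * max_pool_percent // 100


if __name__ == "__main__":
    print(compresses_well_enough(2048, 80))   # a page that halves in size
    print(compresses_well_enough(4000, 80))   # a page that barely compresses
    print(pool_has_room(100, 1000, 20))
```

Under this reading, a lower max_compression_ratio is stricter: fewer pages qualify for the compressed pool and more go straight to the swap device.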
>> To enable zswap, the "enabled" attribute must be set to 1 at boot time.
>>
>> Zswap allows the compressor to be selected at kernel boot time by
>> setting the "compressor" attribute. The default compressor is lzo.
>>
>> A debugfs interface is provided for various statistics about pool size,
>> number of pages stored, and various counters for the reasons pages
>> are rejected.
>>
>> Performance, Kernel Building:
>>
>> Setup
>> ========
>> Gentoo w/ kernel v3.7-rc7
>> Quad-core i5-2500 @ 3.3GHz
>> 512MB DDR3 1600MHz (limited with mem=512m on boot)
>> Filesystem and swap on 80GB HDD (about 58MB/s with hdparm -t)
>> majflt are major page faults reported by the time command
>> pswpin/out is the delta of pswpin/out from /proc/vmstat before and after
>> the make -jN
>>
>> Summary
>> ========
>> * Zswap reduces I/O and improves performance at all swap pressure levels.
>>
>> * Under heavy swapping at 24 threads, zswap reduced I/O by 76%, saving
>> over 1.5GB of I/O, and cut runtime in half.
>>
>> Details
>> ========
>> I/O (in pages)
>>      base                                zswap                            change change
>> N    pswpin  pswpout  majflt  I/O sum    pswpin  pswpout  majflt  I/O sum   %I/O     MB
>> 8         1      335     291      627         0        0     249      249   -60%      1
>> 12     3688    14315    5290    23293       123      860    5954     6937   -70%     64
>> 16    12711    46179   16803    75693      2936     7390   46092    56418   -25%     75
>> 20    42178   133781   49898   225857      9460    28382   92951   130793   -42%    371
>> 24    96079   357280  105242   558601      7719    18484  109309   135512   -76%   1653
>>
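The derived columns in the I/O table can be reproduced with a short Python calculation. The numbers in the table are consistent with "I/O sum" being pswpin + pswpout + majflt and with the MB column being the saved pages converted at 4 KiB per page; the page size and the function names are assumptions of this sketch.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages on the x86 test machine


def io_sum(pswpin, pswpout, majflt):
    """Total I/O in pages for one run."""
    return pswpin + pswpout + majflt


def pct_change(base, zswap):
    """Relative change of zswap vs base, rounded to a whole percent."""
    return round((zswap - base) * 100 / base)


def mb_saved(base_pages, zswap_pages):
    """I/O avoided by zswap, converted from pages to MB."""
    return round((base_pages - zswap_pages) * PAGE_SIZE / 2**20)


if __name__ == "__main__":
    # N = 24 row of the table
    base = io_sum(96079, 357280, 105242)
    zswap = io_sum(7719, 18484, 109309)
    print(base, zswap, pct_change(base, zswap), mb_saved(base, zswap))
    # prints: 558601 135512 -76 1653
```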
>> Runtime (in seconds)
>> N    base  zswap  %change
>> 8     107    107       0%
>> 12    128    110     -14%
>> 16    191    179      -6%
>> 20    371    240     -35%
>> 24    570    267     -53%
>>
>> %CPU utilization (out of 400% on 4 cpus)
>> N    base  zswap  %change
>> 8     317    319       1%
>> 12    267    311      16%
>> 16    179    191       7%
>> 20     94    143      52%
>> 24     60    128     113%
>>
>>
>> Seth Jennings (7):
>> debugfs: add get/set for atomic types
>> zsmalloc: promote to lib/
>> zswap: add to mm/
>> mm: break up swap_writepage() for frontswap backends
>> mm: allow for outstanding swap writeback accounting
>> zswap: add flushing support
>> zswap: add documentation
>>
>> Documentation/vm/zswap.txt | 73 ++
>> drivers/staging/Kconfig | 2 -
>> drivers/staging/Makefile | 1 -
>> drivers/staging/zcache/zcache-main.c | 3 +-
>> drivers/staging/zram/zram_drv.h | 3 +-
>> drivers/staging/zsmalloc/Kconfig | 10 -
>> drivers/staging/zsmalloc/Makefile | 3 -
>> fs/debugfs/file.c | 42 +
>> include/linux/debugfs.h | 2 +
>> include/linux/swap.h | 4 +
>> .../staging/zsmalloc => include/linux}/zsmalloc.h | 0
>> lib/Kconfig | 18 +
>> lib/Makefile | 1 +
>> .../zsmalloc/zsmalloc-main.c => lib/zsmalloc.c | 3 +-
>> mm/Kconfig | 15 +
>> mm/Makefile | 1 +
>> mm/page_io.c | 22 +-
>> mm/swap_state.c | 2 +-
>> mm/zswap.c | 1073 ++++++++++++++++++++
>> 19 files changed, 1250 insertions(+), 28 deletions(-)
>> create mode 100644 Documentation/vm/zswap.txt
>> delete mode 100644 drivers/staging/zsmalloc/Kconfig
>> delete mode 100644 drivers/staging/zsmalloc/Makefile
>> rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
>> rename drivers/staging/zsmalloc/zsmalloc-main.c => lib/zsmalloc.c (99%)
>> create mode 100644 mm/zswap.c
>>
>