Date:	Mon, 28 Jan 2013 09:27:49 -0600
From:	Seth Jennings <sjenning@...ux.vnet.ibm.com>
To:	Rik van Riel <riel@...hat.com>
CC:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Nitin Gupta <ngupta@...are.org>,
	Minchan Kim <minchan@...nel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Dan Magenheimer <dan.magenheimer@...cle.com>,
	Robert Jennings <rcj@...ux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@...ibm.com>,
	Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <jweiner@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org
Subject: Re: [PATCHv2 8/9] zswap: add to mm/

On 01/25/2013 04:44 PM, Rik van Riel wrote:
> On 01/07/2013 03:24 PM, Seth Jennings wrote:
>> zswap is a thin compression backend for frontswap. It receives
>> pages from frontswap and attempts to store them in a compressed
>> memory pool, resulting in an effective partial memory reclaim and
>> dramatically reduced swap device I/O.
>>
>> Additionally, in most cases, pages can be retrieved from this
>> compressed store much more quickly than from traditional swap
>> devices, resulting in faster performance for many workloads.
>>
>> This patch adds the zswap driver to mm/
>>
>> Signed-off-by: Seth Jennings <sjenning@...ux.vnet.ibm.com>
> 
> I like the approach of flushing pages into actual disk based
> swap when compressed swap is full.  I would like it if that
> was advertised more prominently in the changelog :)

Thanks so much for the review!
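
For anyone skimming: the frontswap hookup the changelog describes
amounts to registering a set of store/load callbacks. A minimal
sketch (callback names are illustrative, not necessarily what the
patch uses):

/* zswap registers itself as a frontswap backend; frontswap then
 * hands it pages at swap-out time and asks for them back at
 * swap-in time. */
static struct frontswap_ops zswap_frontswap_ops = {
	.store = zswap_frontswap_store,   /* compress page into the pool */
	.load = zswap_frontswap_load,     /* decompress page on swap-in */
	.invalidate_page = zswap_frontswap_invalidate_page,
	.invalidate_area = zswap_frontswap_invalidate_area,
	.init = zswap_frontswap_init,
};

/* at module init: frontswap_register_ops(&zswap_frontswap_ops); */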

> The code looks mostly good, complaints are at the nitpick level.
> 
> One worry is that the pool can grow to whatever maximum was
> decided, and there is no way to shrink it when memory is
> required for something else.
> 
> Would it be an idea to add a shrinker for the zswap pool,
> so that it can be shrunk when required?
> 
> Of course, that does lead to the question of how to balance
> the pressure from that shrinker with the new memory entering
> zswap from the swap side. I have no clear answers here, just
> something to think about...

Yes, I prototyped a shrinker interface for zswap, but, as we both
figured, it shrinks the zswap compressed pool too aggressively, to the
point of being useless.
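
For reference, the prototype was roughly along these lines (a sketch
from memory using the current shrinker API; zswap_flush_nr_pages() is
a hypothetical helper):

/* ->shrink() is called with nr_to_scan == 0 to query the object
 * count, and with nr_to_scan > 0 to actually free objects. */
static int zswap_shrink(struct shrinker *shrinker,
			struct shrink_control *sc)
{
	if (!sc->nr_to_scan)
		return atomic_read(&zswap_pool_pages);

	/* write sc->nr_to_scan pages back to the swap device */
	zswap_flush_nr_pages(sc->nr_to_scan);
	return atomic_read(&zswap_pool_pages);
}

static struct shrinker zswap_shrinker = {
	.shrink = zswap_shrink,
	.seeks = DEFAULT_SEEKS,
};

/* registered with register_shrinker(&zswap_shrinker) at init */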

Right now I'm working on a zswap thread that will "leak" pages out to
the swap device on an LRU basis over time.  That way, if a page is
rarely accessed, it will eventually be written out to the swap device
and its memory freed, even if the zswap pool isn't full.
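
Roughly like this (names hypothetical; zswap_flush_lru_tail() would
decompress the coldest entry and submit it to the backing swap
device):

static int zswap_leak_thread(void *unused)
{
	while (!kthread_should_stop()) {
		/* evict the least-recently-used entry, if any */
		zswap_flush_lru_tail();
		/* pace the leak so we don't flood the swap device */
		schedule_timeout_interruptible(HZ);
	}
	return 0;
}

/* started at init: kthread_run(zswap_leak_thread, NULL, "zswapd") */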

Would this address your concerns?

>> +static void zswap_flush_entries(unsigned type, int nr)
>> +{
>> +    struct zswap_tree *tree = zswap_trees[type];
>> +    struct zswap_entry *entry;
>> +    int i, ret;
>> +
>> +/*
>> + * This limit is arbitrary for now until a better
>> + * policy can be implemented. This is so we don't
>> + * eat all of RAM decompressing pages for writeback.
>> + */
>> +#define ZSWAP_MAX_OUTSTANDING_FLUSHES 64
>> +    if (atomic_read(&zswap_outstanding_flushes) >
>> +        ZSWAP_MAX_OUTSTANDING_FLUSHES)
>> +        return;
> 
> Having this #define right in the middle of the function is
> rather ugly.  Might be worth moving it to the top.

Yes. In my mind, this policy was going to be replaced by a better one
soon; checking may_write_to_queue() was my idea for that.  So I didn't
spend too much time making that part pretty.
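
The cleanup is just hoisting it next to the other constants at the
top of zswap.c:

/*
 * Limit on outstanding flushes, so we don't eat all of RAM
 * decompressing pages for writeback.  Arbitrary until a better
 * policy (e.g. checking may_write_to_queue()) is implemented.
 */
#define ZSWAP_MAX_OUTSTANDING_FLUSHES 64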

>> +static int __init zswap_debugfs_init(void)
>> +{
>> +    if (!debugfs_initialized())
>> +        return -ENODEV;
>> +
>> +    zswap_debugfs_root = debugfs_create_dir("zswap", NULL);
>> +    if (!zswap_debugfs_root)
>> +        return -ENOMEM;
>> +
>> +    debugfs_create_u64("saved_by_flush", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_saved_by_flush);
>> +    debugfs_create_u64("pool_limit_hit", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_pool_limit_hit);
>> +    debugfs_create_u64("reject_flush_attempted", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_flush_attempted);
>> +    debugfs_create_u64("reject_tmppage_fail", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_reject_tmppage_fail);
>> +    debugfs_create_u64("reject_flush_fail", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_reject_flush_fail);
>> +    debugfs_create_u64("reject_zsmalloc_fail", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_reject_zsmalloc_fail);
>> +    debugfs_create_u64("reject_kmemcache_fail", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_reject_kmemcache_fail);
>> +    debugfs_create_u64("reject_compress_poor", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_reject_compress_poor);
>> +    debugfs_create_u64("flushed_pages", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_flushed_pages);
>> +    debugfs_create_u64("duplicate_entry", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_duplicate_entry);
>> +    debugfs_create_atomic_t("pool_pages", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_pool_pages);
>> +    debugfs_create_atomic_t("stored_pages", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_stored_pages);
>> +    debugfs_create_atomic_t("outstanding_flushes", S_IRUGO,
>> +            zswap_debugfs_root, &zswap_outstanding_flushes);
>> +
> 
> Some of these statistics would be very useful to system
> administrators, who will not be mounting debugfs on
> production systems.
> 
> Would it make sense to export some of these statistics
> through sysfs?

That's fine.  Which of these stats do you think should be in sysfs?
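
If we export them, the usual kobject pattern would give us something
under /sys/kernel/mm/; a sketch for one attribute (final set and
names TBD):

static ssize_t pool_pages_show(struct kobject *kobj,
			       struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", atomic_read(&zswap_pool_pages));
}
static struct kobj_attribute pool_pages_attr = __ATTR_RO(pool_pages);

static struct attribute *zswap_attrs[] = {
	&pool_pages_attr.attr,
	NULL,
};
static struct attribute_group zswap_attr_group = {
	.attrs = zswap_attrs,
};

static int __init zswap_sysfs_init(void)
{
	/* creates /sys/kernel/mm/zswap/pool_pages */
	struct kobject *kobj = kobject_create_and_add("zswap", mm_kobj);

	if (!kobj)
		return -ENOMEM;
	return sysfs_create_group(kobj, &zswap_attr_group);
}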

Thanks again for taking time to look at this!

Seth

