Message-ID: <51228A09.9030902@linux.vnet.ibm.com>
Date: Mon, 18 Feb 2013 14:07:37 -0600
From: Seth Jennings <sjenning@...ux.vnet.ibm.com>
To: Cody P Schafer <cody@...ux.vnet.ibm.com>
CC: Ric Mason <ric.masonn@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Nitin Gupta <ngupta@...are.org>,
Minchan Kim <minchan@...nel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Dan Magenheimer <dan.magenheimer@...cle.com>,
Robert Jennings <rcj@...ux.vnet.ibm.com>,
Jenifer Hopper <jhopper@...ibm.com>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <jweiner@...hat.com>,
Rik van Riel <riel@...hat.com>,
Larry Woodman <lwoodman@...hat.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>,
Joe Perches <joe@...ches.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org
Subject: Re: [PATCHv5 4/8] zswap: add to mm/
On 02/18/2013 01:49 PM, Cody P Schafer wrote:
> On 02/18/2013 11:24 AM, Seth Jennings wrote:
>> On 02/15/2013 10:04 PM, Ric Mason wrote:
>>> On 02/14/2013 02:38 AM, Seth Jennings wrote:
>> <snip>
>>>> +/* invalidates all pages for the given swap type */
>>>> +static void zswap_frontswap_invalidate_area(unsigned type)
>>>> +{
>>>> +	struct zswap_tree *tree = zswap_trees[type];
>>>> +	struct rb_node *node, *next;
>>>> +	struct zswap_entry *entry;
>>>> +
>>>> +	if (!tree)
>>>> +		return;
>>>> +
>>>> +	/* walk the tree and free everything */
>>>> +	spin_lock(&tree->lock);
>>>> +	node = rb_first(&tree->rbroot);
>>>> +	while (node) {
>>>> +		entry = rb_entry(node, struct zswap_entry, rbnode);
>>>> +		zs_free(tree->pool, entry->handle);
>>>> +		next = rb_next(node);
>>>> +		zswap_entry_cache_free(entry);
>>>> +		node = next;
>>>> +	}
>>>> +	tree->rbroot = RB_ROOT;
>>>
>>> Why isn't rb_erase() needed for every node?
>>
>> We are freeing the entire tree here. try_to_unuse() in the swapoff
>> syscall should have already emptied the tree, but this is here for
>> completeness.
>>
>> rb_erase() will do things like rebalancing the tree; something that
>> just wastes time since we are in the process of freeing the whole
>> tree. We are holding the tree lock here so we are sure that no one
>> else is accessing the tree while it is in this transient broken state.
>
> If we have a sub-tree like:
>       ...
>      /
>     A
>    / \
>   B   C
>
> B == rb_next(tree)
> A == rb_next(B)
> C == rb_next(A)
>
> The current code frees A (via zswap_entry_cache_free()) prior to
> examining C, and thus rb_next(C) results in a use after free of A.
>
> You can solve this by doing a post-order traversal of the tree, either
>
> a) in the destructive manner used in a number of filesystems, see
> fs/ubifs/orphan.c ubifs_add_orphan(), for example.
>
> b) or by doing something similar to this commit:
> https://github.com/jmesmon/linux/commit/d9e43aaf9e8a447d6802531d95a1767532339fad
> , which I've been using for some yet-to-be-merged code.
Great catch! I'll fix this up.
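
For reference, here is a minimal userspace sketch of the non-destructive
post-order walk described above. It is not the kernel rbtree API: struct node,
left_deepest(), first_postorder(), next_postorder(), tree_insert() and
tree_free_postorder() are hypothetical names made up for illustration, and the
tree is a plain unbalanced BST with parent pointers. The point it shows is that
in post-order a node is visited only after both of its children, so fetching
the successor before freeing the current node never touches freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an rb_node: parent and child links only. */
struct node {
	struct node *parent, *left, *right;
	int key;
};

/* Descend to a leaf, preferring the left child and falling back to the right. */
static struct node *left_deepest(struct node *n)
{
	for (;;) {
		if (n->left)
			n = n->left;
		else if (n->right)
			n = n->right;
		else
			return n;
	}
}

/* First node in post-order, or NULL for an empty tree. */
static struct node *first_postorder(struct node *root)
{
	return root ? left_deepest(root) : NULL;
}

/*
 * Post-order successor. Only n->parent and the parent's right subtree are
 * read; both are still live because a parent is visited after its children.
 */
static struct node *next_postorder(const struct node *n)
{
	struct node *parent = n->parent;

	if (parent && n == parent->left && parent->right)
		return left_deepest(parent->right);
	return parent;
}

/* Plain unbalanced BST insert; returns the (possibly new) root. */
static struct node *tree_insert(struct node *root, int key)
{
	struct node *n = calloc(1, sizeof(*n));
	struct node **link = &root, *parent = NULL;

	assert(n != NULL);
	n->key = key;
	while (*link) {
		parent = *link;
		link = key < parent->key ? &parent->left : &parent->right;
	}
	n->parent = parent;
	*link = n;
	return root;
}

/* Free every node without erasing or rebalancing; returns the count freed. */
static size_t tree_free_postorder(struct node *root)
{
	struct node *n = first_postorder(root), *next;
	size_t freed = 0;

	while (n) {
		next = next_postorder(n);	/* fetch successor before freeing */
		free(n);
		freed++;
		n = next;
	}
	return freed;
}
```

With the A/B/C sub-tree from the example, this order visits B, then C, then A,
so A's parent pointer is never read after A has been freed.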
Thanks,
Seth