[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPQyPG7MQPmekvXZdMTQ9Z=4HqPfG3xQbjstO2e_hatSYSyV4w@mail.gmail.com>
Date: Mon, 19 Dec 2011 17:37:14 +0800
From: Nai Xia <nai.xia@...il.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
Dave Chinner <david@...morbit.com>,
Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH] radix_tree: take radix_tree_path off stack
On Mon, Dec 19, 2011 at 4:20 PM, nai.xia <nai.xia@...il.com> wrote:
>
>
> On 2011年12月19日 14:41, Hugh Dickins wrote:
>>
>> Down, down in the deepest depths of GFP_NOIO page reclaim, we have
>> shrink_page_list() calling __remove_mapping() calling __delete_from_
>> swap_cache() or __delete_from_page_cache().
>>
>> You would not expect those to need much stack, but in fact they call
>> radix_tree_delete(): which declares a 192-byte radix_tree_path array
>> on its stack (to record the node,offsets it visits when descending,
>> in case it needs to ascend to update them). And if any tag is still
>> set [1], that calls radix_tree_tag_clear(), which declares a further
>> such 192-byte radix_tree_path array on the stack. (At least we have
>> interrupts disabled here, so won't then be pushing registers too.)
>>
>> That was probably a good choice when most users were 32-bit (array
>> of half the size), and adding fields to radix_tree_node would have
>> bloated it unnecessarily. But nowadays many are 64-bit, and each
>> radix_tree_node contains a struct rcu_head, which is only used when
>
>
> Can rcu_head in someway unionized with radix_tree_node->height
> and radix_tree_node->count? count is always referenced under lock
> and only the first node's height is referenced during lookup.
> Seems like if we atomically set root->rnode to NULL, before
> freeing the last node, we can ensure a valid read of the
> radix_tree_node->height when lookup by following it with
> a root->rnode == NULL test.
>
> I am not very sure of course, just a naive feeling.
And besides, I think maybe there were another few ways if
we really care about the stack usage of radix_tree_path,
e.g.
1. We can make radix_tree_path.offset compact to u8
which is enough to index inside a node.
2. We can use dynamic array on stack instead of
RADIX_TREE_MAX_PATH, I think for most cases
this may save half of the space
3. Take benefit of radix_tree_path array already
traveled down to clear the tags instead of calling
a radix_tree_tag_clear with full array.
I am not speaking of the negatives of your patch
, just some alternatives for your reference.
And forgive my possible selfishness, I just created a home
made radix tree traveler based on radix_tree_path array to
simulate recursive calls, not ready to its vanishing...
Thanks,
Nai
>
>
>> freeing; whereas the radix_tree_path info is only used for updating
>> the tree (deleting, clearing tags or setting tags if tagged) when a
>> lock must be held, of no interest when accessing the tree locklessly.
>>
>> So add a parent pointer to the radix_tree_node, in union with the
>> rcu_head, and remove all uses of the radix_tree_path. There would
>> be space in that union to save the offset when descending as before
>> (we can argue that a lock must already be held to exclude other users),
>> but recalculating it when ascending is both easy (a constant shift and
>> a constant mask) and uncommon, so it seems better just to do that.
>>
>> Two little optimizations: no need to decrement height when descending,
>> adjusting shift is enough; and once radix_tree_tag_if_tagged() has set
>> tag on a node and its ancestors, it need not ascend from that node again.
>>
>> perf on the radix tree test harness reports radix_tree_insert() as 2%
>> slower (now having to set parent), but radix_tree_delete() 24% faster.
>> Surely that's an exaggeration from rtth's artificially low map shift 3,
>> but forcing it back to 6 still rates radix_tree_delete() 8% faster.
>>
>> [1] Can a pagecache tag (dirty, writeback or towrite) actually still be
>> set at the time of radix_tree_delete()? Perhaps not if the filesystem
>> is well-behaved. But although I've not tracked any stack overflow down
>> to this cause, I have observed a curious case in which a dirty tag is
>> set and left set on tmpfs: page migration's migrate_page_copy() happens
>> to use __set_page_dirty_nobuffers() to set PageDirty on the newpage,
>> and that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a
>> filesystem which doesn't use tags, except for this stack depth issue.
>>
>> Signed-off-by: Hugh Dickins<hughd@...gle.com>
>> ----
>>
>> lib/radix-tree.c | 148 +++++++++++++++++++++------------------------
>> 1 file changed, 70 insertions(+), 78 deletions(-)
>>
>> --- 3.2-rc6/lib/radix-tree.c 2011-11-07 19:24:53.342418579 -0800
>> +++ linux/lib/radix-tree.c 2011-12-16 20:40:26.152758485 -0800
>> @@ -48,16 +48,14 @@
>> struct radix_tree_node {
>> unsigned int height; /* Height from the bottom */
>> unsigned int count;
>> - struct rcu_head rcu_head;
>> + union {
>> + struct radix_tree_node *parent; /* Used when ascending
>> tree */
>> + struct rcu_head rcu_head; /* Used when freeing node
>> */
>> + };
>> void __rcu *slots[RADIX_TREE_MAP_SIZE];
>> unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
>> };
>>
>> -struct radix_tree_path {
>> - struct radix_tree_node *node;
>> - int offset;
>> -};
>> -
>> #define RADIX_TREE_INDEX_BITS (8 /* CHAR_BIT */ * sizeof(unsigned long))
>> #define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
>> RADIX_TREE_MAP_SHIFT))
>> @@ -256,6 +254,7 @@ static inline unsigned long radix_tree_m
>> static int radix_tree_extend(struct radix_tree_root *root, unsigned long
>> index)
>> {
>> struct radix_tree_node *node;
>> + struct radix_tree_node *slot;
>> unsigned int height;
>> int tag;
>>
>> @@ -274,18 +273,23 @@ static int radix_tree_extend(struct radi
>> if (!(node = radix_tree_node_alloc(root)))
>> return -ENOMEM;
>>
>> - /* Increase the height. */
>> - node->slots[0] = indirect_to_ptr(root->rnode);
>> -
>> /* Propagate the aggregated tag info into the new root */
>> for (tag = 0; tag< RADIX_TREE_MAX_TAGS; tag++) {
>> if (root_tag_get(root, tag))
>> tag_set(node, tag, 0);
>> }
>>
>> + /* Increase the height. */
>> newheight = root->height+1;
>> node->height = newheight;
>> node->count = 1;
>> + node->parent = NULL;
>> + slot = root->rnode;
>> + if (newheight> 1) {
>> + slot = indirect_to_ptr(slot);
>> + slot->parent = node;
>> + }
>> + node->slots[0] = slot;
>> node = ptr_to_indirect(node);
>> rcu_assign_pointer(root->rnode, node);
>> root->height = newheight;
>> @@ -331,6 +335,7 @@ int radix_tree_insert(struct radix_tree_
>> if (!(slot = radix_tree_node_alloc(root)))
>> return -ENOMEM;
>> slot->height = height;
>> + slot->parent = node;
>> if (node) {
>> rcu_assign_pointer(node->slots[offset],
>> slot);
>> node->count++;
>> @@ -504,47 +509,41 @@ EXPORT_SYMBOL(radix_tree_tag_set);
>> void *radix_tree_tag_clear(struct radix_tree_root *root,
>> unsigned long index, unsigned int tag)
>> {
>> - /*
>> - * The radix tree path needs to be one longer than the maximum
>> path
>> - * since the "list" is null terminated.
>> - */
>> - struct radix_tree_path path[RADIX_TREE_MAX_PATH + 1], *pathp =
>> path;
>> + struct radix_tree_node *node = NULL;
>> struct radix_tree_node *slot = NULL;
>> unsigned int height, shift;
>> + int uninitialized_var(offset);
>>
>> height = root->height;
>> if (index> radix_tree_maxindex(height))
>> goto out;
>>
>> - shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
>> - pathp->node = NULL;
>> + shift = height * RADIX_TREE_MAP_SHIFT;
>> slot = indirect_to_ptr(root->rnode);
>>
>> - while (height> 0) {
>> - int offset;
>> -
>> + while (shift) {
>> if (slot == NULL)
>> goto out;
>>
>> + shift -= RADIX_TREE_MAP_SHIFT;
>> offset = (index>> shift)& RADIX_TREE_MAP_MASK;
>>
>> - pathp[1].offset = offset;
>> - pathp[1].node = slot;
>> + node = slot;
>> slot = slot->slots[offset];
>> - pathp++;
>> - shift -= RADIX_TREE_MAP_SHIFT;
>> - height--;
>> }
>>
>> if (slot == NULL)
>> goto out;
>>
>> - while (pathp->node) {
>> - if (!tag_get(pathp->node, tag, pathp->offset))
>> + while (node) {
>> + if (!tag_get(node, tag, offset))
>> goto out;
>> - tag_clear(pathp->node, tag, pathp->offset);
>> - if (any_tag_set(pathp->node, tag))
>> + tag_clear(node, tag, offset);
>> + if (any_tag_set(node, tag))
>> goto out;
>> - pathp--;
>> +
>> + index>>= RADIX_TREE_MAP_SHIFT;
>> + offset = index& RADIX_TREE_MAP_MASK;
>>
>> + node = node->parent;
>> }
>>
>> /* clear the root's tag bit */
>> @@ -646,8 +645,7 @@ unsigned long radix_tree_range_tag_if_ta
>> unsigned int iftag, unsigned int settag)
>> {
>> unsigned int height = root->height;
>> - struct radix_tree_path path[height];
>> - struct radix_tree_path *pathp = path;
>> + struct radix_tree_node *node = NULL;
>> struct radix_tree_node *slot;
>> unsigned int shift;
>> unsigned long tagged = 0;
>> @@ -671,14 +669,8 @@ unsigned long radix_tree_range_tag_if_ta
>> shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
>> slot = indirect_to_ptr(root->rnode);
>>
>> - /*
>> - * we fill the path from (root->height - 2) to 0, leaving the
>> index at
>> - * (root->height - 1) as a terminator. Zero the node in the
>> terminator
>> - * so that we can use this to end walk loops back up the path.
>> - */
>> - path[height - 1].node = NULL;
>> -
>> for (;;) {
>> + unsigned long upindex;
>> int offset;
>>
>> offset = (index>> shift)& RADIX_TREE_MAP_MASK;
>>
>> @@ -686,12 +678,10 @@ unsigned long radix_tree_range_tag_if_ta
>> goto next;
>> if (!tag_get(slot, iftag, offset))
>> goto next;
>> - if (height> 1) {
>> + if (shift) {
>> /* Go down one level */
>> - height--;
>> shift -= RADIX_TREE_MAP_SHIFT;
>> - path[height - 1].node = slot;
>> - path[height - 1].offset = offset;
>> + node = slot;
>> slot = slot->slots[offset];
>> continue;
>> }
>> @@ -701,15 +691,21 @@ unsigned long radix_tree_range_tag_if_ta
>> tag_set(slot, settag, offset);
>>
>> /* walk back up the path tagging interior nodes */
>> - pathp =&path[0];
>>
>> - while (pathp->node) {
>> + upindex = index;
>> + while (node) {
>> + upindex>>= RADIX_TREE_MAP_SHIFT;
>> + offset = upindex& RADIX_TREE_MAP_MASK;
>>
>> +
>> /* stop if we find a node with the tag already set
>> */
>> - if (tag_get(pathp->node, settag, pathp->offset))
>> + if (tag_get(node, settag, offset))
>> break;
>> - tag_set(pathp->node, settag, pathp->offset);
>> - pathp++;
>> + tag_set(node, settag, offset);
>> + node = node->parent;
>> }
>>
>> + /* optimization: no need to walk up from this node again
>> */
>> + node = NULL;
>> +
>> next:
>> /* Go to next item at level determined by 'shift' */
>> index = ((index>> shift) + 1)<< shift;
>> @@ -724,8 +720,7 @@ next:
>> * last_index is guaranteed to be in the tree, what
>> * we do below cannot wander astray.
>> */
>> - slot = path[height - 1].node;
>> - height++;
>> + slot = slot->parent;
>> shift += RADIX_TREE_MAP_SHIFT;
>> }
>> }
>> @@ -1299,7 +1294,7 @@ static inline void radix_tree_shrink(str
>> /* try to shrink tree height */
>> while (root->height> 0) {
>> struct radix_tree_node *to_free = root->rnode;
>> - void *newptr;
>> + struct radix_tree_node *slot;
>>
>> BUG_ON(!radix_tree_is_indirect_ptr(to_free));
>> to_free = indirect_to_ptr(to_free);
>> @@ -1320,10 +1315,12 @@ static inline void radix_tree_shrink(str
>> * (to_free->slots[0]), it will be safe to dereference the
>> new
>> * one (root->rnode) as far as dependent read barriers go.
>> */
>> - newptr = to_free->slots[0];
>> - if (root->height> 1)
>> - newptr = ptr_to_indirect(newptr);
>> - root->rnode = newptr;
>> + slot = to_free->slots[0];
>> + if (root->height> 1) {
>> + slot->parent = NULL;
>> + slot = ptr_to_indirect(slot);
>> + }
>> + root->rnode = slot;
>> root->height--;
>>
>> /*
>> @@ -1363,16 +1360,12 @@ static inline void radix_tree_shrink(str
>> */
>> void *radix_tree_delete(struct radix_tree_root *root, unsigned long
>> index)
>> {
>> - /*
>> - * The radix tree path needs to be one longer than the maximum
>> path
>> - * since the "list" is null terminated.
>> - */
>> - struct radix_tree_path path[RADIX_TREE_MAX_PATH + 1], *pathp =
>> path;
>> + struct radix_tree_node *node = NULL;
>> struct radix_tree_node *slot = NULL;
>> struct radix_tree_node *to_free;
>> unsigned int height, shift;
>> int tag;
>> - int offset;
>> + int uninitialized_var(offset);
>>
>> height = root->height;
>> if (index> radix_tree_maxindex(height))
>> @@ -1385,39 +1378,35 @@ void *radix_tree_delete(struct radix_tre
>> goto out;
>> }
>> slot = indirect_to_ptr(slot);
>> -
>> - shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
>> - pathp->node = NULL;
>> + shift = height * RADIX_TREE_MAP_SHIFT;
>>
>> do {
>> if (slot == NULL)
>> goto out;
>>
>> - pathp++;
>> + shift -= RADIX_TREE_MAP_SHIFT;
>> offset = (index>> shift)& RADIX_TREE_MAP_MASK;
>>
>> - pathp->offset = offset;
>> - pathp->node = slot;
>> + node = slot;
>> slot = slot->slots[offset];
>> - shift -= RADIX_TREE_MAP_SHIFT;
>> - height--;
>> - } while (height> 0);
>> + } while (shift);
>>
>> if (slot == NULL)
>> goto out;
>>
>> /*
>> - * Clear all tags associated with the just-deleted item
>> + * Clear all tags associated with the item to be deleted.
>> + * This way of doing it would be inefficient, but seldom is any
>> set.
>> */
>> for (tag = 0; tag< RADIX_TREE_MAX_TAGS; tag++) {
>> - if (tag_get(pathp->node, tag, pathp->offset))
>> + if (tag_get(node, tag, offset))
>> radix_tree_tag_clear(root, index, tag);
>> }
>>
>> to_free = NULL;
>> /* Now free the nodes we do not need anymore */
>> - while (pathp->node) {
>> - pathp->node->slots[pathp->offset] = NULL;
>> - pathp->node->count--;
>> + while (node) {
>> + node->slots[offset] = NULL;
>> + node->count--;
>> /*
>> * Queue the node for deferred freeing after the
>> * last reference to it disappears (set NULL, above).
>> @@ -1425,17 +1414,20 @@ void *radix_tree_delete(struct radix_tre
>> if (to_free)
>> radix_tree_node_free(to_free);
>>
>> - if (pathp->node->count) {
>> - if (pathp->node == indirect_to_ptr(root->rnode))
>> + if (node->count) {
>> + if (node == indirect_to_ptr(root->rnode))
>> radix_tree_shrink(root);
>> goto out;
>> }
>>
>> /* Node with zero slots in use so free it */
>> - to_free = pathp->node;
>> - pathp--;
>> + to_free = node;
>>
>> + index>>= RADIX_TREE_MAP_SHIFT;
>> + offset = index& RADIX_TREE_MAP_MASK;
>>
>> + node = node->parent;
>> }
>> +
>> root_tag_clear_all(root);
>> root->height = 0;
>> root->rnode = NULL;
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@...ck.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Fight unfair telecom internet charges in Canada: sign
>> http://stopthemeter.ca/
>> Don't email:<a href=mailto:"dont@...ck.org"> email@...ck.org</a>
Powered by blists - more mailing lists