lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 21 Dec 2011 16:07:40 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
	Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [PATCH] radix_tree: take radix_tree_path off stack

On Sun, Dec 18, 2011 at 10:41:39PM -0800, Hugh Dickins wrote:
> Down, down in the deepest depths of GFP_NOIO page reclaim, we have
> shrink_page_list() calling __remove_mapping() calling __delete_from_
> swap_cache() or __delete_from_page_cache().
> 
> You would not expect those to need much stack, but in fact they call
> radix_tree_delete(): which declares a 192-byte radix_tree_path array
> on its stack (to record the node,offsets it visits when descending,
> in case it needs to ascend to update them).  And if any tag is still
> set [1], that calls radix_tree_tag_clear(), which declares a further
> such 192-byte radix_tree_path array on the stack.  (At least we have
> interrupts disabled here, so won't then be pushing registers too.)
> 
> That was probably a good choice when most users were 32-bit (array
> of half the size), and adding fields to radix_tree_node would have
> bloated it unnecessarily.  But nowadays many are 64-bit, and each
> radix_tree_node contains a struct rcu_head, which is only used when
> freeing; whereas the radix_tree_path info is only used for updating
> the tree (deleting, clearing tags or setting tags if tagged) when a
> lock must be held, of no interest when accessing the tree locklessly.
> 
> So add a parent pointer to the radix_tree_node, in union with the
> rcu_head, and remove all uses of the radix_tree_path.  There would
> be space in that union to save the offset when descending as before
> (we can argue that a lock must already be held to exclude other users),
> but recalculating it when ascending is both easy (a constant shift and
> a constant mask) and uncommon, so it seems better just to do that.

Seems sane - any modification of the tree contents or tag has to be
done under a lock, so we can maintain parent pointers safely. And we
don't reda them under RCU conditions, so we don't need any special
handling there, either.

> Two little optimizations: no need to decrement height when descending,
> adjusting shift is enough;

*nod*

> and once radix_tree_tag_if_tagged() has set
> tag on a node and its ancestors, it need not ascend from that node again.

I'm not sure I really follow this. I think I know what you mean, but
I can't quite get it straight and the comment in the code doesn't
help me get it straight. Can you describe it a bit more - I think
I'm just being dense at the moment....

> perf on the radix tree test harness reports radix_tree_insert() as 2%
> slower (now having to set parent), but radix_tree_delete() 24% faster.
> Surely that's an exaggeration from rtth's artificially low map shift 3,
> but forcing it back to 6 still rates radix_tree_delete() 8% faster.

Nice.

> [1] Can a pagecache tag (dirty, writeback or towrite) actually still be
> set at the time of radix_tree_delete()?  Perhaps not if the filesystem
> is well-behaved.  But although I've not tracked any stack overflow down
> to this cause, I have observed a curious case in which a dirty tag is
> set and left set on tmpfs: page migration's migrate_page_copy() happens
> to use __set_page_dirty_nobuffers() to set PageDirty on the newpage,
> and that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a
> filesystem which doesn't use tags, except for this stack depth issue.

Not sure about the page cache, but other users of the radix tree
definitely do delete objects with tags still set. For example, when
XFS is reclaiming inodes it will delete the inode from it's internal
radix trees with the reclaim tag still set on the index. This
happens for every single inode that is reclaimed, so it's anything
but seldom and should really be considered a common operation....

Couple more comments below.

> @@ -274,18 +273,23 @@ static int radix_tree_extend(struct radi
>  		if (!(node = radix_tree_node_alloc(root)))
>  			return -ENOMEM;
>  
> -		/* Increase the height.  */
> -		node->slots[0] = indirect_to_ptr(root->rnode);
> -
>  		/* Propagate the aggregated tag info into the new root */
>  		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
>  			if (root_tag_get(root, tag))
>  				tag_set(node, tag, 0);
>  		}
>  
> +		/* Increase the height.  */
>  		newheight = root->height+1;

While touching this code, fixing the adjacent whitespace damage
would be good.

>  		node->height = newheight;
>  		node->count = 1;
> +		node->parent = NULL;
> +		slot = root->rnode;
> +		if (newheight > 1) {
> +			slot = indirect_to_ptr(slot);
> +			slot->parent = node;
> +		}
> +		node->slots[0] = slot;

This would be much more obvious in function if it separated the two
different cases completely:

		if (newheight > 1) {
			slot = indirect_to_ptr(root->rnode);
			slot->parent = node;
		} else {
			slot = root->rnode;
			node->parent = NULL;
		}
		node->slots[0] = slot;

> @@ -701,15 +691,21 @@ unsigned long radix_tree_range_tag_if_ta
>  		tag_set(slot, settag, offset);
>  
>  		/* walk back up the path tagging interior nodes */
> -		pathp = &path[0];
> -		while (pathp->node) {
> +		upindex = index;
> +		while (node) {
> +			upindex >>= RADIX_TREE_MAP_SHIFT;
> +			offset = upindex & RADIX_TREE_MAP_MASK;
> +
>  			/* stop if we find a node with the tag already set */
> -			if (tag_get(pathp->node, settag, pathp->offset))
> +			if (tag_get(node, settag, offset))
>  				break;
> -			tag_set(pathp->node, settag, pathp->offset);
> -			pathp++;
> +			tag_set(node, settag, offset);
> +			node = node->parent;
>  		}
>  
> +		/* optimization: no need to walk up from this node again */
> +		node = NULL;

As per my query above: why? That's the question the comment needs to
answer....

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ