lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4EEEF3B3.909@gmail.com>
Date:	Mon, 19 Dec 2011 16:20:03 +0800
From:	"nai.xia" <nai.xia@...il.com>
To:	Hugh Dickins <hughd@...gle.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
	Dave Chinner <david@...morbit.com>,
	Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [PATCH] radix_tree: take radix_tree_path off stack



On 2011年12月19日 14:41, Hugh Dickins wrote:
> Down, down in the deepest depths of GFP_NOIO page reclaim, we have
> shrink_page_list() calling __remove_mapping() calling __delete_from_
> swap_cache() or __delete_from_page_cache().
>
> You would not expect those to need much stack, but in fact they call
> radix_tree_delete(): which declares a 192-byte radix_tree_path array
> on its stack (to record the node,offsets it visits when descending,
> in case it needs to ascend to update them).  And if any tag is still
> set [1], that calls radix_tree_tag_clear(), which declares a further
> such 192-byte radix_tree_path array on the stack.  (At least we have
> interrupts disabled here, so won't then be pushing registers too.)
>
> That was probably a good choice when most users were 32-bit (array
> of half the size), and adding fields to radix_tree_node would have
> bloated it unnecessarily.  But nowadays many are 64-bit, and each
> radix_tree_node contains a struct rcu_head, which is only used when

Can rcu_head in someway unionized with radix_tree_node->height
and radix_tree_node->count? count is always referenced under lock
and only the first node's height is referenced during lookup.
Seems like if we atomically set root->rnode to NULL, before
freeing the last node, we can ensure a valid read of the
radix_tree_node->height when lookup by following it with
a root->rnode == NULL test.

I am not very sure of course, just a naive feeling.


> freeing; whereas the radix_tree_path info is only used for updating
> the tree (deleting, clearing tags or setting tags if tagged) when a
> lock must be held, of no interest when accessing the tree locklessly.
>
> So add a parent pointer to the radix_tree_node, in union with the
> rcu_head, and remove all uses of the radix_tree_path.  There would
> be space in that union to save the offset when descending as before
> (we can argue that a lock must already be held to exclude other users),
> but recalculating it when ascending is both easy (a constant shift and
> a constant mask) and uncommon, so it seems better just to do that.
>
> Two little optimizations: no need to decrement height when descending,
> adjusting shift is enough; and once radix_tree_tag_if_tagged() has set
> tag on a node and its ancestors, it need not ascend from that node again.
>
> perf on the radix tree test harness reports radix_tree_insert() as 2%
> slower (now having to set parent), but radix_tree_delete() 24% faster.
> Surely that's an exaggeration from rtth's artificially low map shift 3,
> but forcing it back to 6 still rates radix_tree_delete() 8% faster.
>
> [1] Can a pagecache tag (dirty, writeback or towrite) actually still be
> set at the time of radix_tree_delete()?  Perhaps not if the filesystem
> is well-behaved.  But although I've not tracked any stack overflow down
> to this cause, I have observed a curious case in which a dirty tag is
> set and left set on tmpfs: page migration's migrate_page_copy() happens
> to use __set_page_dirty_nobuffers() to set PageDirty on the newpage,
> and that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a
> filesystem which doesn't use tags, except for this stack depth issue.
>
> Signed-off-by: Hugh Dickins<hughd@...gle.com>
> ----
>
>   lib/radix-tree.c |  148 +++++++++++++++++++++------------------------
>   1 file changed, 70 insertions(+), 78 deletions(-)
>
> --- 3.2-rc6/lib/radix-tree.c	2011-11-07 19:24:53.342418579 -0800
> +++ linux/lib/radix-tree.c	2011-12-16 20:40:26.152758485 -0800
> @@ -48,16 +48,14 @@
>   struct radix_tree_node {
>   	unsigned int	height;		/* Height from the bottom */
>   	unsigned int	count;
> -	struct rcu_head	rcu_head;
> +	union {
> +		struct radix_tree_node *parent;	/* Used when ascending tree */
> +		struct rcu_head	rcu_head;	/* Used when freeing node */
> +	};
>   	void __rcu	*slots[RADIX_TREE_MAP_SIZE];
>   	unsigned long	tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
>   };
>
> -struct radix_tree_path {
> -	struct radix_tree_node *node;
> -	int offset;
> -};
> -
>   #define RADIX_TREE_INDEX_BITS  (8 /* CHAR_BIT */ * sizeof(unsigned long))
>   #define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
>   					  RADIX_TREE_MAP_SHIFT))
> @@ -256,6 +254,7 @@ static inline unsigned long radix_tree_m
>   static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
>   {
>   	struct radix_tree_node *node;
> +	struct radix_tree_node *slot;
>   	unsigned int height;
>   	int tag;
>
> @@ -274,18 +273,23 @@ static int radix_tree_extend(struct radi
>   		if (!(node = radix_tree_node_alloc(root)))
>   			return -ENOMEM;
>
> -		/* Increase the height.  */
> -		node->slots[0] = indirect_to_ptr(root->rnode);
> -
>   		/* Propagate the aggregated tag info into the new root */
>   		for (tag = 0; tag<  RADIX_TREE_MAX_TAGS; tag++) {
>   			if (root_tag_get(root, tag))
>   				tag_set(node, tag, 0);
>   		}
>
> +		/* Increase the height.  */
>   		newheight = root->height+1;
>   		node->height = newheight;
>   		node->count = 1;
> +		node->parent = NULL;
> +		slot = root->rnode;
> +		if (newheight>  1) {
> +			slot = indirect_to_ptr(slot);
> +			slot->parent = node;
> +		}
> +		node->slots[0] = slot;
>   		node = ptr_to_indirect(node);
>   		rcu_assign_pointer(root->rnode, node);
>   		root->height = newheight;
> @@ -331,6 +335,7 @@ int radix_tree_insert(struct radix_tree_
>   			if (!(slot = radix_tree_node_alloc(root)))
>   				return -ENOMEM;
>   			slot->height = height;
> +			slot->parent = node;
>   			if (node) {
>   				rcu_assign_pointer(node->slots[offset], slot);
>   				node->count++;
> @@ -504,47 +509,41 @@ EXPORT_SYMBOL(radix_tree_tag_set);
>   void *radix_tree_tag_clear(struct radix_tree_root *root,
>   			unsigned long index, unsigned int tag)
>   {
> -	/*
> -	 * The radix tree path needs to be one longer than the maximum path
> -	 * since the "list" is null terminated.
> -	 */
> -	struct radix_tree_path path[RADIX_TREE_MAX_PATH + 1], *pathp = path;
> +	struct radix_tree_node *node = NULL;
>   	struct radix_tree_node *slot = NULL;
>   	unsigned int height, shift;
> +	int uninitialized_var(offset);
>
>   	height = root->height;
>   	if (index>  radix_tree_maxindex(height))
>   		goto out;
>
> -	shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
> -	pathp->node = NULL;
> +	shift = height * RADIX_TREE_MAP_SHIFT;
>   	slot = indirect_to_ptr(root->rnode);
>
> -	while (height>  0) {
> -		int offset;
> -
> +	while (shift) {
>   		if (slot == NULL)
>   			goto out;
>
> +		shift -= RADIX_TREE_MAP_SHIFT;
>   		offset = (index>>  shift)&  RADIX_TREE_MAP_MASK;
> -		pathp[1].offset = offset;
> -		pathp[1].node = slot;
> +		node = slot;
>   		slot = slot->slots[offset];
> -		pathp++;
> -		shift -= RADIX_TREE_MAP_SHIFT;
> -		height--;
>   	}
>
>   	if (slot == NULL)
>   		goto out;
>
> -	while (pathp->node) {
> -		if (!tag_get(pathp->node, tag, pathp->offset))
> +	while (node) {
> +		if (!tag_get(node, tag, offset))
>   			goto out;
> -		tag_clear(pathp->node, tag, pathp->offset);
> -		if (any_tag_set(pathp->node, tag))
> +		tag_clear(node, tag, offset);
> +		if (any_tag_set(node, tag))
>   			goto out;
> -		pathp--;
> +
> +		index>>= RADIX_TREE_MAP_SHIFT;
> +		offset = index&  RADIX_TREE_MAP_MASK;
> +		node = node->parent;
>   	}
>
>   	/* clear the root's tag bit */
> @@ -646,8 +645,7 @@ unsigned long radix_tree_range_tag_if_ta
>   		unsigned int iftag, unsigned int settag)
>   {
>   	unsigned int height = root->height;
> -	struct radix_tree_path path[height];
> -	struct radix_tree_path *pathp = path;
> +	struct radix_tree_node *node = NULL;
>   	struct radix_tree_node *slot;
>   	unsigned int shift;
>   	unsigned long tagged = 0;
> @@ -671,14 +669,8 @@ unsigned long radix_tree_range_tag_if_ta
>   	shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
>   	slot = indirect_to_ptr(root->rnode);
>
> -	/*
> -	 * we fill the path from (root->height - 2) to 0, leaving the index at
> -	 * (root->height - 1) as a terminator. Zero the node in the terminator
> -	 * so that we can use this to end walk loops back up the path.
> -	 */
> -	path[height - 1].node = NULL;
> -
>   	for (;;) {
> +		unsigned long upindex;
>   		int offset;
>
>   		offset = (index>>  shift)&  RADIX_TREE_MAP_MASK;
> @@ -686,12 +678,10 @@ unsigned long radix_tree_range_tag_if_ta
>   			goto next;
>   		if (!tag_get(slot, iftag, offset))
>   			goto next;
> -		if (height>  1) {
> +		if (shift) {
>   			/* Go down one level */
> -			height--;
>   			shift -= RADIX_TREE_MAP_SHIFT;
> -			path[height - 1].node = slot;
> -			path[height - 1].offset = offset;
> +			node = slot;
>   			slot = slot->slots[offset];
>   			continue;
>   		}
> @@ -701,15 +691,21 @@ unsigned long radix_tree_range_tag_if_ta
>   		tag_set(slot, settag, offset);
>
>   		/* walk back up the path tagging interior nodes */
> -		pathp =&path[0];
> -		while (pathp->node) {
> +		upindex = index;
> +		while (node) {
> +			upindex>>= RADIX_TREE_MAP_SHIFT;
> +			offset = upindex&  RADIX_TREE_MAP_MASK;
> +
>   			/* stop if we find a node with the tag already set */
> -			if (tag_get(pathp->node, settag, pathp->offset))
> +			if (tag_get(node, settag, offset))
>   				break;
> -			tag_set(pathp->node, settag, pathp->offset);
> -			pathp++;
> +			tag_set(node, settag, offset);
> +			node = node->parent;
>   		}
>
> +		/* optimization: no need to walk up from this node again */
> +		node = NULL;
> +
>   next:
>   		/* Go to next item at level determined by 'shift' */
>   		index = ((index>>  shift) + 1)<<  shift;
> @@ -724,8 +720,7 @@ next:
>   			 * last_index is guaranteed to be in the tree, what
>   			 * we do below cannot wander astray.
>   			 */
> -			slot = path[height - 1].node;
> -			height++;
> +			slot = slot->parent;
>   			shift += RADIX_TREE_MAP_SHIFT;
>   		}
>   	}
> @@ -1299,7 +1294,7 @@ static inline void radix_tree_shrink(str
>   	/* try to shrink tree height */
>   	while (root->height>  0) {
>   		struct radix_tree_node *to_free = root->rnode;
> -		void *newptr;
> +		struct radix_tree_node *slot;
>
>   		BUG_ON(!radix_tree_is_indirect_ptr(to_free));
>   		to_free = indirect_to_ptr(to_free);
> @@ -1320,10 +1315,12 @@ static inline void radix_tree_shrink(str
>   		 * (to_free->slots[0]), it will be safe to dereference the new
>   		 * one (root->rnode) as far as dependent read barriers go.
>   		 */
> -		newptr = to_free->slots[0];
> -		if (root->height>  1)
> -			newptr = ptr_to_indirect(newptr);
> -		root->rnode = newptr;
> +		slot = to_free->slots[0];
> +		if (root->height>  1) {
> +			slot->parent = NULL;
> +			slot = ptr_to_indirect(slot);
> +		}
> +		root->rnode = slot;
>   		root->height--;
>
>   		/*
> @@ -1363,16 +1360,12 @@ static inline void radix_tree_shrink(str
>    */
>   void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
>   {
> -	/*
> -	 * The radix tree path needs to be one longer than the maximum path
> -	 * since the "list" is null terminated.
> -	 */
> -	struct radix_tree_path path[RADIX_TREE_MAX_PATH + 1], *pathp = path;
> +	struct radix_tree_node *node = NULL;
>   	struct radix_tree_node *slot = NULL;
>   	struct radix_tree_node *to_free;
>   	unsigned int height, shift;
>   	int tag;
> -	int offset;
> +	int uninitialized_var(offset);
>
>   	height = root->height;
>   	if (index>  radix_tree_maxindex(height))
> @@ -1385,39 +1378,35 @@ void *radix_tree_delete(struct radix_tre
>   		goto out;
>   	}
>   	slot = indirect_to_ptr(slot);
> -
> -	shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
> -	pathp->node = NULL;
> +	shift = height * RADIX_TREE_MAP_SHIFT;
>
>   	do {
>   		if (slot == NULL)
>   			goto out;
>
> -		pathp++;
> +		shift -= RADIX_TREE_MAP_SHIFT;
>   		offset = (index>>  shift)&  RADIX_TREE_MAP_MASK;
> -		pathp->offset = offset;
> -		pathp->node = slot;
> +		node = slot;
>   		slot = slot->slots[offset];
> -		shift -= RADIX_TREE_MAP_SHIFT;
> -		height--;
> -	} while (height>  0);
> +	} while (shift);
>
>   	if (slot == NULL)
>   		goto out;
>
>   	/*
> -	 * Clear all tags associated with the just-deleted item
> +	 * Clear all tags associated with the item to be deleted.
> +	 * This way of doing it would be inefficient, but seldom is any set.
>   	 */
>   	for (tag = 0; tag<  RADIX_TREE_MAX_TAGS; tag++) {
> -		if (tag_get(pathp->node, tag, pathp->offset))
> +		if (tag_get(node, tag, offset))
>   			radix_tree_tag_clear(root, index, tag);
>   	}
>
>   	to_free = NULL;
>   	/* Now free the nodes we do not need anymore */
> -	while (pathp->node) {
> -		pathp->node->slots[pathp->offset] = NULL;
> -		pathp->node->count--;
> +	while (node) {
> +		node->slots[offset] = NULL;
> +		node->count--;
>   		/*
>   		 * Queue the node for deferred freeing after the
>   		 * last reference to it disappears (set NULL, above).
> @@ -1425,17 +1414,20 @@ void *radix_tree_delete(struct radix_tre
>   		if (to_free)
>   			radix_tree_node_free(to_free);
>
> -		if (pathp->node->count) {
> -			if (pathp->node == indirect_to_ptr(root->rnode))
> +		if (node->count) {
> +			if (node == indirect_to_ptr(root->rnode))
>   				radix_tree_shrink(root);
>   			goto out;
>   		}
>
>   		/* Node with zero slots in use so free it */
> -		to_free = pathp->node;
> -		pathp--;
> +		to_free = node;
>
> +		index>>= RADIX_TREE_MAP_SHIFT;
> +		offset = index&  RADIX_TREE_MAP_MASK;
> +		node = node->parent;
>   	}
> +
>   	root_tag_clear_all(root);
>   	root->height = 0;
>   	root->rnode = NULL;
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@...ck.org">  email@...ck.org</a>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ