lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 13 Jun 2008 05:47:54 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	peterz@...radead.org, linux-kernel@...r.kernel.org,
	paulmck@...ibm.com
Subject: Re: [patch] radix-tree: fix small lockless radix-tree bug

On Friday 13 June 2008 05:34, Andrew Morton wrote:
> On Fri, 13 Jun 2008 05:03:45 +1000
>
> Nick Piggin <nickpiggin@...oo.com.au> wrote:
> > @@ -124,6 +175,17 @@ static void radix_tree_node_rcu_free(str
> >  {
> >  	struct radix_tree_node *node =
> >  			container_of(head, struct radix_tree_node, rcu_head);
> > +
> > +	/*
> > +	 * must only free zeroed nodes into the slab. radix_tree_shrink
> > +	 * can leave us with a non-NULL entry in the first slot, so clear
> > +	 * that here to make sure.
> > +	 */
> > +	tag_clear(node, 0, 0);
> > +	tag_clear(node, 1, 0);
> > +	node->slots[0] = NULL;
> > +	node->count = 0;
> > +
> >  	kmem_cache_free(radix_tree_node_cachep, node);
> >  }
>
> oic, that stuff got moved from the synchronous case into the RCU callback
> case.

Yeah, it was annoying because the real change moved the first references
of a couple of those up. Then we have to move all of them to keep them
in one place.

The changelog I submitted looks like utter gibberish, btw. Rewritten one
here which should be a bit better.


We shrink a radix tree when its root node has only one child, in the left
most slot. The child becomes the new root node. To perform this operation
in a manner compatible with concurrent lockless lookups, we atomically
switch the root pointer from the parent to its child.

However a concurrent lockless lookup may now have loaded a pointer to the
parent (and is presently deciding what to do next). For this reason, we
also have to keep the parent node in a valid state after shrinking the
tree, until the next RCU grace period -- otherwise this lookup with the
parent pointer may not do the right thing. Notably, we need to keep the
child in the left most slot there in case that is requested by the lookup.

This is all pretty standard RCU stuff. It is worth repeating because
in my eagerness to obey the radix tree node constructor scheme, I had
broken it by zeroing the radix tree node before the grace period.

What could happen is that a lookup can load the parent pointer, then
decide it wants to follow the left most child slot, only to find the
slot contained NULL due to the concurrent shrinker having zeroed the
parent node before waiting for a grace period. The lookup would return
a false negative as a result.

Fix it by doing that clearing in the RCU callback. I would normally want
to rip out the constructor entirely, but radix tree nodes are one of those
places where they make sense (only few cachelines will be touched soon
after allocation).


This was never actually found in any lockless pagecache testing or by
the test harness, but by seeing the odd problem with my scalable vmap
rewrite. I have not tickled the test harness into reproducing it yet,
but I'll keep working at it.

Fortunately, it is not a problem anywhere lockless pagecache is used in
mainline kernels (pagecache probe is not a guarantee, and brd does not
have concurrent lookups and deletes).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ