lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20100820020407.GX10429@dastard>
Date:	Fri, 20 Aug 2010 12:04:07 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Jan Kara <jack@...e.cz>
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	npiggin@...nel.dk, a.p.zijlstra@...llo.nl
Subject: Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly

On Fri, Aug 20, 2010 at 08:25:59AM +1000, Dave Chinner wrote:
> On Thu, Aug 19, 2010 at 05:58:39PM +0200, Jan Kara wrote:
> >   Hi Dave,
> > 
> > On Thu 19-08-10 23:25:52, Dave Chinner wrote:
> > > It looks to me like radix_tree_set_tag_if_tagged() is fundamentally
> > > broken.  All the tag set/clear code stores the tree path in a cursor
> > > and uses that to propagate the tags if and only if the full path
> > > from root to leaf is resolved. radix_tree_set_tag_if_tagged() sets
> > > tags on intermediate nodes before it has resolved the full path and
> > > hence can set tags when it should not. The "should not" cases occur
> > > when we have to tag sub-ranges or the scan aborts because it's
> > > reached the number ot tag in a batch.
> >   Thanks for debugging this! You are right that the code can leave dangling
> > tag when we end the scan at the end of given range but the first tagged
> > leaf is after the end of the given range (there shouldn't be a problem with
> > the batches because there we can exit only just after we tag a leaf so that
> > should be OK).
> >   There are two possibilities how to fix the bug:
> > a) Always tag bottom up - i.e., when we see leaf that should be tagged, go
> > up and tag the parent as well if it is not already tagged.
> > b) When we exit the search and we didn't not set any leaf tag since last
> > time we went down, we walk up the tree and do an equivalent of
> > radix_tree_clear_tag().
> >   I'll probably go for a) since it looks more robust but b) would be
> > probably faster.
> 
> I think that when it comes to data integrity, more robust should
> win over speed every time. I think it can be done quite easily,
> though, having slept on it - we have the current path in the
> open_slots[] array, so we could just walk that when we set a leaf
> tag. That should be easy to optimise as well - just keep track of
> how high up the path we have set the tag and only walk that far
> when setting the tags. That way we don't continually set the tag on
> the root higher level slots. That shouldn't be any slower than the
> current code...

Fixing this indicates that there is a second bug also corrupting the
PAGECACHE_TAG_TOWRITE tags - it takes quite a bit longer to hit, but
when it fails it is generally because the bit at slot offset zero in
a high-up intermediate node is incorrectly set. It appears that none
of the code is actually setting it, so it's been quite difficult to
track down.

Eventually I noticed through code inspection that
radix_tree_node_rcu_free() clears the tag at offset zero for the
because of the radix_tree_shrink implementation potentially leaving
the first slot non-null. The addition of the third tag did not add
this clearing of the tag in the zero slot.  Adding this:

 	 */
 	tag_clear(node, 0, 0);
 	tag_clear(node, 1, 0);
+	tag_clear(node, 2, 0);
 	node->slots[0] = NULL;
 	node->count = 0;
 
To radix_tree_node_rcu_free() appears to fix the problem. Whoever
failed to coment the definition of the number of tags the radix tree
supports left a really nasty landmine that Jan stepped on. Cleaning
up the mess hasn't been pretty, either.

So, after a couple of days of debugging I finally have test
013 passing without failing. Now to clean up the mess I have and
test some proper patches....

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ