lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 18 Sep 2012 16:53:48 +0200
From:	Miklos Szeredi <miklos@...redi.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Al Viro <viro@...iv.linux.org.uk>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, hch@...radead.org,
	Trond.Myklebust@...app.com
Subject: Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Mon, Sep 17, 2012 at 1:39 PM, Al Viro <viro@...iv.linux.org.uk> wrote:
>>
>> Egads...  The problem is real and analysis, AFAICS, is correct, but result
>> is extremely ugly ;-/
>
> Agreed.
>
> The problem (or at least one *part* of the problem) is that the "goto
> rename_retry" case is done for two different entities entirely:
>
>  - the "try_to_ascend()" failure path, which can happen even when
> renamelock is held for writing.
>
>  - the "if we weren't write-locked before, and the read-lock failed"
> case (which obviously cannot happen if we already held things for
> writing)
>
> That said, I'm not sure why/how that try_to_ascend() could even fail
> when we're holding things locked. I guess it's the DCACHE_DISCONNECTED
> case that triggers.

Yes, with the test cases that IBM were using it is DCACHE_DISCONNECTED
case that triggers the double-lock.  Trond was misusing
DCACHE_DISCONNECTED and this made the failure in try_to_ascend() much
more likely (and bogus).  But there is a case, which is triggered rarely
if ever, when try_to_ascend() failure with rename_lock held is perfectly
valid.

try_to_ascend() does an ellaborate locking dance that aims to ensure
that child remains valid after having ascended to parent.  It nees a
valid child because it needs the next sibling to continue tree
traversal.  If the child becomes invalid (final dput()) then the whole
tree traversal needs to be restarted since we don't know where we left
off. So we basically do

 1. take the RCU lock
 2. unlock child
 3. lock parent
 4. check whether child has been freed, if yes restart traversal
 5. release RCU
 6. find next sibling (child->d_child.next)
 7. lock new child

It is step 4. that needs the RCU guarantees so we can check child's
dentry flag (DCACHE_DISCONNECTED flag presently).  After that d_lock on
parent will protect "child" from going away (d_kill or d_move) and we
can get hold of the next sibling.

Thanks,
Miklos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists