lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 22 Oct 2019 21:11:31 +0100
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Ritesh Harjani <riteshh@...ux.ibm.com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        wugyuan@...ibm.com, jlayton@...nel.org, hsiangkao@....com,
        Jan Kara <jack@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH RESEND 1/1] vfs: Really check for inode ptr in lookup_fast

On Tue, Oct 22, 2019 at 03:37:36PM +0100, Al Viro wrote:
> On Tue, Oct 22, 2019 at 07:08:54PM +0530, Ritesh Harjani wrote:
> > I think we have still not taken this patch. Al?

> You've picked the easiest one to hit, but on e.g. KVM setups you can have the
> host thread representing the CPU where __d_set_inode_and_type() runs get
> preempted (by the host kernel), leaving others with much wider window.
> 
> Sure, we can do that to all callers of d_is_negative/d_is_positive, but...
> the same goes for any places that assumes that d_is_dir() implies that
> the sucker is positive, etc.
> 
> What we have guaranteed is
> 	* ->d_lock serializes ->d_flags/->d_inode changes
> 	* ->d_seq is bumped before/after such changes
> 	* positive dentry never changes ->d_inode as long as you hold
> a reference (negative dentry *can* become positive right under you)
> 
> So there are 3 classes of valid users: those holding ->d_lock, those
> sampling and rechecking ->d_seq and those relying upon having observed
> the sucker they've pinned to be positive.
> 
> What you've been hitting is "we have it pinned, ->d_flags says it's
> positive but we still observe the value of ->d_inode from before the
> store to ->d_flags that has made it look positive".

Actually, your patch opens another problem there.  Suppose you make
it d_really_is_positive() and hit the same race sans reordering.
Dentry is found by __d_lookup() and is negative.  Right after we
return from __d_lookup() another thread makes it positive (a symlink)
- ->d_inode is set, d_really_is_positive() becomes true.  OK, on we
go, pick the inode and move on.  Right?  ->d_flags is still not set
by the thread that made it positive.  We return from lookup_fast()
and call step_into().  And get to
        if (likely(!d_is_symlink(path->dentry)) ||
Which checks ->d_flags and sees the value from before the sucker
became positive.  IOW, d_is_symlink() is false here.  If that
was the last path component and we'd been told to follow links,
we will end up with positive dentry of a symlink coming out of
pathname resolution.

Similar fun happens if you have mkdir racing with lookup - ENOENT
is what should've happened if lookup comes first, success - if
mkdir does.  This way we can hit ENOTDIR, due to false negative
from d_can_lookup().

IOW, d_really_is_negative() in lookup_fast() will paper over
one of oopsen, but it
	* won't cover similar oopsen on other codepaths and
	* will lead to bogus behaviour.

I'm not sure that blanket conversion of d_is_... to smp_load_acquire()
is the right solution; it might very well be that we need to do that
only on a small subset of call sites, lookup_fast() being one of
those.  But we do want at least to be certain that something we'd
got from lookup_fast() in non-RCU mode already has ->d_flags visible.

I'm going through the callers right now, will post a followup once
the things get cleaner...

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ