[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190402175401.GL2217@ZenIV.linux.org.uk>
Date: Tue, 2 Apr 2019 18:54:01 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: Jonathan Corbet <corbet@....net>
Cc: "Tobin C. Harding" <tobin@...nel.org>,
Mauro Carvalho Chehab <mchehab@...pensource.com>,
Neil Brown <neilb@...e.com>,
Randy Dunlap <rdunlap@...radead.org>,
linux-doc@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Ian Kent <raven@...maw.net>
Subject: Re: [PATCH v3 00/24] Convert vfs.txt to vfs.rst
On Tue, Apr 02, 2019 at 05:48:24PM +0100, Al Viro wrote:
> ->d_prune() must not drop/regain any of the locks held by caller.
> It must _not_ free anything attached to dentry - that belongs
> later in the shutdown sequence. If anything, I'm tempted to
> make it take const struct dentry * as argument, just to make
> that clear.
>
> No new (counted) references can be acquired by that point;
> lockless dcache lookup might find our dentry a match, but
> result of such lookup is not going to be legitimized - it's
> doomed to be thrown out as stale.
>
> It really makes more sense as part of struct dentry lifecycle
> description...
>
> [1] in theory, ->d_time might be changed by overlapping lockless
> call of ->d_revalidate(). Up to filesystem - VFS doesn't touch
> that field (and AFAICS only NFS uses it these days).
One addition: ->d_prune() can overlap only with
* lockless ->d_hash()/->d_compare()
* lockless ->d_revalidate()
* lockless ->d_manage()
So it must not destroy anything used by those without an RCU
delay. The same goes for ->d_release() - both the list of
the things it can overlap with and requirements re RCU delays.
In-tree ->d_prune() instances are fine and so's the majority
of ->d_release(). However, autofs ->d_release() has something
that looks like an RCU use-after-free waiting to happen:
static void autofs_dentry_release(struct dentry *de)
{
struct autofs_info *ino = autofs_dentry_ino(de);
struct autofs_sb_info *sbi = autofs_sbi(de->d_sb);
pr_debug("releasing %p\n", de);
if (!ino)
return;
...
autofs_free_ino(ino);
}
with autofs_free_ino() being straight kfree(). Which means
that the lockless case of autofs_d_manage() can run into
autofs_dentry_ino(dentry) getting freed right under it.
And there we do have this reachable:
int autofs_expire_wait(const struct path *path, int rcu_walk)
{
struct dentry *dentry = path->dentry;
struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb);
struct autofs_info *ino = autofs_dentry_ino(dentry);
int status;
int state;
/* Block on any pending expire */
if (!(ino->flags & AUTOFS_INF_WANT_EXPIRE))
return 0;
if (rcu_walk)
return -ECHILD;
the second check buggers off in lockless mode; the first one
can be done in lockless mode just fine, so AFAICS we do have
a problem there. Smells like we ought to make that kfree
in autofs_free_ino() RCU-delayed... Ian, have you, by any
chance, run into reports like that? Use-after-free or
oopsen in autofs_expire_wait() and friends, that is...
AFAICS, everything else is safe; however, looking through
those has turned up a fishy spot in ceph_d_revalidate():
} else if (d_really_is_positive(dentry) &&
ceph_snap(d_inode(dentry)) == CEPH_SNAPDIR) {
valid = 1;
Again, lockless ->d_revalidate() is called without anything
that would hold ->d_inode stable; the first part of the
condition does not guarantee that we won't run into
ceph_snap(NULL) and oops. Sure, compiler is almost certainly
not going to reload here, but we ought to use d_inode_rcu(dentry)
there.
Sigh...
Powered by blists - more mailing lists