[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871sl6eo7e.fsf@notabene.neil.brown.name>
Date: Fri, 10 Nov 2017 15:45:41 +1100
From: NeilBrown <neilb@...e.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Al Viro <viro@...iv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/3] VFS: close race between getcwd() and d_move()
On Thu, Nov 09 2017, Linus Torvalds wrote:
> On Thu, Nov 9, 2017 at 2:14 PM, NeilBrown <neilb@...e.com> wrote:
>> On Thu, Nov 09 2017, Linus Torvalds wrote:
>>>
>>> How nasty would it be to just expand the calls to __d_drop/__d_rehash
>>> into __d_move itself, and take both has list locks at the same time
>>> (with the usual ordering and checking if it's the same list, of
>>> course).
>>
>> something like this?
>
> Yes.
>
> This looks nicer to me. Partly because I hate those "pass flags to
> functions that modify their behavior" kinds of patches. I'd rather see
> just straight-line unconditional code with some possible duplication.
...
>
> I also do wonder if we can avoid all the unhash/rehash games entirely
> (and avoid the hash list locking) if it turns out that the dentry and
> target hash lists are the same.
I'm not convinced. I haven't actually tried it, but the matrix of
possibilities seems a little large.
The source dentry may or may not be hashed (not in the "disconnected
IS_ROOT" case), and the target may or may not want to be rehashed
afterwards (depending on 'exchange'). We could skip the lock
for an exchange if they both had the same hash, but not for a simple
move.
However your description of what it was that you didn't like gave me an
idea - I can take the same approach as my original, but not pass flags
around.
I quite like how this turned out.
Dropping the BUG_ON() in d_rehash() isn't ideal, maybe we could add
___d_rehash() without the BUG_ON() and call that from __d_rehash?
Thanks,
NeilBrown
From: NeilBrown <neilb@...e.com>
Date: Fri, 10 Nov 2017 15:20:06 +1100
Subject: [PATCH] VFS: close race between getcwd() and d_move()
d_move() will call __d_drop() and then __d_rehash()
on the dentry being moved. This creates a small window
when the dentry appears to be unhashed. Many tests
of d_unhashed() are made under ->d_lock and so are safe
from racing with this window, but some aren't.
In particular, getcwd() calls d_unlinked() (which calls
d_unhashed()) without d_lock protection, so it can race.
This races has been seen in practice with lustre, which uses d_move() as
part of name lookup. See:
https://jira.hpdd.intel.com/browse/LU-9735
It could race with a regular rename(), and result in ENOENT instead
of either the 'before' or 'after' name.
The race can be demonstrated with a simple program which
has two threads, one renaming a directory back and forth
while another calls getcwd() within that directory: it should never
fail, but does. See:
https://patchwork.kernel.org/patch/9455345/
We could fix this race by taking d_lock and rechecking when
d_unhashed() reports true. Alternately when can remove the window,
which is the approach this patch takes.
___d_drop() is introduce which does *not* clear d_hash.pprev
so the dentry still appears to be hashed. __d_drop() calls
___d_drop(), then clears d_hash.pprev.
__d_move() now uses ___d_drop() and only clears d_hash.pprev
when not rehashing.
Signed-off-by: NeilBrown <neilb@...e.com>
---
fs/dcache.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..8c83543f5065 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -468,9 +468,11 @@ static void dentry_lru_add(struct dentry *dentry)
* d_drop() is used mainly for stuff that wants to invalidate a dentry for some
* reason (NFS timeouts or autofs deletes).
*
- * __d_drop requires dentry->d_lock.
+ * __d_drop requires dentry->d_lock
+ * ___d_drop doesn't mark dentry as "unhashed"
+ * (dentry->d_hash.pprev will be LIST_POISON2, not NULL).
*/
-void __d_drop(struct dentry *dentry)
+static void ___d_drop(struct dentry *dentry)
{
if (!d_unhashed(dentry)) {
struct hlist_bl_head *b;
@@ -486,12 +488,15 @@ void __d_drop(struct dentry *dentry)
hlist_bl_lock(b);
__hlist_bl_del(&dentry->d_hash);
- dentry->d_hash.pprev = NULL;
hlist_bl_unlock(b);
/* After this call, in-progress rcu-walk path lookup will fail. */
write_seqcount_invalidate(&dentry->d_seq);
}
}
+void __d_drop(struct dentry *dentry) {
+ ___d_drop(dentry);
+ dentry->d_hash.pprev = NULL;
+}
EXPORT_SYMBOL(__d_drop);
void d_drop(struct dentry *dentry)
@@ -2381,7 +2386,7 @@ EXPORT_SYMBOL(d_delete);
static void __d_rehash(struct dentry *entry)
{
struct hlist_bl_head *b = d_hash(entry->d_name.hash);
- BUG_ON(!d_unhashed(entry));
+
hlist_bl_lock(b);
hlist_bl_add_head_rcu(&entry->d_hash, b);
hlist_bl_unlock(b);
@@ -2818,9 +2823,9 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
/* unhash both */
- /* __d_drop does write_seqcount_barrier, but they're OK to nest. */
- __d_drop(dentry);
- __d_drop(target);
+ /* ___d_drop does write_seqcount_barrier, but they're OK to nest. */
+ ___d_drop(dentry);
+ ___d_drop(target);
/* Switch the names.. */
if (exchange)
@@ -2832,6 +2837,8 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
__d_rehash(dentry);
if (exchange)
__d_rehash(target);
+ else
+ target->d_hash.pprev = NULL;
/* ... and switch them in the tree */
if (IS_ROOT(dentry)) {
--
2.14.0.rc0.dirty
Download attachment "signature.asc" of type "application/pgp-signature" (833 bytes)
Powered by blists - more mailing lists