[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871qqmgxeu.fsf@oracle.com>
Date: Tue, 01 Nov 2022 14:47:05 -0700
From: Stephen Brennan <stephen.s.brennan@...cle.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Jan Kara <jack@...e.cz>, Amir Goldstein <amir73il@...il.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] fsnotify: allow sleepable child dentry flag update
Al Viro <viro@...iv.linux.org.uk> writes:
> On Thu, Oct 13, 2022 at 03:27:19PM -0700, Stephen Brennan wrote:
>
>> So I have two more narrowly scoped strategies to improve the situation. Both are
>> included in the hacky, awful patch below.
>>
>> First, is to let __fsnotify_update_child_dentry_flags() sleep. This means nobody
>> is holding the spinlock for several seconds at a time. We can actually achieve
>> this via a cursor, the same way that simple_readdir() is implemented. I think
>> this might require moving the declaration of d_alloc_cursor() and maybe
>> exporting it. I had to #include fs/internal.h which is not ok.
>
> Er... Won't that expose every filesystem to having to deal with cursors?
> Currently it's entirely up to the filesystem in question and I wouldn't
> be a dime on everyone being ready to cope with such surprises...
Hi Al,
I wanted to follow-up on this. Yeah, I didn't realize that this could
cause issues for some filesystems when I wrote it, since I hadn't
considered that rather few filesystems use dcache_readdir(), and so most
aren't prepared to encounter a cursor. Thanks for that catch.
I think I came up with a better solution, which you can see in context
in v3 [1]. The only change I have from the posting there is to eliminate
the unnecssary "child->d_parent != parent" check. So the new idea is to
take a reference to the current child and then go to sleep. That alone
should be enough to prevent dentry_kill() from removing the child from
its parent's list. However, it wouldn't prevent d_move() from moving it
to a new list, nor would it prevent it from moving along the list if it
were a cursor. However, in this situation, we also hold the parent
i_rwsem in write mode, which means that no concurrent rename or readdir
can happen. You can see my full analysis here [2]. So here's the new
approach, if you can call out anything I've missod or confirm that it's
sound, I'd really appreciate it!
/* Must be called with inode->i_rwsem in held in write mode */
static bool __fsnotify_update_children_dentry_flags(struct inode *inode)
{
struct dentry *child, *alias, *last_ref = NULL;
alias = d_find_any_alias(inode);
/*
* These lists can get very long, so we may need to sleep during
* iteration. Normally this would be impossible without a cursor,
* but since we have the inode locked exclusive, we're guaranteed
* that the directory won't be modified, so whichever dentry we
* pick to sleep on won't get moved.
*/
spin_lock(&alias->d_lock);
list_for_each_entry(child, &alias->d_subdirs, d_child) {
if (need_resched()) {
/*
* We need to hold a reference while we sleep. But when
* we wake, dput() could free the dentry, invalidating
* the list pointers. We can't look at the list pointers
* until we re-lock the parent, and we can't dput() once
* we have the parent locked. So the solution is to hold
* onto our reference and free it the *next* time we drop
* alias->d_lock: either at the end of the function, or
* at the time of the next sleep.
*/
dget(child);
spin_unlock(&alias->d_lock);
dput(last_ref);
last_ref = child;
cond_resched();
spin_lock(&alias->d_lock);
}
if (!child->d_inode)
continue;
spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
if (watched)
child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
else
child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
spin_unlock(&child->d_lock);
}
spin_unlock(&alias->d_lock);
dput(last_ref);
dput(alias);
return watched;
}
Thanks,
Stephen
[1]: https://lore.kernel.org/linux-fsdevel/20221028001016.332663-4-stephen.s.brennan@oracle.com/
[2]: https://lore.kernel.org/linux-fsdevel/874jvigye9.fsf@oracle.com/
Powered by blists - more mailing lists