Message-id: <173889462175.22054.10174686267926452398@noble.neil.brown.name>
Date: Fri, 07 Feb 2025 13:17:01 +1100
From: "NeilBrown" <neilb@...e.de>
To: "Christian Brauner" <brauner@...nel.org>
Cc: "Alexander Viro" <viro@...iv.linux.org.uk>, "Jan Kara" <jack@...e.cz>,
"Linus Torvalds" <torvalds@...ux-foundation.org>,
"Jeff Layton" <jlayton@...nel.org>, "Dave Chinner" <david@...morbit.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 14/19] VFS: Ensure no async updates happening in directory
being removed.

On Fri, 07 Feb 2025, Christian Brauner wrote:
> On Thu, Feb 06, 2025 at 04:42:51PM +1100, NeilBrown wrote:
> > vfs_rmdir takes an exclusive lock on the target directory to ensure
> > nothing new is created in it while the rmdir progresses. With the
>
> It also excludes concurrent mount operations.

And it excludes chown and ACL changes. I doubt those are important. I
do need to check mount/unmount.

>
> > possibility of async updates continuing after the inode lock is dropped
> > we now need extra protection.
> >
> > Any async updates will have DCACHE_PAR_UPDATE set on the dentry. We
> > simply wait for that flag to be cleared on all children.
> >
> > Signed-off-by: NeilBrown <neilb@...e.de>
> > ---
> > fs/dcache.c | 2 +-
> > fs/namei.c | 40 ++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 41 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index fb331596f1b1..90dee859d138 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -53,7 +53,7 @@
> > * - d_lru
> > * - d_count
> > * - d_unhashed()
> > - * - d_parent and d_chilren
> > + * - d_parent and d_children
> > * - childrens' d_sib and d_parent
> > * - d_u.d_alias, d_inode
> > *
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 3a107d6098be..e8a85c9f431c 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1839,6 +1839,27 @@ bool d_update_lock(struct dentry *dentry,
> > return true;
> > }
> >
> > +static void d_update_wait(struct dentry *dentry, unsigned int subclass)
> > +{
> > +	/* Note this may only ever be called in a context where we have
> > +	 * a lock preventing this dentry from becoming locked, possibly
> > +	 * an update lock on the parent dentry. There must be an smp_mb()
> > +	 * after that lock is taken and before this is called so that
> > +	 * the following test is safe. d_update_lock() provides that
> > +	 * barrier.
> > +	 */
> > +	if (!(dentry->d_flags & DCACHE_PAR_UPDATE))
> > +		return;
> > +	lock_acquire_exclusive(&dentry->d_update_map, subclass,
> > +			       0, NULL, _THIS_IP_);
> > +	spin_lock(&dentry->d_lock);
> > +	wait_var_event_spinlock(&dentry->d_flags,
> > +				!check_dentry_locked(dentry),
> > +				&dentry->d_lock);
> > +	spin_unlock(&dentry->d_lock);
> > +	lock_map_release(&dentry->d_update_map);
> > +}
> > +
> > bool d_update_trylock(struct dentry *dentry,
> > struct dentry *base,
> > const struct qstr *last)
> > @@ -4688,6 +4709,7 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
> >  		 struct dentry *dentry)
> >  {
> >  	int error = may_delete(idmap, dir, dentry, 1);
> > +	struct dentry *child;
> >
> >  	if (error)
> >  		return error;
> > @@ -4697,6 +4719,24 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
> >
> >  	dget(dentry);
> >  	inode_lock(dentry->d_inode);
> > +	/*
> > +	 * Some children of dentry might be active in an async update.
> > +	 * We need to wait for them. New children cannot be locked
> > +	 * while the inode lock is held.
> > +	 */
> > +again:
> > +	spin_lock(&dentry->d_lock);
> > +	for (child = d_first_child(dentry); child;
> > +	     child = d_next_sibling(child)) {
> > +		if (child->d_flags & DCACHE_PAR_UPDATE) {
> > +			dget(child);
> > +			spin_unlock(&dentry->d_lock);
> > +			d_update_wait(child, I_MUTEX_CHILD);
> > +			dput(child);
> > +			goto again;
> > +		}
> > +	}
> > +	spin_unlock(&dentry->d_lock);
>
> That looks like it can cause stalls when you call rmdir on a directory
> that has lots of children and a largish subset of them have pending
> async updates, no?
>

It can certainly block waiting for other operations to complete, but
that is already the case when waiting for an exclusive lock on i_rwsem.
Any thread that has already tried to get that lock might get it before
rmdir eventually succeeds. So I don't think that is a behavioural
change.
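
To make the shape of the scan loop in the quoted patch concrete, here is
a rough single-threaded user-space model of it. This is only a sketch:
struct node, model_wait(), wait_restart() and make_all_busy() are names
made up for illustration and are not kernel APIs, and model_wait() just
clears the flag where the real d_update_wait() would sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a dentry; not the kernel structure. */
struct node {
	struct node *next;	/* models the sibling link */
	bool busy;		/* models DCACHE_PAR_UPDATE */
};

/*
 * Hypothetical stand-in for d_update_wait(): real code would sleep until
 * another thread clears the flag; this model clears it directly.
 */
static void model_wait(struct node *n)
{
	n->busy = false;
}

/*
 * Scan the child list, waiting on each busy child and restarting from the
 * head after every wait, as the "goto again" loop does.  Returns the number
 * of list entries examined so the cost of restarting can be observed.
 */
static size_t wait_restart(struct node *head)
{
	size_t visited = 0;
	struct node *n;

again:
	for (n = head; n; n = n->next) {
		visited++;
		if (n->busy) {
			model_wait(n);
			goto again;
		}
	}
	return visited;
}

/* Link len nodes from caller-provided storage into a list, all marked busy. */
static struct node *make_all_busy(struct node *store, size_t len)
{
	size_t i;

	assert(len > 0);
	for (i = 0; i < len; i++) {
		store[i].next = (i + 1 < len) ? &store[i + 1] : NULL;
		store[i].busy = true;
	}
	return &store[0];
}
```

In this model a list of 4 children that are all busy is examined 14
times in total (1 + 2 + 3 + 4 entries for the four restarted passes,
plus a final clean pass of 4). In the real code the time spent sleeping
in the wait would presumably dominate that rescanning.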

I'm not concerned about walking the sibling list under a spinlock if the
list is very long. Maybe I could periodically take a ref to the current
child, drop and reclaim the spinlock, and hopefully continue from there.
Doing that on a non-DCACHE_PAR_UPDATE dentry should be safe. I wonder if
that complexity is worth it.
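
A rough user-space model of that "pin the current child, drop and
retake the lock, carry on" idea might look like the following. It is a
sketch under loud assumptions: struct node, struct parent,
parent_lock(), parent_unlock() and walk_in_batches() are hypothetical
names, the lock is a plain flag rather than a spinlock, and the model
does not capture the sibling list changing while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel structures. */
struct node {
	struct node *next;	/* models the sibling link */
	int refs;		/* models the dentry reference count */
	bool busy;		/* models DCACHE_PAR_UPDATE */
};

struct parent {
	struct node *first;
	bool locked;		/* models the parent's d_lock */
	size_t lock_drops;	/* times the lock was released mid-walk */
};

static void parent_lock(struct parent *p)
{
	assert(!p->locked);
	p->locked = true;
}

static void parent_unlock(struct parent *p)
{
	assert(p->locked);
	p->locked = false;
}

/*
 * Walk every child, releasing the lock after each `batch` entries.  The
 * current child is pinned with a reference across the unlocked window so
 * the walk can resume from it.  Only children without the busy flag are
 * used as resume points.
 */
static size_t walk_in_batches(struct parent *p, size_t batch)
{
	size_t visited = 0;
	struct node *n;

	assert(batch > 0);
	parent_lock(p);
	for (n = p->first; n; n = n->next) {
		visited++;
		if (visited % batch == 0 && !n->busy) {
			n->refs++;		/* pin the cursor */
			parent_unlock(p);	/* window for other lock users */
			p->lock_drops++;
			parent_lock(p);
			n->refs--;		/* unpin; resume from n->next */
		}
	}
	parent_unlock(p);
	return visited;
}
```

With 10 idle children and a batch of 4, this model drops the lock twice
(after the 4th and 8th entries) and still visits each child once.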

Thanks,
NeilBrown