Message-ID: <20240701101536.jb452t25xds6x7f3@quack3>
Date: Mon, 1 Jul 2024 12:15:36 +0200
From: Jan Kara <jack@...e.cz>
To: Alexander Larsson <alexl@...hat.com>
Cc: Christian Brauner <brauner@...nel.org>, Ian Kent <ikent@...hat.com>,
Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>,
Lucas Karpinski <lkarpins@...hat.com>, viro@...iv.linux.org.uk,
raven@...maw.net, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Eric Chanudet <echanude@...hat.com>
Subject: Re: [RFC v3 1/1] fs/namespace: remove RCU sync for MNT_DETACH umount

On Mon 01-07-24 10:41:40, Alexander Larsson wrote:
> On Mon, Jul 1, 2024 at 7:50 AM Christian Brauner <brauner@...nel.org> wrote:
> >
> > > I always thought the rcu delay was to ensure concurrent path walks "see" the
> > > umount, not to ensure correct operation of the following mntput()(s).
> > >
> > > Isn't the sequence of operations roughly: resolve path, lock, detach,
> > > release lock, rcu wait, mntput() subordinate mounts, put path?
> >
> > The crucial bit is really that synchronize_rcu_expedited() ensures that
> > the final mntput() won't happen until path walk leaves RCU mode.
> >
> > This allows callers like legitimize_mnt() which are called with only
> > the RCU read-lock during lazy path walk to simply check for
> > MNT_SYNC_UMOUNT and see that the mnt is about to be killed. If they see
> > that this mount is MNT_SYNC_UMOUNT then they know that the mount won't
> > be freed until an RCU grace period is up and so they know that they can
> > simply put the reference count they took _without having to actually
> > call mntput()_.
> >
> > Because if they did have to call mntput() they might end up shutting the
> > filesystem down instead of umount() and that will cause said EBUSY
> > errors I mentioned in my earlier mails.
>
> But such behaviour could be kept even without an expedited RCU sync,
> such as in my alternative patch for this:
> https://www.spinics.net/lists/linux-fsdevel/msg270117.html
>
> I.e. we would still guarantee the final mntput() is called, but not block
> the return of the unmount call.

So FWIW the approach of handing off the remainder of namespace_unlock()
to an RCU callback for lazy unmount looks workable to me. As Al Viro
pointed out, you cannot do everything directly in the RCU callback, because
that context doesn't allow all of the work to happen there. So you just need
to queue work from the RCU callback and do the real work from the work item
(OTOH, in that case you can avoid the task work in mntput_no_expire(); that
will need a bit of refactoring).
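
To make the ordering concrete, here is a rough userspace model of that
two-stage handoff. It is only a sketch, not kernel code: every name in it
(deferred_umount, umount_rcu_cb, umount_worker, the model_* helpers) is
hypothetical, and the "context" flag merely stands in for the fact that an
RCU callback must not sleep while a work item may.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. None of these names are kernel APIs. */
enum model_ctx { CTX_PROCESS, CTX_RCU_CALLBACK };
static enum model_ctx current_ctx = CTX_PROCESS;

struct deferred_umount {
	int mounts_to_put;	/* stand-in for the list of detached mounts */
	bool queued;		/* set once the RCU callback handed it off */
	bool done;		/* set once the worker finished */
};

#define MAX_WORK 8
static struct deferred_umount *work_list[MAX_WORK];
static size_t nr_work;

/* The real work: may sleep, so it must not run in RCU callback context. */
void umount_worker(struct deferred_umount *du)
{
	assert(current_ctx == CTX_PROCESS);
	while (du->mounts_to_put > 0)
		du->mounts_to_put--;	/* stand-in for the final mntput() */
	du->done = true;
}

/* RCU callback: does nothing but queue the work item. */
void umount_rcu_cb(struct deferred_umount *du)
{
	assert(current_ctx == CTX_RCU_CALLBACK);
	assert(nr_work < MAX_WORK);
	work_list[nr_work++] = du;
	du->queued = true;
}

/* Model of a grace period ending: the callback runs in atomic context. */
void model_grace_period_end(struct deferred_umount *du)
{
	current_ctx = CTX_RCU_CALLBACK;
	umount_rcu_cb(du);
	current_ctx = CTX_PROCESS;
}

/* Model of the workqueue thread draining its list in process context. */
void model_run_work(void)
{
	for (size_t i = 0; i < nr_work; i++)
		umount_worker(work_list[i]);
	nr_work = 0;
}
```

In this model the umount caller would return right after registering the
callback; the puts only happen once model_run_work() runs, i.e. after the
grace period and in a context where sleeping is allowed.
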
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR