[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOQ4uxiWE5eVTrL-2EWVHGQEpEX7HSstj_+kEp-b7xZrnfoXMA@mail.gmail.com>
Date: Fri, 6 Feb 2026 14:35:25 +0100
From: Amir Goldstein <amir73il@...il.com>
To: NeilBrown <neil@...wn.name>
Cc: Christian Brauner <brauner@...nel.org>, Alexander Viro <viro@...iv.linux.org.uk>,
David Howells <dhowells@...hat.com>, Jan Kara <jack@...e.cz>, Chuck Lever <chuck.lever@...cle.com>,
Jeff Layton <jlayton@...nel.org>, Miklos Szeredi <miklos@...redi.hu>,
John Johansen <john.johansen@...onical.com>, Paul Moore <paul@...l-moore.com>,
James Morris <jmorris@...ei.org>, "Serge E. Hallyn" <serge@...lyn.com>,
Stephen Smalley <stephen.smalley.work@...il.com>, linux-kernel@...r.kernel.org,
netfs@...ts.linux.dev, linux-fsdevel@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-unionfs@...r.kernel.org,
apparmor@...ts.ubuntu.com, linux-security-module@...r.kernel.org,
selinux@...r.kernel.org
Subject: Re: [PATCH 10/13] ovl: change ovl_create_real() to get a new lock
when re-opening created file.
On Fri, Feb 6, 2026 at 2:11 AM NeilBrown <neilb@...mail.net> wrote:
>
> On Thu, 05 Feb 2026, Amir Goldstein wrote:
> > On Wed, Feb 4, 2026 at 6:09 AM NeilBrown <neilb@...mail.net> wrote:
> > >
> > > From: NeilBrown <neil@...wn.name>
> > >
> > > When ovl_create_real() is used to create a file on the upper filesystem
> > > it needs to return the resulting dentry - positive and hashed.
> > > It is usually the case the that dentry passed to the create function
> > > (e.g. vfs_create()) will be suitable but this is not guaranteed. The
> > > filesystem may unhash that dentry forcing a repeat lookup next time the
> > > name is wanted.
> > >
> > > So ovl_create_real() must be (and is) aware of this and prepared to
> > > perform that lookup to get a hash positive dentry.
> > >
> > > This is currently done under that same directory lock that provided
> > > exclusion for the create. Proposed changes to locking will make this
> > > not possible - as the name, rather than the directory, will be locked.
> > > The new APIs provided for lookup and locking do not and cannot support
> > > this pattern.
> > >
> > > The lock isn't needed. ovl_create_real() can drop the lock and then get
> > > a new lock for the lookup - then check that the lookup returned the
> > > correct inode. In a well-behaved configuration where the upper
> > > filesystem is not being modified by a third party, this will always work
> > > reliably, and if there are separate modification it will fail cleanly.
> > >
> > > So change ovl_create_real() to drop the lock and call
> > > ovl_start_creating_upper() to find the correct dentry. Note that
> > > start_creating doesn't fail if the name already exists.
> > >
> > > This removes the only remaining use of ovl_lookup_upper, so it is
> > > removed.
> > >
> > > Signed-off-by: NeilBrown <neil@...wn.name>
> > > ---
> > > fs/overlayfs/dir.c | 24 ++++++++++++++++++------
> > > fs/overlayfs/overlayfs.h | 7 -------
> > > 2 files changed, 18 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > > index ff3dbd1ca61f..ec08904d084d 100644
> > > --- a/fs/overlayfs/dir.c
> > > +++ b/fs/overlayfs/dir.c
> > > @@ -219,21 +219,33 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
> > > err = -EIO;
> > > } else if (d_unhashed(newdentry)) {
> > > struct dentry *d;
> > > + struct name_snapshot name;
> > > /*
> > > * Some filesystems (i.e. casefolded) may return an unhashed
> > > - * negative dentry from the ovl_lookup_upper() call before
> > > + * negative dentry from the ovl_start_creating_upper() call before
> > > * ovl_create_real().
> >
> >
> > According to the new locking rules, if the hashed dentry itself is
> > the synchronization object, is it going to be allowed to
> > filesystem to unhash the dentry while the dentry still in the
> > "creating" scope? It is hard for me to wrap my head around this.
>
> It can be confusing....
>
> It will be important for the name the remain locked (and hashed) until
> the operation (create, remove, rename) either succeeds or fails. So
> leaving a dentry unhashed will be OK providing a subsequent lookup will
> also succeed or fail in the same way. The caller must be able to use
> the dentry to access the object (i.e. the inode) on success, but they
> is nothing in POSIX that requires that the object still has any
> particular name.
>
> >
> > Or do we need this here because some filesystems (casefold in
> > particular) are not going to support parallel creations?
>
> There is no reason that a casefolding filesystem would not support parallel
> ops. And it isn't just casefolding that acts like this. At least one of
> the special filesystems (tracefs maybe) always unhashes on create. You
> only ever get a hashed positive dentry as a result of lookup.
> (overlayfs would never see this case of course).
>
> >
> > > * In that case, lookup again after making the newdentry
> > > * positive, so ovl_create_upper() always returns a hashed
> > > * positive dentry.
> > > + * As we have to drop the lock before the lookup a race
> > > + * could result in a lookup failure. In that case we return
> > > + * an error.
> > > */
> > > - d = ovl_lookup_upper(ofs, newdentry->d_name.name, parent,
> > > - newdentry->d_name.len);
> > > - dput(newdentry);
> > > - if (IS_ERR_OR_NULL(d))
> > > + take_dentry_name_snapshot(&name, newdentry);
> > > + end_creating_keep(newdentry);
> > > + d = ovl_start_creating_upper(ofs, parent, &name.name);
> > > + release_dentry_name_snapshot(&name);
> >
> > OK. not saying no to this (yet) but I have to admit that it is pretty
> > ugly that the callers of ovl_create_real() want to create a specific
> > stable name, which is could be passed in as const char *name
> > and yet we end up doing this weird dance here just to keep the name
> > from newdentry.
>
> There are three callers of ovl_create_real()
>
> ovl_lookup_or_create() does have a "const char *name".
> ovl_create_upper() has a stable dentry from which it can copy a QSTR
> ovl_create_temp() would need some sort of dance to keep hold of the
> temporary name that was allocated.
>
> If it weren't for ovl_create_temp() I would agree with you.
>
> Though we could have the three callers of ovl_start_creating_temp() pass a
> "char name[OVL_TEMPNAME_SIZE]" in, then ovl_create_temp() would have
> easy access.
> I could do that if you like.
Yes, considering that two of the callers are from the same function
(ovl_whiteout()) I think that would end up looking nicer.
Thanks,
Amir.
Powered by blists - more mailing lists