[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <175314170043.2234665.2076268504915475261@noble.neil.brown.name>
Date: Tue, 22 Jul 2025 09:48:20 +1000
From: "NeilBrown" <neil@...wn.name>
To: "Amir Goldstein" <amir73il@...il.com>
Cc: "Linus Torvalds" <torvalds@...ux-foundation.org>,
"Alexander Viro" <viro@...iv.linux.org.uk>,
"Christian Brauner" <brauner@...nel.org>, "Jan Kara" <jack@...e.cz>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/7 RFC] New APIs for name lookup and lock for directory
operations
On Mon, 21 Jul 2025, Amir Goldstein wrote:
> On Mon, Jul 21, 2025 at 10:46 AM NeilBrown <neil@...wn.name> wrote:
> >
> > Hi,
> >
> > these patches (against vfs.all) primarily introduce new APIs for
> > preparing dentries for create, remove, rename. The goal is to
> > centralise knowledge of how we do locking (currently by locking the
> > directory) so that we can eventually change the mechanism (e.g. to
> > locking just the dentry).
> >
> > Naming is difficult and I've changed my mind several times. :-)
>
> Indeed it is.
> I generally like the done_ approach that you took.
> Few minor naming comments follow.
>
> >
> > The basic approach is to return a dentry which can be passed to
> > vfs_create(), vfs_unlink() etc, and subsequently to release that
> > dentry. The closest analogue to this in the VFS is kern_path_create()
> > which is paired with done_path_create(), though there is also
> > kern_path_locked() which is paired with explicit inode_unlock() and
> > dput(). So my current approach uses "done_" for finishing up.
> >
> > I have:
> > dentry_lookup() dentry_lookup_noperm() dentry_lookup_hashed()
>
> As I wrote on the patch that introduces them I find dentry_lookup_hashed()
> confusing because the dentry is not hashed (only the hash is calculated).
>
> Looking at another precedent of _noperm() vfs API we have:
>
> vfs_setxattr()
> __vfs_setxattr_locked()
> __vfs_setxattr_noperm()
> __vfs_setxattr()
>
> Do I'd say for lack of better naming __dentry_lookup() could makes sense
> for the bare lock&dget and it could also be introduced earlier along with
> introducing done_dentry_lookup()
>
> > dentry_lookup_killable()
> > paired with
> > done_dentry_lookup()
> >
> > and also
> > rename_lookup() rename_lookup_noperm() rename_lookup_hashed()
> > paired with
> > done_rename_lookup()
> > (these take a "struct renamedata *" to which some qstrs are added.
> >
> > There is also "dentry_lock_in()" which is used instead of
> > dentry_lookup() when you already have the dentry and want to lock it.
> > So you "lock" it "in" a given parent. I'm not very proud of this name,
> > but I don't want to use "dentry_lock" as I want to save that for
> > low-level locking primitives.
>
> Very strange name :)
I wanted to encourage people to comment !!
>
> What's wrong with dentry_lock_parent()?
That sounds like we are locking the parent, but we are conceptually locking
the dentry (Though at first we do that by locking the whole directory).
>
> Although I think that using the verb _lock_ for locking and dget is
> actively confusing, so something along the lines of
> resume_dentry_lookup()/dentry_lookup_reacquire() might serve the
> readers of the code better.
Hmmm.... there is certainly potential there. dentry_lookup_continue()
???
>
> >
> > There is also done_dentry_lookup_return() which doesn't dput() the
> > dentry but returns it instread. In about 1/6 of places where I need
> > done_dentry_lookup() the code makes use of the dentry afterwards. Only
> > in half the places where done_dentry_lookup_return() is used is the
> > returned value immediately returned by the calling function. I could
> > do a dget() before done_dentry_lookup(), but that looks awkward and I
> > think having the _return version is justified. I'm happy to hear other
> > opinions.
>
> The name is not very descriptive IMO, but I do not have a better suggestion.
> Unless you can describe it for the purpose that it is used for, e.g.
> yeild_dentry_lookup() that can be followed with resume_dentry_lookup(),
> but I do not know if those are your intentions for the return API.
The intention is that a few places (maybe a dozen) need to keep a
reference to the dentry after the operation completes. Typically this
is a mkdir or a create. The caller might then want to create another
file in the directory so it holds on to the dentry.
A fairly common case is that virtual filesystems hold an extra reference
to everything they create so the dcache becomes the primary store.
I suspect that in the latter case it is better to have an explicit
dget() with a comment saying "preserve in cache until explicitly
removed" or similar.
For the former case it might be better to introduce a new dentry
variable.
dentry = dentry_lookup(....);
dentry = vfs_mkdir(dir, dentry);
if (!IS_ERR(dentry))
returnval = dget(dentry);
done_dentry_lookup(dentry)
return returnval;
Maybe I should enhance dget() to pass through errors as well as NULL.
Maybe the above would look even better as
struct dentry *dentry __free(lookup) = dentry_lookup(.....);
/* handle error */
dentry = vfs_mkdir(dir, dentry);
return dget(dentry);
So I'll try dropping the _return version, enhancing dget, and adding a
DEFINE_FREE.
Thanks,
NeilBrown
>
> Thanks,
> Amir.
>
> >
> > In order for this dentry-focussed API to work we need to have the
> > dentry to unlock. vfs_rmdir() currently consumes the dentry on
> > failure, so we don't have it unless we clumsily keep a copy. So an
> > early patch changes vfs_rmdir() to both consume the dentry and drop the
> > lock on failure.
> >
> > After these new APIs are refined, agreed, and applied I will have a
> > collection of patches to roll them out throughout the kernel. Then we
> > can start/continue discussing a new approach to locking which allows
> > directory operations to proceed in parallel.
> >
> > If you want a sneak peek at some of this future work - for context
> > mostly - my current devel code is at https://github.com/neilbrown/linux.git
> > in a branch "pdirops". Be warned that a lot of the later code is under
> > development, is known to be wrong, and doesn't even compile. Not today
> > anyway. The rolling out of the new APIs is fairly mature though.
> >
> > Please review and suggest better names, or tell me that my choices are adequate.
> > And find the bugs in the code too :-)
> >
> > I haven't cc:ed the maintains of the non-VFS code that the patches
> > touch. I can do that once the approach and names have been approved.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
> > [PATCH 2/7] VFS: introduce done_dentry_lookup()
> > [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure.
> > [PATCH 4/7] VFS: introduce dentry_lookup() and friends
> > [PATCH 5/7] VFS: add dentry_lookup_killable()
> > [PATCH 6/7] VFS: add rename_lookup()
> > [PATCH 7/7] VFS: introduce dentry_lock_in()
> >
>
Powered by blists - more mailing lists