[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <twielfhtttexj7gnhzkfm4eygx4avot4xaw63nv4winm6rjfqz@674juh7cylku>
Date: Tue, 18 Feb 2025 13:20:57 +0100
From: Jan Kara <jack@...e.cz>
To: NeilBrown <neilb@...e.de>
Cc: Christian Brauner <brauner@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] Change inode_operations.mkdir to return struct
dentry *
On Mon 17-02-25 16:30:03, NeilBrown wrote:
> Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
> complete control of sequencing on the actual filesystem (e.g. on a
> different server) and may find that the inode created for a mkdir
> request already exists in the icache and dcache by the time the mkdir
> request returns. For example, if the filesystem is mounted twice the
> file could be visible on the other mount before it is on the original
> mount, and a pair of name_to_handle_at(), open_by_handle_at() could
> instantiate the directory inode with an IS_ROOT() dentry before the
> first mkdir returns.
>
> This means that the dentry passed to ->mkdir() may not be the one that
> is associated with the inode after the ->mkdir() completes. Some
> callers need to interact with the inode after the ->mkdir completes and
> they currently need to perform a lookup in the (rare) case that the
> dentry is no longer hashed.
>
> This lookup-after-mkdir requires that the directory remains locked to
> avoid races. Planned future patches to lock the dentry rather than the
> directory will mean that this lookup cannot be performed atomically with
> the mkdir.
>
> To remove this barrier, this patch changes ->mkdir to return the
> resulting dentry if it is different from the one passed in.
> Possible returns are:
> NULL - the directory was created and no other dentry was used
> ERR_PTR() - an error occurred
> non-NULL - this other dentry was spliced in
>
> Not all filesystems reliable result in a positive hashed dentry:
>
> - NFS does produce the proper dentry, but does not yet return it. The
> code change is larger than I wanted to include in this patch
> - cifs will, when posix extensions are enabled, unhash the
> dentry on success so a subsequent lookup will create it if needed.
> - cifs without posix extensions will unhash the dentry if an
> internal lookup finds a non-directory where it expected the dir.
> - kernfs and tracefs leave the dentry negative and the ->revalidate
> operation ensures that lookup will be called to correctly populate
> the dentry
> - hostfs leaves the dentry negative and uses
> .d_delete = always_delete_dentry
> so the negative dentry is quickly discarded and a lookup will add a
> new entry.
>
> Signed-off-by: NeilBrown <neilb@...e.de>
Looks good to me. I've spotted just a few style nits:
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 31eea688609a..bd3751dfddcf 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -562,7 +562,10 @@ otherwise noted.
> ``mkdir``
> called by the mkdir(2) system call. Only required if you want
> to support creating subdirectories. You will probably need to
> - call d_instantiate() just as you would in the create() method
> + call d_instantiate() just as you would in the create() method.
> + If some dentry other than the one given is spliced in, then it
> + much be returned, otherwise NULL is returned is the original dentry
^^^^ must
> + is successfully used, else an ERR_PTR() is returned.
>
> ``rmdir``
> called by the rmdir(2) system call. Only required if you want
...
> @@ -918,6 +922,7 @@ static int fuse_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> args.in_args[1].size = entry->d_name.len + 1;
> args.in_args[1].value = entry->d_name.name;
> return create_new_entry(idmap, fm, &args, dir, entry, S_IFDIR);
> +
Bogus empty line here.
> }
>
> static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
...
> @@ -4313,10 +4314,17 @@ int vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> if (max_links && dir->i_nlink >= max_links)
> return -EMLINK;
>
> - error = dir->i_op->mkdir(idmap, dir, dentry, mode);
> - if (!error)
> + de = dir->i_op->mkdir(idmap, dir, dentry, mode);
> + if (IS_ERR(de))
> + return PTR_ERR(de);
> + if (de) {
> + fsnotify_mkdir(dir, de);
> + /* Cannot return de yet */
> + dput(de);
> + } else
> fsnotify_mkdir(dir, dentry);
The prefered style is:
} else {
fsnotify_mkdir(dir, dentry);
}
Otherwise the patch looks good to me at least for the VFS bits and the
filesystems I understand like ext2, ext4, ocfs2, udf. So feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists