[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxhp0_HSre2LLStPVVsEJ3MqYDs1Ak9UAvB=o8Z7sVB=Mg@mail.gmail.com>
Date: Fri, 24 May 2024 07:58:18 +0300
From: Amir Goldstein <amir73il@...il.com>
To: Aleksa Sarai <cyphar@...har.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Chuck Lever <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>,
Alexander Aring <alex.aring@...il.com>, linux-fsdevel@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-api@...r.kernel.org
Subject: Re: [PATCH RFC v2] fhandle: expose u64 mount id to name_to_handle_at(2)
On Thu, May 23, 2024 at 11:57 PM Aleksa Sarai <cyphar@...har.com> wrote:
>
> Now that we provide a unique 64-bit mount ID interface in statx, we can
> now provide a race-free way for name_to_handle_at(2) to provide a file
> handle and corresponding mount without needing to worry about racing
> with /proc/mountinfo parsing.
>
> While this is not necessary if you are using AT_EMPTY_PATH and don't
> care about an extra statx(2) call, users that pass full paths into
> name_to_handle_at(2) need to know which mount the file handle comes from
> (to make sure they don't try to open_by_handle_at a file handle from a
> different filesystem) and switching to AT_EMPTY_PATH would require
> allocating a file for every name_to_handle_at(2) call, turning
>
> err = name_to_handle_at(-EBADF, "/foo/bar/baz", &handle, &mntid,
> AT_HANDLE_MNT_ID_UNIQUE);
>
> into
>
> int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC);
> err1 = name_to_handle_at(fd, "", &handle, &unused_mntid, AT_EMPTY_PATH);
> err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &statxbuf);
> mntid = statxbuf.stx_mnt_id;
> close(fd);
>
> Unlike AT_HANDLE_FID, as per Amir's suggestion, AT_HANDLE_MNT_ID_UNIQUE
> uses a new AT_* bit from the historically-unused 0xFF range (which we
> now define as being the "per-syscall" range for AT_* bits).
>
Sorry for nit picking, but I think that "Unlike AT_HANDLE_FID,..." is confusing
in this statement.
AT_HANDLE_FID is using a bit that was already effectively allocated for a
"per-syscall" range.
I don't think that mentioning AT_HANDLE_FID adds any clarity to the statement
so better drop it?
> Signed-off-by: Aleksa Sarai <cyphar@...har.com>
> ---
> Changes in v2:
> - Fixed a few minor compiler warnings and a buggy copy_to_user() check.
> - Rename AT_HANDLE_UNIQUE_MOUNT_ID -> AT_HANDLE_MNT_ID_UNIQUE to match statx.
> - Switched to using an AT_* bit from 0xFF and defining that range as
> being "per-syscall" for future usage.
> - Sync tools/ copy of <linux/fcntl.h> to include changes.
> - v1: <https://lore.kernel.org/r/20240520-exportfs-u64-mount-id-v1-1-f55fd9215b8e@cyphar.com>
> ---
> fs/fhandle.c | 29 ++++++++++++++++++++++-------
> include/linux/syscalls.h | 2 +-
> include/uapi/linux/fcntl.h | 28 +++++++++++++++++++++-------
> tools/include/uapi/linux/fcntl.h | 28 +++++++++++++++++++++-------
> 4 files changed, 65 insertions(+), 22 deletions(-)
>
[...]
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index c0bcc185fa48..9ed9d65842c1 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -87,22 +87,24 @@
> #define DN_ATTRIB 0x00000020 /* File changed attibutes */
> #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
>
> +#define AT_FDCWD -100 /* Special value used to indicate
> + openat should use the current
> + working directory. */
(more nit picking)
If you are changing this line, please at least add a new line,
this is a different namespace :-/
and perhaps change it to "Special value of dirfd argument..."
Also, better leave a comment here to discourage allocation from this range:
+ /* Reserved for per-syscall flags 0xff */
> +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
> +
> /*
> - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is
> - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
> + * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS
> + * is meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
> * unlinkat. The two functions do completely different things and therefore,
> * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to
> * faccessat would be undefined behavior and thus treating it equivalent to
> * AT_EACCESS is valid undefined behavior.
> */
If you are going to add this churn in this patch, please do it otherwise.
It does not make sense to have this long explanation about pre-syscall
AT_* flags in a different location from the comment you added about
"All new purely-syscall-specific AT_* flags.."
if this explanation is needed at all, it should be after the new comment
as an example.
> -#define AT_FDCWD -100 /* Special value used to indicate
> - openat should use the current
> - working directory. */
> -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
> #define AT_EACCESS 0x200 /* Test access permitted for
> effective IDs, not real IDs. */
> #define AT_REMOVEDIR 0x200 /* Remove directory instead of
> unlinking file. */
I really prefer to move those to the per-syscall section
right next to AT_HANDLE_FID and leave a comment here:
/* Reserved for per-syscall flags 0x200 */
> +
> #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
> #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
> @@ -114,10 +116,22 @@
>
> #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
>
> -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits.. */
> +/*
> + * All new purely-syscall-specific AT_* flags should consider using bits from
> + * 0xFF, but the bits used by RENAME_* (0x7) should be avoided in case users
> + * decide to pass AT_* flags to renameat2() by accident.
Sorry, but I find the use of my renameat2() example a bit confusing
in this sentence.
If you mention it at all, please use "For example, the bits used by..."
but I think it is more important to say "...should consider re-using bits
already used by other per-syscalls flags".
> These flag bits are
> + * free for re-use by other syscall's syscall-specific flags without worry.
> + */
> +
> +/*
> + * Flags for name_to_handle_at(2). To save AT_ flag space we re-use the
> + * AT_EACCESS/AT_REMOVEDIR bit for AT_HANDLE_FID.
> + */
AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID have exact same status,
so instead of this asymmetric comment:
+/* Flags for faccessat(2) */
+#define AT_EACCESS 0x200 /* Test access permitted for
+ effective IDs, not real IDs. */
+/* Flags for unlinkat(2) */
+#define AT_REMOVEDIR 0x200 /* Remove directory instead of
+ unlinking file. */
+/* Flags for name_to_handle_at(2) */
+#define AT_HANDLE_FID 0x200 /* file handle is needed to
compare object identity and may not
be usable to open_by_handle_at(2) */
> +#define AT_HANDLE_MNT_ID_UNIQUE 0x80 /* return the u64 unique mount id */
IDGI, I think we may have been miscommunicating :-/
If 0x7 range is to be avoided for generic AT_ flags, then it *should* be used
for new per-syscall flags such as this one.
The reservation of 0xff is not a strong guarantee.
As long as people re-use new per-syscalls flags efficiently, we could
decide to reclaim some of this space for generic AT_ flags in the future
if it is needed.
I know most of the mess was here before your patch, but I think
it got to a point where we must put a little order before introducing
the new per-syscall flag.
Thanks,
Amir.
Powered by blists - more mailing lists