[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240902124642.rnd763njngu6qsg2@quack3>
Date: Mon, 2 Sep 2024 14:46:42 +0200
From: Jan Kara <jack@...e.cz>
To: Aleksa Sarai <cyphar@...har.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Chuck Lever <chuck.lever@...cle.com>,
Jeff Layton <jlayton@...nel.org>,
Amir Goldstein <amir73il@...il.com>,
Alexander Aring <alex.aring@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Christoph Hellwig <hch@...radead.org>,
Josef Bacik <josef@...icpanda.com>, linux-fsdevel@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-api@...r.kernel.org, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH RESEND v3 1/2] uapi: explain how per-syscall AT_* flags
should be allocated
On Wed 28-08-24 20:19:42, Aleksa Sarai wrote:
> Unfortunately, the way we have gone about adding new AT_* flags has
> been a little messy. In the beginning, all of the AT_* flags had generic
> meanings and so it made sense to share the flag bits indiscriminately.
> However, we inevitably ran into syscalls that needed their own
> syscall-specific flags. Due to the lack of a planned out policy, we
> ended up with the following situations:
>
> * Existing syscalls adding new features tended to use new AT_* bits,
> with some effort taken to try to re-use bits for flags that were so
> obviously syscall specific that they only make sense for a single
> syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet).
>
> Given the constraints of bitflags, this works well in practice, but
> ideally (to avoid future confusion) we would plan ahead and define a
> set of "per-syscall bits" ahead of time so that when allocating new
> bits we don't end up with a complete mish-mash of which bits are
> supposed to be per-syscall and which aren't.
>
> * New syscalls dealt with this in several ways:
>
> - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and
> fspick(2)) created their separate own flag spaces that have no
> overlap with the AT_* flags. Most of these ended up allocating
> their bits sequentually.
>
> In the case of move_mount(2) and fspick(2), several flags have
> identical meanings to AT_* flags but were allocated in their own
> flag space.
>
> This makes sense for syscalls that will never share AT_* flags, but
> for some syscalls this leads to duplication with AT_* flags in a
> way that could cause confusion (if renameat2(2) grew a
> RENAME_EMPTY_PATH it seems likely that users could mistake it for
> AT_EMPTY_PATH since it is an *at(2) syscall).
>
> - Some syscalls unfortunately ended up both creating their own flag
> space while also using bits from other flag spaces. The most
> obvious example is open_tree(2), where the standard usage ends up
> using flags from *THREE* separate flag spaces:
>
> open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE);
>
> (Note that O_CLOEXEC is also platform-specific, so several future
> OPEN_TREE_* bits are also made unusable in one fell swoop.)
>
> It's not entirely clear to me what the "right" choice is for new
> syscalls. Just saying that all future VFS syscalls should use AT_* flags
> doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which
> don't make much sense to burn generic AT_* flags for) and move_mount(2)
> has separate AT_*-like flags for both the source and target so separate
> flags are needed anyway (though it seems possible that renameat2(2)
> could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame
> they can't be reused).
>
> But at least for syscalls that _do_ choose to use AT_* flags, we should
> explicitly state the policy that 0x2ff is currently intended for
> per-syscall flags and that new flags should err on the side of
> overlapping with existing flag bits (so we can extend the scope of
> generic flags in the future if necessary).
>
> And add AT_* aliases for the RENAME_* flags to further cement that
> renameat2(2) is an *at(2) flag, just with its own per-syscall flags.
>
> Suggested-by: Amir Goldstein <amir73il@...il.com>
> Reviewed-by: Jeff Layton <jlayton@...nel.org>
> Reviewed-by: Josef Bacik <josef@...icpanda.com>
> Signed-off-by: Aleksa Sarai <cyphar@...har.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
> ---
> include/uapi/linux/fcntl.h | 80 ++++++++++++++-------
> tools/perf/trace/beauty/include/uapi/linux/fcntl.h | 83 +++++++++++++++-------
> 2 files changed, 115 insertions(+), 48 deletions(-)
>
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index e55a3314bcb0..38a6d66d9e88 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -90,37 +90,69 @@
> #define DN_ATTRIB 0x00000020 /* File changed attibutes */
> #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
>
> +#define AT_FDCWD -100 /* Special value for dirfd used to
> + indicate openat should use the
> + current working directory. */
> +
> +
> +/* Generic flags for the *at(2) family of syscalls. */
> +
> +/* Reserved for per-syscall flags 0xff. */
> +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic
> + links. */
> +/* Reserved for per-syscall flags 0x200 */
> +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount
> + traversal. */
> +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative
> + pathname to operate on dirfd
> + directly. */
> /*
> - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is
> - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
> - * unlinkat. The two functions do completely different things and therefore,
> - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to
> - * faccessat would be undefined behavior and thus treating it equivalent to
> - * AT_EACCESS is valid undefined behavior.
> + * These flags are currently statx(2)-specific, but they could be made generic
> + * in the future and so they should not be used for other per-syscall flags.
> */
> -#define AT_FDCWD -100 /* Special value used to indicate
> - openat should use the current
> - working directory. */
> -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
> +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> +
> +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +
> +/*
> + * Per-syscall flags for the *at(2) family of syscalls.
> + *
> + * These are flags that are so syscall-specific that a user passing these flags
> + * to the wrong syscall is so "clearly wrong" that we can safely call such
> + * usage "undefined behaviour".
> + *
> + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value.
> + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful
> + * only to unlinkat. The two functions do completely different things and
> + * therefore, the flags can be allowed to overlap. For example, passing
> + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it
> + * equivalent to AT_EACCESS is valid undefined behavior.
> + *
> + * Note for implementers: When picking a new per-syscall AT_* flag, try to
> + * reuse already existing flags first. This leaves us with as many unused bits
> + * as possible, so we can use them for generic bits in the future if necessary.
> + */
> +
> +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> +#define AT_RENAME_NOREPLACE 0x0001
> +#define AT_RENAME_EXCHANGE 0x0002
> +#define AT_RENAME_WHITEOUT 0x0004
> +
> +/* Flag for faccessat(2). */
> #define AT_EACCESS 0x200 /* Test access permitted for
> effective IDs, not real IDs. */
> +/* Flag for unlinkat(2). */
> #define AT_REMOVEDIR 0x200 /* Remove directory instead of
> unlinking file. */
> -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
> -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
> -
> -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> -
> -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +/* Flags for name_to_handle_at(2). */
> +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare
> + object identity and may not be
> + usable with open_by_handle_at(2). */
>
> -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
> -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
> - compare object identity and may not
> - be usable to open_by_handle_at(2) */
> #if defined(__KERNEL__)
> #define AT_GETATTR_NOSEC 0x80000000
> #endif
> diff --git a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h
> index c0bcc185fa48..38a6d66d9e88 100644
> --- a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h
> +++ b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h
> @@ -16,6 +16,9 @@
>
> #define F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3)
>
> +/* Was the file just created? */
> +#define F_CREATED_QUERY (F_LINUX_SPECIFIC_BASE + 4)
> +
> /*
> * Cancel a blocking posix lock; internal use only until we expose an
> * asynchronous lock api to userspace:
> @@ -87,37 +90,69 @@
> #define DN_ATTRIB 0x00000020 /* File changed attibutes */
> #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
>
> +#define AT_FDCWD -100 /* Special value for dirfd used to
> + indicate openat should use the
> + current working directory. */
> +
> +
> +/* Generic flags for the *at(2) family of syscalls. */
> +
> +/* Reserved for per-syscall flags 0xff. */
> +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic
> + links. */
> +/* Reserved for per-syscall flags 0x200 */
> +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount
> + traversal. */
> +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative
> + pathname to operate on dirfd
> + directly. */
> +/*
> + * These flags are currently statx(2)-specific, but they could be made generic
> + * in the future and so they should not be used for other per-syscall flags.
> + */
> +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> +
> +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +
> /*
> - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is
> - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
> - * unlinkat. The two functions do completely different things and therefore,
> - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to
> - * faccessat would be undefined behavior and thus treating it equivalent to
> - * AT_EACCESS is valid undefined behavior.
> + * Per-syscall flags for the *at(2) family of syscalls.
> + *
> + * These are flags that are so syscall-specific that a user passing these flags
> + * to the wrong syscall is so "clearly wrong" that we can safely call such
> + * usage "undefined behaviour".
> + *
> + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value.
> + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful
> + * only to unlinkat. The two functions do completely different things and
> + * therefore, the flags can be allowed to overlap. For example, passing
> + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it
> + * equivalent to AT_EACCESS is valid undefined behavior.
> + *
> + * Note for implementers: When picking a new per-syscall AT_* flag, try to
> + * reuse already existing flags first. This leaves us with as many unused bits
> + * as possible, so we can use them for generic bits in the future if necessary.
> */
> -#define AT_FDCWD -100 /* Special value used to indicate
> - openat should use the current
> - working directory. */
> -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
> +
> +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> +#define AT_RENAME_NOREPLACE 0x0001
> +#define AT_RENAME_EXCHANGE 0x0002
> +#define AT_RENAME_WHITEOUT 0x0004
> +
> +/* Flag for faccessat(2). */
> #define AT_EACCESS 0x200 /* Test access permitted for
> effective IDs, not real IDs. */
> +/* Flag for unlinkat(2). */
> #define AT_REMOVEDIR 0x200 /* Remove directory instead of
> unlinking file. */
> -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
> -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
> -
> -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> -
> -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +/* Flags for name_to_handle_at(2). */
> +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare
> + object identity and may not be
> + usable with open_by_handle_at(2). */
>
> -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
> -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
> - compare object identity and may not
> - be usable to open_by_handle_at(2) */
> #if defined(__KERNEL__)
> #define AT_GETATTR_NOSEC 0x80000000
> #endif
>
> --
> 2.46.0
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists