linux-kernel - Re: [PATCH v4 02/10] man/man2/fsopen.2: document "new" mount API

Message-ID: <zrifsd6vqj6ve25uipyeteuztncgwtzfmfnfsxhcjwcnxf2wen@xjx3y2g77uin>
Date: Fri, 19 Sep 2025 18:24:09 +0200
From: Alejandro Colomar <alx@...nel.org>
To: Aleksa Sarai <cyphar@...har.com>
Cc: "Michael T. Kerrisk" <mtk.manpages@...il.com>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>, Askar Safin <safinaskar@...omail.com>, 
	"G. Branden Robinson" <g.branden.robinson@...il.com>, linux-man@...r.kernel.org, linux-api@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	David Howells <dhowells@...hat.com>, Christian Brauner <brauner@...nel.org>
Subject: Re: [PATCH v4 02/10] man/man2/fsopen.2: document "new" mount API

Hi Aleksa,

On Fri, Sep 19, 2025 at 11:59:43AM +1000, Aleksa Sarai wrote:
> This is loosely based on the original documentation written by David
> Howells and later maintained by Christian Brauner, but has been
> rewritten to be more from a user perspective (as well as fixing a few
> critical mistakes).
> 
> Co-authored-by: David Howells <dhowells@...hat.com>
> Signed-off-by: David Howells <dhowells@...hat.com>
> Co-authored-by: Christian Brauner <brauner@...nel.org>
> Signed-off-by: Christian Brauner <brauner@...nel.org>
> Signed-off-by: Aleksa Sarai <cyphar@...har.com>
> ---
>  man/man2/fsopen.2 | 384 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 384 insertions(+)
> 
> diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> new file mode 100644
> index 0000000000000000000000000000000000000000..7cdbeac7d64b7e5c969dee619a039ec947d1e981
> --- /dev/null
> +++ b/man/man2/fsopen.2
> @@ -0,0 +1,384 @@
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +fsopen \- create a new filesystem context
> +.SH LIBRARY
> +Standard C library
> +.RI ( libc ,\~ \-lc )
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/mount.h>
> +.P
> +.BI "int fsopen(const char *" fsname ", unsigned int " flags );
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR fsopen ()
> +system call is part of
> +the suite of file descriptor based mount facilities in Linux.

Minor nitpick (I can amend that; no worries):

Because 'file-descriptor-based' acts as a single compound modifier of
'facilities', it should be hyphenated.

> +.P
> +.BR fsopen ()
> +creates a blank filesystem configuration context within the kernel
> +for the filesystem named by
> +.I fsname
> +and places it into creation mode.
> +A new file descriptor
> +associated with the filesystem configuration context
> +is then returned.
> +The calling process must have the
> +.B \%CAP_SYS_ADMIN
> +capability in order to create a new filesystem configuration context.
> +.P
> +A filesystem configuration context is
> +an in-kernel representation of a pending transaction,
> +containing a set of configuration parameters that are to be applied
> +when creating a new instance of a filesystem
> +(or modifying the configuration of an existing filesystem instance,
> +such as when using
> +.BR fspick (2)).
> +.P
> +After obtaining a filesystem configuration context with
> +.BR fsopen (),
> +the general workflow for operating on the context looks like the following:
> +.IP (1) 5
> +Pass the filesystem context file descriptor to
> +.BR fsconfig (2)
> +to specify any desired filesystem parameters.
> +This may be done as many times as necessary.
> +.IP (2)
> +Pass the same filesystem context file descriptor to
> +.BR fsconfig (2)
> +with
> +.B \%FSCONFIG_CMD_CREATE
> +to create an instance of the configured filesystem.
> +.IP (3)
> +Pass the same filesystem context file descriptor to
> +.BR fsmount (2)
> +to create a new detached mount object for
> +the root of the filesystem instance,
> +which is then attached to a new file descriptor.
> +(This also places the filesystem context file descriptor into
> +reconfiguration mode,
> +similar to the mode produced by
> +.BR fspick (2).)
> +Once a mount object has been created with
> +.BR fsmount (2),
> +the filesystem context file descriptor can be safely closed.
> +.IP (4)
> +Now that a mount object has been created,
> +you may
> +.RS
> +.IP (4.1) 7
> +use the detached mount object file descriptor as a
> +.I dirfd
> +argument to "*at()" system calls; and/or
> +.IP (4.2) 7

I'll paste here the formatted part of this page:

        (4)  Now that a mount object has been created, you may
 
             (4.1)  use the detached mount object file descrip‐
                    tor as a dirfd argument to "*at()" system
                    calls; and/or
 
             (4.2)  attach the mount object to a mount point by
                    passing the mount object file descriptor to
                    move_mount(2).  This will also prevent the
                    mount object from being unmounted and de‐
                    stroyed when the mount object file descrip‐
                    tor is closed.

             The mount object file descriptor will remain asso‐
             ciated with the mount object even after doing the
             above operations, so you may repeatedly use the
             mount object file descriptor with move_mount(2)
             and/or "*at()" system calls as many times as neces‐
             sary.

That sublist seems to be an unordered one.  I think we should use
a bullet list for those items (the outer list 1,2,3,4 is okay as is).

       Bullet lists
              Elements are preceded by bullet symbols (\[bu]).
              Anything that doesn't fit elsewhere is usually
              covered by this type of list.

> +attach the mount object to a mount point
> +by passing the mount object file descriptor to
> +.BR move_mount (2).
> +This will also prevent the mount object from
> +being unmounted and destroyed when
> +the mount object file descriptor is closed.
> +.RE
> +.IP
> +The mount object file descriptor will
> +remain associated with the mount object
> +even after doing the above operations,
> +so you may repeatedly use the mount object file descriptor with
> +.BR move_mount (2)
> +and/or "*at()" system calls
> +as many times as necessary.
> +.P
> +A filesystem context will move between different modes
> +throughout its lifecycle
> +(such as the creation phase
> +when created with
> +.BR fsopen (),
> +the reconfiguration phase
> +when an existing filesystem instance is selected with
> +.BR fspick (2),
> +and the intermediate "awaiting-mount" phase
> +.\" FS_CONTEXT_AWAITING_MOUNT is the term the kernel uses for this.
> +between
> +.BR \%FSCONFIG_CMD_CREATE
> +and
> +.BR fsmount (2)),
> +which has an impact on
> +what operations are permitted on the filesystem context.
> +.P
> +The file descriptor returned by
> +.BR fsopen ()
> +also acts as a channel for filesystem drivers to
> +provide more comprehensive diagnostic information
> +than is normally provided through the standard
> +.BR errno (3)
> +interface for system calls.
> +If an error occurs at any time during the workflow mentioned above,
> +calling
> +.BR read (2)
> +on the filesystem context file descriptor
> +will retrieve any ancillary information about the encountered errors.
> +(See the "Message retrieval interface" section
> +for more details on the message format.)
> +.P
> +.I flags
> +can be used to control aspects of
> +the creation of the filesystem configuration context file descriptor.
> +A value for
> +.I flags
> +is constructed by bitwise ORing
> +zero or more of the following constants:
> +.RS
> +.TP
> +.B FSOPEN_CLOEXEC
> +Set the close-on-exec
> +.RB ( FD_CLOEXEC )
> +flag on the new file descriptor.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2)
> +for reasons why this may be useful.
> +.RE
> +.P
> +A list of filesystems supported by the running kernel
> +(and thus a list of valid values for
> +.IR fsname )
> +can be obtained from
> +.IR /proc/filesystems .
> +(See also
> +.BR proc_filesystems (5).)
> +.SS Message retrieval interface
> +When doing operations on a filesystem configuration context,
> +the filesystem driver may choose to provide
> +ancillary information to userspace
> +in the form of message strings.
> +.P
> +The filesystem context file descriptors returned by
> +.BR fsopen ()
> +and
> +.BR fspick (2)
> +may be queried for message strings at any time by calling
> +.BR read (2)
> +on the file descriptor.
> +Each call to
> +.BR read (2)
> +will return a single message,
> +prefixed to indicate its class:
> +.RS
> +.TP
> +\fBe\fP <\fImessage\fP>

We don't use '<' and '>' to mark variable parts; italics already do
that.  Angle brackets are confusing because it is often unclear whether
they are literal or placeholders, so we only use them when they are
literal.

I suspect you want

	.BI e\~ message

BTW, I'm assuming there's one space between the letter and the message,
and there are no literal '<'/'>', right?
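
If that assumption holds, a program reading these messages could split
each one as in the sketch below.  split_fs_message() is a hypothetical
helper, not an existing API, and the single-space separator is exactly
the assumption I'm asking about above.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split a message of the assumed form
 * "<class-letter><space><text>", where the class letter is one of
 * 'e' (error), 'w' (warning), or 'i' (informational).
 * On success, stores the class letter in *cls and returns a pointer to
 * the text.  Returns NULL if the string does not match that form.
 */
const char *split_fs_message(const char *msg, char *cls)
{
    if (msg == NULL || msg[0] == '\0' || msg[1] != ' ')
        return NULL;

    switch (msg[0]) {
    case 'e':
    case 'w':
    case 'i':
        *cls = msg[0];
        return msg + 2;     /* skip the class letter and the space */
    default:
        return NULL;
    }
}
```

With this, "e ext4: bad option" would yield class 'e' and the text
"ext4: bad option", while a string with literal angle brackets would be
rejected.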


Have a lovely day!
Alex

> +An error message was logged.
> +This is usually associated with an error being returned
> +from the corresponding system call which triggered this message.
> +.TP
> +\fBw\fP <\fImessage\fP>
> +A warning message was logged.
> +.TP
> +\fBi\fP <\fImessage\fP>
> +An informational message was logged.
> +.RE
> +.P
> +Messages are removed from the queue as they are read.
> +Note that the message queue has limited depth,
> +so it is possible for messages to get lost.
> +If there are no messages in the message queue,
> +.BR read (2)
> +will return \-1 and
> +.I errno
> +will be set to
> +.BR \%ENODATA .
> +If the
> +.I buf
> +argument to
> +.BR read (2)
> +is not large enough to contain the entire message,
> +.BR read (2)
> +will return \-1 and
> +.I errno
> +will be set to
> +.BR \%EMSGSIZE .
> +(See BUGS.)
> +.P
> +If there are multiple filesystem contexts
> +referencing the same filesystem instance
> +(such as if you call
> +.BR fspick (2)
> +multiple times for the same mount),
> +each one gets its own independent message queue.
> +This does not apply to multiple file descriptors that are
> +tied to the same underlying open file description
> +(such as those created with
> +.BR dup (2)).
> +.P
> +Message strings will usually be prefixed by
> +the name of the filesystem or kernel subsystem
> +that logged the message,
> +though this may not always be the case.
> +See the Linux kernel source code for details.
> +.SH RETURN VALUE
> +On success, a new file descriptor is returned.
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EFAULT
> +.I fsname
> +is NULL
> +or a pointer to a location
> +outside the calling process's accessible address space.
> +.TP
> +.B EINVAL
> +.I flags
> +had an invalid flag set.
> +.TP
> +.B EMFILE
> +The calling process has too many open files to create more.
> +.TP
> +.B ENFILE
> +The system has too many open files to create more.
> +.TP
> +.B ENODEV
> +The filesystem named by
> +.I fsname
> +is not supported by the kernel.
> +.TP
> +.B ENOMEM
> +The kernel could not allocate sufficient memory to complete the operation.
> +.TP
> +.B EPERM
> +The calling process does not have the required
> +.B \%CAP_SYS_ADMIN
> +capability.
> +.SH STANDARDS
> +Linux.
> +.SH HISTORY
> +Linux 5.2.
> +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> +.\" commit 400913252d09f9cfb8cce33daee43167921fc343
> +glibc 2.36.
> +.SH BUGS
> +.SS Message retrieval interface and \fB\%EMSGSIZE\fP
> +As described in the "Message retrieval interface" subsection above,
> +calling
> +.BR read (2)
> +with too small a buffer to contain
> +the next pending message in the message queue
> +for the filesystem configuration context
> +will cause
> +.BR read (2)
> +to return \-1 and set
> +.BR errno (3)
> +to
> +.BR \%EMSGSIZE .
> +.P
> +However,
> +this failed operation still
> +consumes the message from the message queue.
> +This effectively discards the message silently,
> +as no data is copied into the
> +.BR read (2)
> +buffer.
> +.P
> +Programs should take care to ensure that
> +their buffers are sufficiently large
> +to contain any reasonable message string,
> +in order to avoid silently losing valuable diagnostic information.
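
To make the advice above concrete, here is a minimal sketch of draining
the message queue with a generously sized buffer.  drain_fs_messages()
and the 4096-byte buffer size are my own assumptions; the page does not
state a maximum message length.  The loop stops on any error other than
ENODATA, so the caller can inspect errno (EMSGSIZE would mean a message
was too long, and on kernels with the behaviour described here, that
message is already gone).

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed upper bound for one message; not taken from the kernel. */
#define FS_MSG_BUF_SIZE 4096

/*
 * Hypothetical helper: read messages from a filesystem context file
 * descriptor and print each one on its own line.
 * Returns the number of messages printed once the queue is empty,
 * or -1 on any other error, with errno left as set by read(2).
 */
int drain_fs_messages(int fd, FILE *out)
{
    char buf[FS_MSG_BUF_SIZE];
    int printed = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            /* One read(2) is expected to return one whole message. */
            fprintf(out, "%.*s\n", (int)n, buf);
            printed++;
        } else if (n == 0) {
            return printed;     /* end of file; not expected here */
        } else if (errno == ENODATA) {
            return printed;     /* the queue is empty */
        } else {
            return -1;          /* includes EMSGSIZE; see errno */
        }
    }
}
```

A program would call this after a failed fsconfig(2) or fsmount(2),
passing the context file descriptor and, for example, stderr.
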
> +.\" Aleksa Sarai
> +.\"   This unfortunate behaviour has existed since this feature was merged, but
> +.\"   I have sent a patchset which will finally fix it.
> +.\"   <https://lore.kernel.org/r/20250807-fscontext-log-cleanups-v3-1-8d91d6242dc3@cyphar.com/>
> +.SH EXAMPLES
> +To illustrate the workflow for creating a new mount,
> +the following is an example of how to mount an
> +.BR ext4 (5)
> +filesystem stored on
> +.I /dev/sdb1
> +onto
> +.IR /mnt .
> +.P
> +.in +4n
> +.EX
> +int fsfd, mntfd;
> +\&
> +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +First,
> +an ext4 configuration context is created and attached to the file descriptor
> +.IR fsfd .
> +Then, a series of parameters
> +(such as the source of the filesystem)
> +are provided using
> +.BR fsconfig (2),
> +followed by the filesystem instance being created with
> +.BR \%FSCONFIG_CMD_CREATE .
> +.BR fsmount (2)
> +is then used to create a new mount object attached to the file descriptor
> +.IR mntfd ,
> +which is then attached to the intended mount point using
> +.BR move_mount (2).
> +.P
> +The above procedure is functionally equivalent to
> +the following mount operation using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> +      "ro,noatime,acl,user_xattr,iversion");
> +.EE
> +.in
> +.P
> +And here's an example of creating a mount object
> +of an NFS server share
> +and setting a Smack security module label.
> +However, instead of attaching it to a mount point,
> +the program uses the mount object directly
> +to open a file from the NFS share.
> +.P
> +.in +4n
> +.EX
> +int fsfd, mntfd, fd;
> +\&
> +fsfd = fsopen("nfs", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> +.EE
> +.in
> +.P
> +Unlike the previous example,
> +this operation has no trivial equivalent with
> +.BR mount (2),
> +as it was not previously possible to create a mount object
> +that is not attached to any mount point.
> +.SH SEE ALSO
> +.BR fsconfig (2),
> +.BR fsmount (2),
> +.BR fspick (2),
> +.BR mount (2),
> +.BR mount_setattr (2),
> +.BR move_mount (2),
> +.BR open_tree (2),
> +.BR mount_namespaces (7)
> 
> -- 
> 2.51.0
> 

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
