[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJfpegs-sDk0++FjSZ_RuW5m-z3BTBQdu4T9QPtWwmSZ1_4Yvw@mail.gmail.com>
Date: Thu, 14 Sep 2023 12:13:54 +0200
From: Miklos Szeredi <miklos@...redi.hu>
To: Christian Brauner <brauner@...nel.org>
Cc: Miklos Szeredi <mszeredi@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-api@...r.kernel.org, linux-man@...r.kernel.org,
linux-security-module@...r.kernel.org, Karel Zak <kzak@...hat.com>,
Ian Kent <raven@...maw.net>,
David Howells <dhowells@...hat.com>,
Al Viro <viro@...iv.linux.org.uk>,
Christian Brauner <christian@...uner.io>,
Amir Goldstein <amir73il@...il.com>
Subject: Re: [RFC PATCH 2/3] add statmnt(2) syscall
On Thu, 14 Sept 2023 at 11:28, Christian Brauner <brauner@...nel.org> wrote:
>
> On Wed, Sep 13, 2023 at 05:22:35PM +0200, Miklos Szeredi wrote:
> > Add a way to query attributes of a single mount instead of having to parse
> > the complete /proc/$PID/mountinfo, which might be huge.
> >
> > Lookup the mount by the old (32bit) or new (64bit) mount ID. If a mount
> > needs to be queried based on path, then statx(2) can be used to first query
> > the mount ID belonging to the path.
> >
> > Design is based on a suggestion by Linus:
> >
> > "So I'd suggest something that is very much like "statfsat()", which gets
> > a buffer and a length, and returns an extended "struct statfs" *AND*
> > just a string description at the end."
>
> So what we agreed to at LSFMM was that we split filesystem option
> retrieval into a separate system call and just have a very focused
> statx() for mounts with just binary and non-variable sized information.
> We even gave David a hard time about this. :) I would really love if we
> could stick to that.
>
> Linus, I realize this was your suggestion a long time ago but I would
> really like us to avoid structs with variable sized fields at the end of
> a struct. That's just so painful for userspace and universally disliked.
> If you care I can even find the LSFMM video where we have users of that
> api requesting that we please don't do this. So it'd be great if you
> wouldn't insist on it.
I completely missed that.
What I'm thinking is making it even simpler for userspace:
struct statmnt {
...
char *mnt_root;
char *mountpoint;
char *fs_type;
u32 num_opts;
char *opts;
};
I'd still just keep options nul delimited.
Is there a good reason not to return pointers (pointing to within the
supplied buffer obviously) to userspace?
>
> This will also allow us to turn statmnt() into an extensible argument
> system call versioned by size just like we do any new system calls with
> struct arguments (e.g., mount_setattr(), clone3(), openat2() and so on).
> Which is how we should do things like that.
The mask mechanism also allow versioning of the struct.
>
> Other than that I really think this is on track for what we ultimately
> want.
>
> > +struct stmt_str {
> > + __u32 off;
> > + __u32 len;
> > +};
> > +
> > +struct statmnt {
> > + __u64 mask; /* What results were written [uncond] */
> > + __u32 sb_dev_major; /* Device ID */
> > + __u32 sb_dev_minor;
> > + __u64 sb_magic; /* ..._SUPER_MAGIC */
> > + __u32 sb_flags; /* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
> > + __u32 __spare1;
> > + __u64 mnt_id; /* Unique ID of mount */
> > + __u64 mnt_parent_id; /* Unique ID of parent (for root == mnt_id) */
> > + __u32 mnt_id_old; /* Reused IDs used in proc/.../mountinfo */
> > + __u32 mnt_parent_id_old;
> > + __u64 mnt_attr; /* MOUNT_ATTR_... */
> > + __u64 mnt_propagation; /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
> > + __u64 mnt_peer_group; /* ID of shared peer group */
> > + __u64 mnt_master; /* Mount receives propagation from this ID */
> > + __u64 propagate_from; /* Propagation from in current namespace */
> > + __u64 __spare[20];
> > + struct stmt_str mnt_root; /* Root of mount relative to root of fs */
> > + struct stmt_str mountpoint; /* Mountpoint relative to root of process */
> > + struct stmt_str fs_type; /* Filesystem type[.subtype] */
>
> I think if we want to do this here we should add:
>
> __u64 fs_type
> __u64 fs_subtype
>
> fs_type can just be our filesystem magic number and we introduce magic
It's already there: sb_magic.
However it's not a 1:1 mapping (ext* only has one magic).
> numbers for sub types as well. So we don't need to use strings here.
Ugh.
Thanks,
Miklos
Powered by blists - more mailing lists