Message-ID: <CAJfpegs-sDk0++FjSZ_RuW5m-z3BTBQdu4T9QPtWwmSZ1_4Yvw@mail.gmail.com>
Date:   Thu, 14 Sep 2023 12:13:54 +0200
From:   Miklos Szeredi <miklos@...redi.hu>
To:     Christian Brauner <brauner@...nel.org>
Cc:     Miklos Szeredi <mszeredi@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org, linux-man@...r.kernel.org,
        linux-security-module@...r.kernel.org, Karel Zak <kzak@...hat.com>,
        Ian Kent <raven@...maw.net>,
        David Howells <dhowells@...hat.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <christian@...uner.io>,
        Amir Goldstein <amir73il@...il.com>
Subject: Re: [RFC PATCH 2/3] add statmnt(2) syscall

On Thu, 14 Sept 2023 at 11:28, Christian Brauner <brauner@...nel.org> wrote:
>
> On Wed, Sep 13, 2023 at 05:22:35PM +0200, Miklos Szeredi wrote:
> > Add a way to query attributes of a single mount instead of having to parse
> > the complete /proc/$PID/mountinfo, which might be huge.
> >
> > Look up the mount by the old (32bit) or new (64bit) mount ID.  If a mount
> > needs to be queried based on path, then statx(2) can be used to first query
> > the mount ID belonging to the path.
> >
> > Design is based on a suggestion by Linus:
> >
> >   "So I'd suggest something that is very much like "statfsat()", which gets
> >    a buffer and a length, and returns an extended "struct statfs" *AND*
> >    just a string description at the end."
>
> So what we agreed to at LSFMM was that we split filesystem option
> retrieval into a separate system call and just have a very focused
> statx() for mounts with just binary and non-variable sized information.
> We even gave David a hard time about this. :) I would really love if we
> could stick to that.
>
> Linus, I realize this was your suggestion a long time ago but I would
> really like us to avoid structs with variable sized fields at the end of
> a struct. That's just so painful for userspace and universally disliked.
> If you care I can even find the LSFMM video where we have users of that
> api requesting that we please don't do this. So it'd be great if you
> wouldn't insist on it.

I completely missed that.

What I'm thinking is making it even simpler for userspace:

struct statmnt {
  ...
  char *mnt_root;
  char *mountpoint;
  char *fs_type;
  u32 num_opts;
  char *opts;
};

I'd still just keep options nul delimited.

Is there a good reason not to return pointers to userspace (pointing
into the supplied buffer, obviously)?
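
To illustrate the nul-delimited layout, here is a minimal sketch of how
userspace might pick one option out of such a list, given a pointer into
the supplied buffer and the option count.  The helper name ex_opt_at is
hypothetical and not part of the patch; it only assumes that "opts" holds
"num_opts" strings back to back, each terminated by a nul byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return the idx-th string of a
 * nul-delimited option list, or NULL if idx is out of range.
 * Assumes opts points at num_opts consecutive nul-terminated strings.
 */
static const char *ex_opt_at(const char *opts, unsigned int num_opts,
                             unsigned int idx)
{
    assert(opts != NULL);

    if (idx >= num_opts)
        return NULL;

    /* Skip idx strings, stepping over each terminating nul byte. */
    while (idx--)
        opts += strlen(opts) + 1;

    return opts;
}
```

With a buffer holding "rw\0noatime\0errors=remount-ro", index 1 would
yield "noatime" and index 3 would yield NULL.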

>
> This will also allow us to turn statmnt() into an extensible argument
> system call versioned by size just like we do any new system calls with
> struct arguments (e.g., mount_setattr(), clone3(), openat2() and so on).
> Which is how we should do things like that.

The mask mechanism also allows versioning of the struct.
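
As a rough sketch of what mask-based versioning looks like from the
userspace side: a field is only read if the corresponding bit is set in
the returned mask, so an older kernel that does not know a field simply
leaves its bit clear.  The struct, the flag names and the helper below
are hypothetical stand-ins for illustration, not the definitions from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask bits, for illustration only. */
#define EX_STMT_SB_MAGIC (1ULL << 0)
#define EX_STMT_MNT_ID   (1ULL << 1)

/* Cut-down, hypothetical stand-in for struct statmnt. */
struct ex_statmnt {
    uint64_t mask;      /* which fields below were written */
    uint64_t sb_magic;
    uint64_t mnt_id;
};

/*
 * Copy mnt_id to *out only if the mask says the field was filled in.
 * Returns false when the field is not available.
 */
static bool ex_get_mnt_id(const struct ex_statmnt *st, uint64_t *out)
{
    assert(st != NULL && out != NULL);

    if (!(st->mask & EX_STMT_MNT_ID))
        return false;

    *out = st->mnt_id;
    return true;
}
```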

>
> Other than that I really think this is on track for what we ultimately
> want.
>
> > +struct stmt_str {
> > +     __u32 off;
> > +     __u32 len;
> > +};
> > +
> > +struct statmnt {
> > +     __u64 mask;             /* What results were written [uncond] */
> > +     __u32 sb_dev_major;     /* Device ID */
> > +     __u32 sb_dev_minor;
> > +     __u64 sb_magic;         /* ..._SUPER_MAGIC */
> > +     __u32 sb_flags;         /* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
> > +     __u32 __spare1;
> > +     __u64 mnt_id;           /* Unique ID of mount */
> > +     __u64 mnt_parent_id;    /* Unique ID of parent (for root == mnt_id) */
> > +     __u32 mnt_id_old;       /* Reused IDs used in proc/.../mountinfo */
> > +     __u32 mnt_parent_id_old;
> > +     __u64 mnt_attr;         /* MOUNT_ATTR_... */
> > +     __u64 mnt_propagation;  /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
> > +     __u64 mnt_peer_group;   /* ID of shared peer group */
> > +     __u64 mnt_master;       /* Mount receives propagation from this ID */
> > +     __u64 propagate_from;   /* Propagation from in current namespace */
> > +     __u64 __spare[20];
> > +     struct stmt_str mnt_root;       /* Root of mount relative to root of fs */
> > +     struct stmt_str mountpoint;     /* Mountpoint relative to root of process */
> > +     struct stmt_str fs_type;        /* Filesystem type[.subtype] */
>
> I think if we want to do this here we should add:
>
> __u64 fs_type
> __u64 fs_subtype
>
> fs_type can just be our filesystem magic number and we introduce magic

It's already there: sb_magic.

However it's not a 1:1 mapping (ext* only has one magic).
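
A small sketch of the ambiguity: ext2, ext3 and ext4 all share the magic
value 0xEF53, so going from magic number back to a filesystem type name
can give several candidates.  The table and the function
ex_magic_candidates are hypothetical; the two magic values are the ones
found in linux/magic.h, redefined locally here to keep the example
self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in linux/magic.h; EXT2/EXT3/EXT4_SUPER_MAGIC are all 0xEF53. */
#define EX_EXT_SUPER_MAGIC 0xEF53ULL
#define EX_TMPFS_MAGIC     0x01021994ULL

static const char *const ex_ext_names[]   = { "ext2", "ext3", "ext4" };
static const char *const ex_tmpfs_names[] = { "tmpfs" };

/*
 * Hypothetical lookup: store the list of type names that share the given
 * magic in *names and return how many there are (0 if unknown).
 */
static size_t ex_magic_candidates(uint64_t magic, const char *const **names)
{
    assert(names != NULL);

    switch (magic) {
    case EX_EXT_SUPER_MAGIC:
        *names = ex_ext_names;
        return 3;
    case EX_TMPFS_MAGIC:
        *names = ex_tmpfs_names;
        return 1;
    default:
        *names = NULL;
        return 0;
    }
}
```

A result greater than one means the magic alone cannot tell the types
apart, which is the case for ext*.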

> numbers for sub types as well. So we don't need to use strings here.

Ugh.

Thanks,
Miklos
