netdev - Re: RFC: Ioctl v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YogoI6Augr1V6AHn@mit.edu>
Date:   Fri, 20 May 2022 19:45:39 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Kent Overstreet <kent.overstreet@...il.com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org, netdev@...r.kernel.org,
        mcgrof@...nel.org
Subject: Re: RFC: Ioctl v2

On Fri, May 20, 2022 at 12:16:52PM -0400, Kent Overstreet wrote:
> 
> Where the lack of real namespacing bites us more is when ioctls get promoted
> from filesystem or driver on up. Ted had a good example of an ext2 ioctl getting
> promoted to the VFS when it really shouldn't have, because it was exposing ext2
> specific data structures.
> 
> But because this is as simple as changing a #define EXT2_IOC to #define FS_IOC,
> it's really easy to do without adequate review - you don't have to change the
> ioctl number and break userspace, so why would you?
> 
> Introducing real namespacing would mean that promoting an ioctl to the VFS level
> would really have to be a new ioctl, and it'll get people to think more about
> what the new ioctl would be.

It's not clear that making harder not to break userspace is a
*feature*.  If existing programs are using a particular ioctl
namespace, being able to have other file systems adopt it has
historically been considered a *feature* not a *bug*.

At the time, we had working utilities, chattr and lsattr, which were
deployed on all Linux distributions, and newer file systems, such as
xfs, reiserfs, btrfs, etc., decided they wanted to piggy-back on those
existing utilities.  Forcing folks to deploy new utilities just
because it's the best way to force "adequate review" might be swinging
the pendulum too far in the straight-jacket direction.

It's worked the other way, too.  For example, the "shutdown" ioctl was
originally comes from XFS, and ext4 adopted that feature because the
existing interface was perfectly good, and that allows us to to get
the testing from xfstests for free.  The same goes for the extended
attribute, "trim", "freeze" and "thaw" ioctls.  

We've also promoted ioctls from btrfs to the VFS layer, including
FICLONE, GETLABEL, SETLABEL.  Take a look at include/uapi/linux/fs.h.
Ioctl's in 'X' namespace come from XFS.  Ioctl's in the 0x94 interface
come from btrfs.  And of course ioctl's in the 'f' interface come from
ext2/ext3/ext4.  It's actually worked pretty well, and we should
acknowledge that.

In the case of extended attributes, we had perfectly working userspace
tools that would have ***broken*** if we adhered to a doctrinaire,
when you promote an interface, we break the userspace interface Just
Because it's the Good Computer Science Thing To Do.

> ioctls are just private syscalls. Syscalls look like normal function calls, why
> can't ioctls?  Some ioctls do complicated things that require defining structs
> with all the tricky layout rules that we kernel devs have all had beaten into
> our brains - but most probably would not, if we could do normal-looking function
> calls.
> 
> Well, syscalls do require arch specific code to handle calling conventions, and
> we don't want that.....
>
> Userspace won't call this directly. Userspace will call normal looking
> functions, like:
> 
> int bcachefs_ioctl_disk_add(int fd, unsigned flags, char __user *disk_path);
> 
> Which will be a wrapper that casts the function arguments to u64s (or s64s for
> signed integers, so that we don't have surprises when kernel space and user
> space disagree about sizeof(long)) and then does the actual syscall.

So this approach requires that someone has to actually implement the
wrapper library.  Who will that be?  The answer could be, "let libc do
it", but then we need to worry about all the C libraries out there
actually adopting the new ioctl, which takes a long time, and
historically, some C library maintainers have had.... opinionated
points of view about the sort of "value add that should be done at the
C Library level".

There are other examples, such as the AIO and extended attribute
libraries, but they require someone willing to maintain those
libraries for the long term.  In some cases, such as the extended
attribute, that makes total sense --- but that's because that's a
library which is using an ioctl which was promoted from a specific
file system to a generic VFS interface, and so it had a lot of users.

In other cases, the only user of a particular interface might be a
file system specific userspace utility --- the kind of thing which is
shipped in btrfs-progs, e2fsprogs, f2fs-tools, xfsprogs, etc.
Creating a wrapper library for those kinds of ioctls is going to be
overkill.

So I suspect there won't be a lot of standardization here.  It could
be that there will be wrapper function that lives in util-linux, or
e2fsprogs, or xfsprogs, or whatever.  But in that case, we could do
exactly the same thing vis-a-vis creating wrapper function using the
existing ioctl interface.

For example, I have an ext2fs library function
ext2fs_get_device_size2(), which wraps not only the BLKGETSIZE and
BLKGETSIZE64 ioctls, but also the equivalents for Apple Darwin
(DKIOCGETBLOCKCOUNT), FreeBSD/NetBSD (DIOCGDINFO and later
DIOCGMEDIASIZE), and the Window's DeviceIoControl()'s
IOCTL_DISK_GET_DRIVE_GEOMETRY call.  The point is that wrapper
functions are very much orthogonal to the ioctl interface; we're all
going to have wrapper functions, and we'll create them where they are
needed.

If we force developers to have to create and maintain wrapper
libraries for all new ioctl2 interfaces, Just Because It's Proper
Computer Science Design Best Practices, it's going to be painful
enough that I suspect people will just stick to adding new ioctl code
points to the existing ioctl() interface.

So I think we need to be a little careful here.  We made adding System
Calls difficult, because it was Better(tm).  And there would good
reasons for that.  But that has also incentivized people to use the
escape hatches.  Make the "perfect" too painful, and people will find
ways to avoid using it when at all possible.

						- Ted

P.S.  During the LSF/MM hallway track, one maintainer said that the
fsdevel bikeshedding had gotten *so* painful with the process not
having a guaranteed consensus within a reasonal period of time, that
he was planning on adding a file system specific ioctl, and then later
on, if other people were wanted to use it, they would be free to use
the same ioctl code point and interface that he had come up with.