linux-kernel - Re: [PATCH 24/24] Documentation: document ioctl interfaces better

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMuHMdW=eG9WWbeL0Od6dVzE7ydpBvVybvkn+cGFtyBR77sP6A@mail.gmail.com>
Date:   Thu, 12 Dec 2019 09:16:41 +0100
From:   Geert Uytterhoeven <geert@...ux-m68k.org>
To:     Arnd Bergmann <arnd@...db.de>
Cc:     Jens Axboe <axboe@...nel.dk>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Jonathan Corbet <corbet@....net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        y2038 Mailman List <y2038@...ts.linaro.org>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        Jonathan Neuschäfer <j.neuschaefer@....net>,
        Masahiro Yamada <yamada.masahiro@...ionext.com>,
        Vladimir Oltean <olteanv@...il.com>,
        Kent Overstreet <kent.overstreet@...il.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH 24/24] Documentation: document ioctl interfaces better

Hi Arnd,

On Wed, Dec 11, 2019 at 9:53 PM Arnd Bergmann <arnd@...db.de> wrote:
> Documentation/process/botching-up-ioctls.rst was orignally
> written as a blog post for DRM driver writers, so it it misses
> some points while going into a lot of detail on others.
>
> Try to provide a replacement that addresses typical issues
> across a wider range of subsystems, and follows the style of
> the core-api documentation better.
>
> Signed-off-by: Arnd Bergmann <arnd@...db.de>

Thanks for your patch!

> --- /dev/null
> +++ b/Documentation/core-api/ioctl.rst
> @@ -0,0 +1,250 @@
> +======================
> +ioctl based interfaces
> +======================
> +
> +:c:func:`ioctl` is the most common way for applications to interface
> +with device drivers. It is flexible and easily extended by adding new
> +commands and can be passed through character devices, block devices as
> +well as sockets and other special file descriptors.
> +
> +However, it is also very easy to get ioctl command definitions wrong,
> +and hard to fix them later without breaking existing applications,
> +so this documentation tries to help developers get it right.
> +
> +Command number definitions
> +==========================
> +
> +The command number, or request number, is the second argument passed to
> +the ioctl system call. While this can be any 32-bit number that uniquely
> +identifies an action for a particular driver, there are a number of
> +conventions around defining them.

Interesting. I never realized the action is 32-bit in the kernel, but
unsigned long in userspace...

> +
> +``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
> +ioctl commands that follow modern conventions: ``_IOC``, ``_IOR``,
> +``_IOW``, and ``_IORW``. These should be used for all new commands,
> +with the correct parameters:
> +
> +_IO/_IOR/_IOW/_IOWR

This says _IO....

> +   The macro name determines whether the argument is used for passing
> +   data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is

into the kernel
, or is

> +   not a pointer (_IOC). It is possible but not recommended to pass an
> +   integer value instead of a pointer with _IOC.

...which is not explained here, but _IOC is?

> +
> +type
> +   An 8-bit number, often a character literal, specific to a subsystem
> +   or driver, and listed in :doc:`../ioctl/ioctl-number`
> +
> +nr
> +  An 8-bit number identifying the specific command, unique for a give
> +  value of 'type'
> +
> +size
> +  The name of the data type pointed to by the argument, the command
> +  number encodes the ``sizeof(size)`` value in a 13-bit or 14-bit integer,
> +  leading to a limit of 8191 bytes for the maximum size of the argument.
> +  Note: do not pass sizeof(type) type into _IOR/IOW, as that will lead
> +  to encoding sizeof(sizeof(type)), i.e. sizeof(size_t).

Looks like "size" could be renamed, to make this more obvious?

> +Timestamps
> +==========
> +
> +Traditionally, timestamps and timeout values are passed as ``struct
> +timespec`` or ``struct timeval``, but these are problematic because of
> +incompatible definitions of these structures in user space after the
> +move to 64-bit time_t.
> +
> +The __kernel_timespec type can be used instead to be embedded in other
> +data structures when separate second/nanosecond values are desired,
> +or passed to user space directly. This is still not ideal though,
> +as the structure matches neither the kernel's timespec64 nor the user
> +space timespec exactly. The get_timespec64() and put_timespec64() helper
> +functions canbe used to ensure that the layout remains compatible with

can be

> +user space and the padding is treated correctly.
> +
> +As it is cheap to convert seconds to nanoseconds, but the opposite
> +requires an expensive 64-bit division, a simple __u64 nanosecond value
> +can be simpler and more efficient.
> +
> +Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
> +as returned by ``ktime_get_ns()`` or ``ktime_get_ts64()``.  Unlike
> +CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
> +or forwards due to leap second adjustments and clock_settime() calls.
> +
> +``ktime_get_real_ns()`` can be used for CLOCK_REALTIME timestamps that
> +may be required for timestamps that need to be persistent across a reboot

Drop "may be required for timestamps that"?

> +or between multiple machines.

> +Structure layout
> +----------------
> +
> +Compatible data structures have the same layout on all architectures,
> +avoiding all problematic members:
> +
> +* ``long`` and ``unsigned long`` are the size of a register, so
> +  they can be either 32 bit or 64 bit wide and cannot be used in portable

32-bit or 64-bit (for consistency with the rest of the document)

> +  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
> +  ``__s64`` and ``__u64``.
> +
> +* Pointers have the same problem, in addition to requiring the
> +  use of ``compat_ptr()``. The best workaround is to use ``__u64``
> +  in place of pointers, which requires a cast to ``uintptr_t`` in user
> +  space, and the use of ``u64_to_user_ptr()`` in the kernel to convert
> +  it back into a user pointer.
> +
> +* On the x86-32 (i386) architecture, the alignment of 64-bit variables
> +  is only 32 bit, but they are naturally aligned on most other

32-bit

> +  architectures including x86-64. This means a structure like
> +
> +  ::
> +
> +    struct foo {
> +        __u32 a;
> +        __u64 b;
> +        __u32 c;
> +    };
> +
> +  has four bytes of padding between a and b on x86-64, plus another four
> +  bytes of padding at the end, but no padding on i386, and it needs a
> +  compat_ioctl conversion handler to translate between the two formats.
> +
> +  To avoid this problem, all structures should have their members
> +  naturally aligned, or explicit reserved fields added in place of the
> +  implicit padding.
> +
> +* On ARM OABI user space, 16-bit member variables have 32-bit
> +  alignment, making them incompatible with modern EABI kernels.
> +  Conversely, on the m68k architecture, all struct members have at most
> +  16-bit alignment. These rarely cause problems as neither ARM-OABI nor

"have at most 16-bit alignment" sounds a bit weird to me, as a member
may have a greater alignment.
"struct members are not guaranteed to have an alignment greater than 16-bit"?

> +  m68k are supported by any compat mode, but for consistency, it is best
> +  to completely avoid 16-bit member variables.
> +
> +
> +* Bitfields and enums generally work as one would expect them to,
> +  but some properties of them are implementation-defined, so it is better
> +  to avoid them completely in ioctl interfaces.
> +
> +* ``char`` members can be either signed or unsigned, depending on
> +  the architecture, so the __u8 and __s8 types should be used for 8-bit
> +  integer values, though char arrays are clearer for fixed-length strings.
> +
> +Information leaks
> +=================
> +
> +Uninitialized data must not be copied back to user space, as this can
> +cause an information leak, which can be used to defeat kernel address
> +space layout randomization (KASLR), helping in an attack.
> +
> +As explained for the compat mode, it is best to not avoid any padding in

best to avoid any implicit padding?

> +data structures, but if there is already padding in existing structures,
> +the kernel driver must be careful to zero out the padding using
> +``memset()`` or similar before copying it to user space.
> +
> +Subsystem abstractions
> +======================
> +
> +While some device drivers implement their own ioctl function, most
> +subsystems implement the same command for multiple drivers.  Ideally the
> +subsystem has an .ioctl() handler that copies the arguments from and
> +to user space, passing them into subsystem specific callback functions
> +through normal kernel pointers.
> +
> +This helps in various ways:
> +
> +* Applications written for one driver are more likely to work for
> +  another one in the same subsystem if there are no subtle differences
> +  in the user space ABI.
> +
> +* The complexity of user space access and data structure layout at done

is done

> +  in one place, reducing the potential for implementation bugs.
> +
> +* It is more likely to be reviewed by experienced developers
> +  that can spot problems in the interface when the ioctl is shared
> +  between multiple drivers than when it is only used in a single driver.
> +
> +Alternatives to ioctl
> +=====================
> +
> +There are many cases in which ioctl is not the best solution for a
> +problem. Alternatives include

:

> +
> +* System calls are a better choice for a system-wide feature that
> +  is not tied to a physical device or constrained by the file system
> +  permissions of a character device node
> +
> +* netlink is the preferred way of configuring any network related
> +  objects through sockets.
> +
> +* debugfs is used for ad-hoc interfaces for debugging functionality
> +  that does not need to be exposed as a stable interface to applications.
> +
> +* sysfs is a good way to expose the state of an in-kernel object
> +  that is not tied to a file descriptor.
> +
> +* configfs can be used for more complex configuration than sysfs
> +
> +* A custom file system can provide extra flexibility with a simple
> +  user interface but add a lot of complexity in the implementation.

adds ... to

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds