linux-kernel - Re: Explicitly defining the userspace API

Message-ID: <YmEqfFdYN0Rml6V2@yuki>
Date:   Thu, 21 Apr 2022 11:57:16 +0200
From:   Cyril Hrubis <chrubis@...e.cz>
To:     Spencer Baugh <sbaugh@...ern.com>
Cc:     linux-api@...r.kernel.org, linux-kernel@...r.kernel.org,
        marcin@...zkiewicz.com.pl, torvalds@...ux-foundation.org,
        arnd@...db.de
Subject: Re: Explicitly defining the userspace API

Hi!
> Linux guarantees the stability of its userspace API, but the API
> itself is only informally described, primarily with English prose.  I
> want to add an explicit, authoritative machine-readable definition of
> the Linux userspace API.

My background is in kernel testing; I have maintained the Linux Test
Project for more than a decade now. Over the years we have created many
"unit tests" for kernel syscalls that watch over the syscall API and
make sure that we get the right results for both valid and invalid
inputs. These tests can also be considered a form of documentation. The
same goes for some of the selftests that have been added to the kernel
repo in recent years. In a sense these are the most detailed
descriptions of the interfaces we have.
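
To give a rough idea of what such a test checks, here is a minimal
sketch in Python. It is not LTP code and does not use the LTP library;
the function names are made up for illustration. It exercises read(2)
once with valid input and once with a closed file descriptor, where the
documented error is EBADF.

```python
import errno
import os


def check_read_valid():
    """Valid input: read() returns the bytes that were written to a pipe."""
    rfd, wfd = os.pipe()
    try:
        os.write(wfd, b"ping")
        return os.read(rfd, 16)
    finally:
        os.close(rfd)
        os.close(wfd)


def check_read_invalid_fd():
    """Invalid input: read() on a closed descriptor fails with EBADF."""
    rfd, wfd = os.pipe()
    os.close(wfd)
    os.close(rfd)
    try:
        os.read(rfd, 16)  # rfd is no longer a valid descriptor
    except OSError as err:
        return err.errno
    return None  # the call unexpectedly succeeded


if __name__ == "__main__":
    assert check_read_valid() == b"ping"
    assert check_read_invalid_fd() == errno.EBADF
    print("read: valid and invalid input behave as expected")
```

A test like this also documents the interface: it states which error
the caller gets for a bad descriptor.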

The main problem is that the kernel-userspace boundary is large: we have
thousands of tests and I'm pretty sure that we don't cover even half of
it.

Also, some of the interfaces are too complex to be described in any
formal system, mostly the modern stuff such as io_uring or BPF. I have
had a hard time even understanding how to use these, and I doubt I would
be able to build a formal system to describe them, especially since
io_uring is mostly syscall-less and we talk to the kernel through shared
buffers and atomic data updates.

> As background, in a conventional libc like glibc, read(2) calls the
> Linux system call read, passing arguments in an architecture-specific
> way according to the specific details of read.
> 
> The details of these syscalls are at best documented in manpages, and
> often defined only by the implementation.  Anyone else who wants to
> work with a syscall, in any way, needs to duplicate all those details.
> 
> So the most basic definition of the API would just represent the
> information already present in SYSCALL_DEFINE macros: the C types of
> arguments and return values.  More usefully, it would describe the
> formats of those arguments and return values: that the first argument
> to read is a file descriptor rather than an arbitrary integer, and
> what flags are valid in the flags argument of openat, and that open
> returns a file descriptor.  A step beyond that would be describing, in
> some limited way, the effects of syscalls; for example, that read
> writes into the passed buffer the number of bytes that it returned.

Having this would be awesome; it is just one step away from actually
generating automated tests for the syscalls. However, my estimate is
that even if you started to work on this now it would take a decade to
get somewhere, but maybe I'm too pessimistic.
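
As a sketch of that "one step", assuming a purely hypothetical
description format (the table layout, the "kind" names and all helper
names below are invented for illustration and are not part of any
existing proposal), knowing that an argument is a file descriptor is
already enough to generate a bad-descriptor test for it:

```python
import errno
import os

# Hypothetical machine-readable description: each argument has a name
# and a "kind". Only syscalls with a single fd argument are handled.
SYSCALLS = {
    "read":  {"call": os.read,  "args": [("fd", "fd"), ("count", "size")]},
    "write": {"call": os.write, "args": [("fd", "fd"), ("buf", "bytes")]},
}

# Harmless default value for each non-fd argument kind.
DEFAULTS = {"size": 1, "bytes": b"x"}


def closed_fd():
    """Return a descriptor number that was just closed, so it is invalid."""
    rfd, wfd = os.pipe()
    os.close(rfd)
    os.close(wfd)
    return rfd


def generate_bad_fd_cases(table):
    """Yield (name, call, args) with the fd argument replaced by an invalid one."""
    for name, desc in table.items():
        for i, (_, kind) in enumerate(desc["args"]):
            if kind != "fd":
                continue
            args = [DEFAULTS.get(k) for _, k in desc["args"]]
            args[i] = closed_fd()
            yield name, desc["call"], args


def run_cases(table):
    """Run every generated case and record the errno each call failed with."""
    results = {}
    for name, call, args in generate_bad_fd_cases(table):
        try:
            call(*args)
            results[name] = None  # call unexpectedly succeeded
        except OSError as err:
            results[name] = err.errno
    return results


if __name__ == "__main__":
    for name, err in run_cases(SYSCALLS).items():
        status = "ok" if err == errno.EBADF else "unexpected"
        print(f"{name}: bad fd -> errno {err} ({status})")
```

The real work would be in describing the thousands of interfaces and
their less obvious failure modes, which this toy table does not attempt.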

Still, fingers crossed.

-- 
Cyril Hrubis
chrubis@...e.cz