Message-ID: <feb98a0f-8d17-495c-b556-b4fe19446d5d@zytor.com>
Date: Thu, 15 May 2025 13:04:52 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Arnd Bergmann <arnd@...db.de>, LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
libc-alpha@...rceware.org, linux-arch@...r.kernel.org
Subject: Metalanguage for the Linux UAPI
OK, so this is something I have been thinking about for quite a while.
It would be a quite large project, so I would like to hear people's
opinions on it before even starting.
We have finally succeeded in divorcing the Linux UAPI from the general
kernel headers, but even so, there are a lot of things in the UAPI that
mean it is not possible for an arbitrary libc to use it directly; for
example "struct termios" is not the glibc "struct termios", but
redefining it breaks the ioctl numbering unless the ioctl headers are
changed as well, and so on. However, other libcs want to use the struct
termios as defined in the kernel, or, more likely, struct termios2.
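
To illustrate why redefining a structure breaks the ioctl numbering, here is a minimal sketch. It assumes the asm-generic encoding, where the argument size is packed into the ioctl number. The X_* macros and both structures are hypothetical stand-ins so that the example builds without kernel headers; they are not the real termios definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the asm-generic layout (some architectures differ):
 * nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in 30-31. */
#define X_IOC_READ 2u
#define X_IOC(dir, type, nr, size) \
    (((uint32_t)(dir) << 30) | ((uint32_t)(size) << 16) | \
     ((uint32_t)(type) << 8) | (uint32_t)(nr))
#define X_IOR(type, nr, argtype) \
    X_IOC(X_IOC_READ, (type), (nr), sizeof(argtype))

/* Two hypothetical layouts of "the same" structure. */
struct kernel_cfg { uint32_t flags; uint8_t cc[19]; };
struct libc_cfg   { uint32_t flags; uint8_t cc[32]; uint32_t ispeed, ospeed; };

/* Same type letter and request number, different argument type. */
#define XGETCFG_KERNEL X_IOR('T', 0x2A, struct kernel_cfg)
#define XGETCFG_LIBC   X_IOR('T', 0x2A, struct libc_cfg)

/* Extract the size field back out of an encoded number. */
static inline uint32_t x_ioc_size(uint32_t cmd)
{
    return (cmd >> 16) & 0x3fffu;
}

/* The two requests no longer share a number once the layouts diverge. */
static_assert(XGETCFG_KERNEL != XGETCFG_LIBC,
              "the argument size is part of the ioctl number");
```

A libc that substitutes its own structure therefore has to carry its own copy of every size-encoded ioctl constant that refers to it.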
I was also looking into how C++ templates could be used
to make user pointers inherently safe and probably more efficient, but
ran into the problem that you really want to be able to convert a
user-tagged structure to a structure with "safe-user-tagged" members
(after access_ok), which turned out not to be trivially supportable even
after the latest C++ modernizations (without which I don't consider C++
viable at all; I would not consider versions of C++ before C++17 worthy
of even looking at; C++20 preferred.)
And it is not just generation of in-kernel versus out-of-kernel headers
that is an issue (which we have managed to deal with pretty well.) There
generally isn't enough information in C headers alone to do well at
creating bindings for other languages, *especially* given how many
constants are defined in terms of macros.
The use of C also makes it hard to mangle the headers for user space.
For example, glibc has to add __extension__ before anonymous struct or
union members in order to be able to compile in strict C90 mode.
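
As a small sketch of that kind of mangling (assuming GCC or Clang; the structure and function names are hypothetical), the anonymous union below is marked with __extension__ so that it is not diagnosed when building with -std=c90 -pedantic:

```c
/* Hypothetical structure, for illustration only. */
struct sample_event {
    int kind;
    /* Anonymous unions are not part of C90; __extension__ tells the
     * compiler not to issue a pedantic diagnostic for this member. */
    __extension__ union {
        int  as_int;
        char as_bytes[4];
    };
};

/* Members of the anonymous union are accessed as if they belonged
 * directly to the enclosing structure. */
int sample_event_value(const struct sample_event *ev)
{
    return ev->as_int;
}
```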
I have been considering if it would make sense to create more of a
metalanguage for the Linux UAPI. This would be run through a preprocessor
more advanced than cpp, written in C and yacc/bison. (It could
also be done via a gcc plugin or a DWARF parser, but I do not like tying
this to compiler internals, and DWARF parsing is probably more complex
and less versatile.)
It could thus provide things like "true" constants (constexpr for C++11
or C23, or enums), bitfield macro explosions and so on, depending on
what the backend user would like: namespacing, distributed enumerations,
assembly offset constants, and possibly even syscall stubs.
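
As a rough sketch of what different backends might emit from one description, consider the following. Every name and value here is hypothetical, and reading "bitfield macro explosion" as "shift, mask, and accessor macros" is an assumption about what is meant.

```c
/* All names and values below are hypothetical, for illustration only. */

/* Backend A: plain macros, as most UAPI headers are written today.
 * They work in #if, but carry no type or grouping information. */
#define XF_RDONLY 0x1u
#define XF_APPEND 0x2u

/* Backend B: the same constants as an enumeration, which the compiler
 * sees as real named constants. */
enum xf_flag {
    XF_FLAG_RDONLY = 0x1,
    XF_FLAG_APPEND = 0x2
};

/* Backend C: one field description ("mode occupies bits 4..6") expanded
 * into shift, width, mask, and accessor macros. */
#define XF_MODE_SHIFT 4
#define XF_MODE_WIDTH 3
#define XF_MODE_MASK  (((1u << XF_MODE_WIDTH) - 1u) << XF_MODE_SHIFT)
#define XF_MODE_GET(v) (((v) & XF_MODE_MASK) >> XF_MODE_SHIFT)
#define XF_MODE_SET(v, m) \
    (((v) & ~XF_MODE_MASK) | (((m) << XF_MODE_SHIFT) & XF_MODE_MASK))
```

For instance, XF_MODE_GET(0x35u) extracts bits 4..6 of 0x35, and XF_MODE_SET(0x05u, 2u) writes 2 into that field while leaving the low bits alone.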
There is of course no reason such a generator couldn't be used for
kernel-only headers at some point, but I am concentrating on the UAPI.
Another major motivation is to be able to include one named struct
anonymously inside another, without having to repeat the definition.
(This is not supported in standard C or GNU C; MS C supports it as an
extension, and I have requested that it be added into GNU C which would
also allow it to be used with __extension__, and perhaps get folded into
a future C standard since it would now fit the criterion of more than
one implementation; however, the runway for being able to use that in
UAPI headers is quite long.)
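
For comparison, here is one workaround available in standard C11 today, sketched with hypothetical names: the field list is written once as a macro and expanded both into the named struct and into an anonymous struct member. It avoids repeating the definition by hand, but the two types remain unrelated as far as the compiler is concerned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field list, written once so both definitions stay in sync. */
#define XHDR_FIELDS \
    unsigned int type; \
    unsigned int len;

/* The named structure. */
struct xhdr { XHDR_FIELDS };

/* The same fields embedded anonymously: C11 allows an untagged struct
 * member, whose fields are accessed as msg.type and msg.len. */
struct xmsg {
    struct { XHDR_FIELDS };
    unsigned char payload[8];
};

/* The header fields land at the same offsets in both structures. */
static_assert(offsetof(struct xmsg, type) == offsetof(struct xhdr, type),
              "type offset matches");
static_assert(offsetof(struct xmsg, len) == offsetof(struct xhdr, len),
              "len offset matches");
```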
I obviously want to keep a C-like syntax for this, which is a major
reason for using a parser like yacc/bison.
I have done such a project in the past, with some good success. That
being said, the requirements for the Linux UAPI language are obviously
much more complex. A few things I have considered are wanting to be able
to namespace constants or, more or less equivalently, create
enumerations in bits and pieces (consider ioctl constants, for example)
and have them coalesce into a single definition if appropriate for the
target language.
Speaking of ioctl constants: one of the current problems is that a fair
number of ioctl constants do not have the size/type annotations, and
perhaps worse, it is impossible to tell from just the numeric value
(since _IOC_NONE expands to 0, an _IO() ioctl ends up having no type
information at all.) This is something that *definitely* ought to be
added, even if a certain backend cannot preserve that information.
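
To make the missing information concrete, here is a minimal decoding sketch. It assumes the asm-generic bit layout (some architectures use a different one); the X_* macros and the decode helper are hypothetical copies so that the example builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copies of the asm-generic encoding:
 * nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in 30-31. */
#define X_IOC_NONE  0u
#define X_IOC_WRITE 1u
#define X_IOC(dir, type, nr, size) \
    (((uint32_t)(dir) << 30) | ((uint32_t)(size) << 16) | \
     ((uint32_t)(type) << 8) | (uint32_t)(nr))
#define X_IO(type, nr)           X_IOC(X_IOC_NONE, (type), (nr), 0)
#define X_IOW(type, nr, argtype) \
    X_IOC(X_IOC_WRITE, (type), (nr), sizeof(argtype))

struct x_ioc_fields { uint32_t dir, size, type, nr; };

/* Split an encoded ioctl number back into its four fields. */
static inline struct x_ioc_fields x_ioc_decode(uint32_t cmd)
{
    struct x_ioc_fields f;
    f.nr   = cmd & 0xffu;
    f.type = (cmd >> 8) & 0xffu;
    f.size = (cmd >> 16) & 0x3fffu;
    f.dir  = (cmd >> 30) & 0x3u;
    return f;
}

/* With direction and size both zero, only the type letter and the
 * request number survive in the encoded value. */
static_assert(X_IO('T', 0x01) == 0x5401u, "no direction or size bits set");
```

Decoding X_IO('T', 0x01) yields dir == 0 and size == 0, so nothing in the number says whether the request takes an argument or what it points to, whereas X_IOW('T', 0x02, int) decodes to a write of sizeof(int) bytes.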
Thoughts?
-hpa