Message-Id: <09acd087-ba88-4b8e-950b-dfede2f8bec3@app.fastmail.com>
Date: Fri, 28 Feb 2025 17:13:57 -0500
From: "Geoffrey Thomas" <geofft@...reload.com>
To: "Ventura Jack" <venturajack85@...il.com>
Cc: "Ralf Jung" <post@...fj.de>,
 "Kent Overstreet" <kent.overstreet@...ux.dev>,
 "Miguel Ojeda" <miguel.ojeda.sandonis@...il.com>,
 "Gary Guo" <gary@...yguo.net>, torvalds@...ux-foundation.org,
 airlied@...il.com, boqun.feng@...il.com, david.laight.linux@...il.com,
 ej@...i.de, gregkh@...uxfoundation.org, hch@...radead.org, hpa@...or.com,
 ksummit@...ts.linux.dev, linux-kernel@...r.kernel.org,
 rust-for-linux@...r.kernel.org
Subject: Re: C aggregate passing (Rust kernel policy)

On Fri, Feb 28, 2025, at 3:41 PM, Ventura Jack wrote:
>
> I did give the example of the time crate. Do you not consider
> that a very significant example of breakage? Surely, with
> as public and large an example of breakage as the time crate,
> there clearly is something.
>
> I will acknowledge that Rust editions specifically do not
> count as breaking code, though the editions feature,
> while interesting, does have some drawbacks.
>
> The time crate breakage is large from what I can tell. When I
> skim through GitHub issues in different projects,
> it apparently cost some people significant time and pain.
>
>     https://github.com/NixOS/nixpkgs/issues/332957#issue-2453023525
>         "Sorry for the inconvenience. I've lost a lot of the last
>         week to coordinating the update, collecting broken
>         packages, etc., but hopefully by spreading out the
>         work from here it won't take too much of anybody
>         else's time."
>
>     https://github.com/NixOS/nixpkgs/issues/332957#issuecomment-2274824965
>         "On principle, rust 1.80 is a new language due
>         to the incompatible change (however inadvertent),
>         and should be treated as such. So I think we need
>         to leave 1.79 in nixpkgs, a little while longer. We can,
>         however, disable its hydra builds, such that
>         downstream will learn about the issue through
>         increased build times and have a chance to step up,
>         before their toys break."

There are two things about this specific change that I think are relevant
to a discussion about Rust in the Linux kernel and that I don't think have
been mentioned (apologies if they were and I missed them in this long
thread).

First, the actual change was not in the Rust language; it was in the
standard library, in the alloc crate, which implemented an additional
conversion for standard library types (which is why existing code became
ambiguous). Before v6.10, the kernel had an in-tree copy/fork of the
alloc crate and would have been entirely immune to this change. If
someone synced the in-tree copy of alloc and noticed the problem, they
could have commented out the new conversions, and the actual newer rustc
binary would have continued to compile the old kernel code.

To be clear, I do think it's good that the kernel no longer has a copy
of the Rust standard library code, and I'm not advocating going back to
the copy. But if we're comparing the willingness of languages to break
backwards compatibility in a new version, this is much more analogous to
C or C++ shipping a new function in the standard library whose name
conflicts with something the kernel is already using, not to a change in
the language semantics. My understanding is that this happened several
times when C and C++ were younger (and as a result there are now rules
about things like leading underscores, which language users seem not to
be universally aware of, and other changes are now relegated to standard
version changes).

Of course, we don't use the userspace C standard library in the kernel.
But a good part of the goal in using Rust is to work with a more
expressive language than C and in turn to reuse things that have already
been well expressed in its standard library, whereas there's much less
in the C standard library that would be prohibitive to reimplement
inside the kernel (and there's often interest in doing it differently
anyway, e.g., strscpy). I imagine that if we were to use, say, C++,
there would be similar considerations about adopting smart-pointer
implementations from a good userspace libstdc++. If we were to use
Objective-C we probably wouldn't write our own -lobjc runtime from
scratch, and so forth. So, by using a more expressive language than C,
we're asking that language to supply code that otherwise would have been
covered by the kernel-internal no-stable-API rule, and we're making an
expectation of API stability for it, which is a stronger demand than we
currently make of C.

Which brings me to the second point: the reason this was painful for,
e.g., NixOS is that they own approximately none of the code that was
affected. They're a redistributor of code that other people have written
and packaged, with Cargo.toml and Cargo.lock files specifying specific
versions of crates that recursively eventually list some specific
version of the time crate. If there's something that needs to be fixed
in the time crate, every single Cargo.toml file that has a version bound
that excludes the fixed version of the time crate needs to be fixed.
Ideally, NixOS wouldn't carry this patch locally, which means they're
waiting on an upstream release of the crates that depend on the time
crate. This, then, recursively brings the problem to the crates that
depend on the crates that depend on the time crate, until you have
recursively either upgraded your versions of everything in the ecosystem
or applied distribution-specific patches. That recursive dependency walk
with volunteer FOSS maintainers in the loop at each step is painful.
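To illustrate with a hypothetical crate (the name and versions here are
made up), a single over-tight version bound anywhere in the dependency
graph is enough to block the fix:

```toml
# Hypothetical downstream crate. This bound excludes time 0.3.36 (the
# fixed release), so even once the fix exists upstream, "somecrate"
# itself must publish a new release before a distributor like NixOS can
# pick up the fix without carrying a local patch -- and anything that
# pins "somecrate" repeats the problem one level up.
[package]
name = "somecrate"
version = "1.0.0"

[dependencies]
time = "=0.3.35"
```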

There is nothing analogous in the kernel. Because of the no-stable-API
rule, nobody will find themselves needing to make a release of one
subsystem, then upgrading another subsystem to depend on that release,
then upgrading yet another subsystem in turn. They won't even need
downstream subsystem maintainers to approve any patch. They'll just make
the change in the file that needs the change and commit it. So, while a
repeat of this situation would still be visible to the kernel as a break
in backwards compatibility, the actual response to the situation would
be thousands of times less painful: apply the one-line fix to the spot
in the kernel that needs it, and then say, "If you're using Rust 1.xxx
or newer, you need kernel 6.yyy or newer or you need to cherry-pick this
patch." (You'd probably just cc -stable on the commit.) And then you're
done; there's nothing else you need to do.

There are analogously painful experiences with C/C++ compiler upgrades
if you are in the position of redistributing other people's code, as
anyone who has tried to upgrade GCC in a corporate environment with
vendored third-party libraries knows. A well-documented public example
of this is what happened when GCC dropped support for things like
implicit int: old ./configure scripts would silently fail feature
detection for features that did exist, and distributions like Fedora
would need to double-check the ./configure results and decide whether to
upgrade the library (potentially triggering downstream upgrades) or
carry a local patch. See the _multi-year_ effort around
https://fedoraproject.org/wiki/Changes/PortingToModernC
https://news.ycombinator.com/item?id=39429627

Within the Linux kernel, this class of pain doesn't arise: we aren't
using other people's packaging or other people's ./configure scripts.
We're using our own code (or we've decided we're okay acting as if we
authored any third-party code we vendor), and we have one build system
and one version of what's in the kernel tree.

So - without denying that this was a compatibility break in a way that
didn't live up to a natural reading of Rust's compatibility promise, and
without denying that for many communities other than the kernel it was a
huge pain, I think the implications for Rust in the kernel are limited.

> Another concern I have is with Rust editions. It is
> a well defined way of having language "versions",
> and it does have automated conversion tools,
> and Rust libraries choose themselves which
> edition of Rust that they are using, independent
> of the version of the compiler.
>
> However, there are still some significant changes
> to the language between editions, and that means
> that to determine the correctness of Rust code, you
> must know which edition it is written for.
>
> For instance, does this code have a deadlock?
>
>     fn f(value: &RwLock<Option<bool>>) {
>         if let Some(x) = *value.read().unwrap() {
>             println!("value is {x}");
>         } else {
>             let mut v = value.write().unwrap();
>             if v.is_none() {
>                 *v = Some(true);
>             }
>         }
>     }
>
> The answer is that it depends on whether it is
> interpreted as being in Rust edition 2021 or
> Rust edition 2024. This is not as such an
> issue for upgrading, since there are automated
> conversion tools. But having semantic
> changes like this means that programmers must
> be aware of the edition that code is written in, and
> when applicable, know the different semantics of
> multiple editions. Rust editions are published every 3
> years, containing new semantic changes typically.

This doesn't seem particularly different from C (or C++) language
standard versions. The following code compiles successfully yet behaves
differently under --std=c23 and --std=c17 or older:

int x(void) {
    auto n = 1.5;   /* C23: type inference; n is a double equal to 1.5.
                       C17 and older: "auto" is a storage-class
                       specifier, so this is an implicit-int
                       declaration and n is the int 1. */
    return n * 2;   /* returns 3 under C23, 2 under C17 and older */
}

(inspired by https://stackoverflow.com/a/77383671/23392774)

-- 
Geoffrey Thomas
geofft@...reload.com
