linux-kernel - Re: C aggregate passing (Rust kernel policy)

Message-ID: <CAFJgqgTmxeH+y8yir1WL2YvLXhemGE9WU0sDBnyx8yVz8OAxyw@mail.gmail.com>
Date: Sat, 1 Mar 2025 07:19:50 -0700
From: Ventura Jack <venturajack85@...il.com>
To: Geoffrey Thomas <geofft@...reload.com>
Cc: Ralf Jung <post@...fj.de>, Kent Overstreet <kent.overstreet@...ux.dev>, 
	Miguel Ojeda <miguel.ojeda.sandonis@...il.com>, Gary Guo <gary@...yguo.net>, 
	torvalds@...ux-foundation.org, airlied@...il.com, boqun.feng@...il.com, 
	david.laight.linux@...il.com, ej@...i.de, gregkh@...uxfoundation.org, 
	hch@...radead.org, hpa@...or.com, ksummit@...ts.linux.dev, 
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: C aggregate passing (Rust kernel policy)

On Fri, Feb 28, 2025 at 3:14 PM Geoffrey Thomas <geofft@...reload.com> wrote:
>
> On Fri, Feb 28, 2025, at 3:41 PM, Ventura Jack wrote:
> >
> > I did give the example of the time crate. Do you not consider
> > that a very significant example of breakage? Surely, with
> > as public and large an example of breakage as the time crate,
> > there clearly is something.
> >
> > I will acknowledge that Rust editions specifically do not
> > count as breaking code, though the editions feature,
> > while interesting, does have some drawbacks.
> >
> > The time crate breakage is large from what I can tell. When I
> > skim through GitHub issues in different projects,
> > it apparently cost some people significant time and pain.
> >
> >     https://github.com/NixOS/nixpkgs/issues/332957#issue-2453023525
> >         "Sorry for the inconvenience. I've lost a lot of the last
> >         week to coordinating the update, collecting broken
> >         packages, etc., but hopefully by spreading out the
> >         work from here it won't take too much of anybody
> >         else's time."
> >
> >     https://github.com/NixOS/nixpkgs/issues/332957#issuecomment-2274824965
> >         "On principle, rust 1.80 is a new language due
> >         to the incompatible change (however inadvertent),
> >         and should be treated as such. So I think we need
> >         to leave 1.79 in nixpkgs, a little while longer. We can,
> >         however, disable its hydra builds, such that
> >         downstream will learn about the issue through
> >         increased build times and have a chance to step up,
> >         before their toys break."
>
> There's two things about this specific change that I think are relevant
> to a discussion about Rust in the Linux kernel that I don't think got
> mentioned (apologies if they did and I missed it in this long thread).
>
> First, the actual change was not in the Rust language; it was in the
> standard library, in the alloc crate, which implemented an additional
> conversion for standard library types (which is why existing code became
> ambiguous). Before v6.10, the kernel had an in-tree copy/fork of the
> alloc crate, and would have been entirely immune from this change. If
> someone synced the in-tree copy of alloc and noticed the problem, they
> could have commented out the new conversions, and the actual newer rustc
> binary would have continued to compile the old kernel code.
>
> To be clear, I do think it's good that the kernel no longer has a copy
> of the Rust standard library code, and I'm not advocating going back to
> the copy. But if we're comparing the willingness of languages to break
> backwards compatibility in a new version, this is much more analogous to
> C or C++ shipping a new function in the standard library whose name
> conflicts with something the kernel is already using, not to a change in
> the language semantics. My understanding is that this happened several
> times when C and C++ were younger (and as a result there are now rules
> about things like leading underscores, which language users seem not to
> be universally aware of, and other changes are now relegated to standard
> version changes).

>[Omitted] But if we're comparing the willingness of languages to break
> backwards compatibility in a new version, this is much more analogous to
> C or C++ shipping a new function in the standard library whose name
> conflicts with something the kernel is already using, not to a change in
> the language semantics. [Omitted]

I am not sure that this would make sense for C++, since C++
has namespaces, and thus shipping a new function should
not be an issue, I believe. For C++, I suspect a closer analogy
would be, for instance, adding an extra implicit conversion
of some kind, since that would fit better with changed type
inference. Has C++ done such a thing?

However, for both C and C++, the languages and standard
libraries release much less often, at least officially. And the
languages and standard libraries do not normally change
with a compiler update, or are not normally meant to. For
Rust, I suppose the lines are currently more blurred
between the sole major Rust compiler rustc, the Rust
language, and the Rust standard library, when rustc has a new
release. Some users complained that this kind of change, which
affected the Rust time crate and others, should have been put
in a new Rust edition. Rust 1.80 was a relatively minor rustc
compiler release, not a Rust language edition release.

This is different for Rust in that a minor compiler release,
not even a new Rust edition, broke a lot. It is also different in
that it changed which code did and did not compile, from what
I can tell. And Rust reached 1.0 long ago.
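
To illustrate the mechanism as I understand it (a new conversion
in the library making existing code ambiguous), here is a minimal
sketch. The `Convert` trait and the `describe_len` function are
hypothetical names of mine, not anything from the standard library
or the time crate. With a single implementation, the compiler can
infer the target type; adding a second one is expected to break
the unannotated call.

```rust
// Hypothetical trait standing in for a library conversion trait.
trait Convert<T> {
    fn convert(self) -> T;
}

// The only implementation for &str, so the target type is unambiguous.
impl Convert<u32> for &str {
    fn convert(self) -> u32 {
        self.len() as u32
    }
}

// Uncommenting this second implementation (the analogue of a library
// release adding a new conversion) is expected to make the call in
// `describe_len` fail with "type annotations needed":
//
// impl Convert<u64> for &str {
//     fn convert(self) -> u64 {
//         self.len() as u64
//     }
// }

fn describe_len() -> String {
    // No type annotation here: inference relies on there being
    // exactly one applicable implementation of Convert for &str.
    let n = "abc".convert();
    format!("{n}")
}
```

The fix on the caller side would be an explicit annotation such as
`let n: u32 = "abc".convert();`, which is roughly the kind of
one-line change that, as far as I can tell, affected crates needed.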

I wonder if this situation would still have been able to happen
if gccrs was production ready. Would projects just have been able
to switch to gccrs instead? Or more easily stay on an older
release/version of rustc? I am not sure how it would all pan out.

I do dislike it a lot if C has added functions that could cause
name collisions, especially after C matured. Though I assume
that these days such name collisions happen at most in new
releases/standard versions of the C language and library, not
in compiler versions. C could have avoided all that with
features like C++ namespaces or Rust modules/crates, but C is
intentionally kept simple. C's simplicity has various trade-offs.

> Which brings me to the second point: the reason this was painful for,
> e.g., NixOS is that they own approximately none of the code that was
> affected. They're a redistributor of code that other people have written
> and packaged, with Cargo.toml and Cargo.lock files specifying specific
> versions of crates that recursively eventually list some specific
> version of the time crate. If there's something that needs to be fixed
> in the time crate, every single Cargo.toml file that has a version bound
> that excludes the fixed version of the time crate needs to be fixed.
> Ideally, NixOS wouldn't carry this patch locally, which means they're
> waiting on an upstream release of the crates that depend on the time
> crate. This, then, recursively brings the problem to the crates that
> depend on the crates that depend on the time crate, until you have
> recursively either upgraded your versions of everything in the ecosystem
> or applied distribution-specific patches. That recursive dependency walk
> with volunteer FOSS maintainers in the loop at each step is painful.
>
> There is nothing analogous in the kernel. Because of the no-stable-API
> rule, nobody will find themselves needing to make a release of one
> subsystem, then upgrading another subsystem to depend on that release,
> then upgrading yet another subsystem in turn. They won't even need
> downstream subsystem maintainers to approve any patch. They'll just make
> the change in the file that needs the change and commit it. So, while a
> repeat of this situation would still be visible to the kernel as a break
> in backwards compatibility, the actual response to the situation would
> be thousands of times less painful: apply the one-line fix to the spot
> in the kernel that needs it, and then say, "If you're using Rust 1.xxx
> or newer, you need kernel 6.yyy or newer or you need to cherry-pick this
> patch." (You'd probably just cc -stable on the commit.) And then you're
> done; there's nothing else you need to do.

My pondering in

>> Maybe NixOS was hit harder than others.

must have been accurate then. Though some others were
hit as well, presumably much less hard than NixOS in most cases.

> There are analogously painful experiences with C/C++ compiler upgrades
> if you are in the position of redistributing other people's code, as
> anyone who has tried to upgrade GCC in a corporate environment with
> vendored third-party libraries knows. A well-documented public example
> of this is what happened when GCC dropped support for things like
> implicit int: old ./configure scripts would silently fail feature
> detection for features that did exist, and distributions like Fedora
> would need to double-check the ./configure results and decide whether to
> upgrade the library (potentially triggering downstream upgrades) or
> carry a local patch. See the _multi-year_ effort around
> https://fedoraproject.org/wiki/Changes/PortingToModernC
> https://news.ycombinator.com/item?id=39429627

Is this for a compiler version upgrade, or for a new language and
standard library release? The former happens much more often for C
than the latter.

Implicit int was not a nice feature, but its removal was also
not nice for backwards compatibility, I definitely agree about that.
But are you sure that it was entirely silent? When I run it in Godbolt
with different versions of GCC, a warning is given for many
older versions of GCC if implicit int is used. And in newer
versions, in at least some cases, a compile time error is given.
Implicit int was removed in C99, and GCC allowed it with a warning
for many years after 1999, as far as I can see.

If for many years, or multiple decades (maybe 1999 to 2022), a
warning was given, that does mitigate it a bit. But I agree
it is not nice. I suppose this is where Rust editions could help
a lot. But Rust editions are used much more frequently, much
more extensively and for much deeper changes (including
semantic changes) than this as far as I can figure out. A
Rust editions style feature, but with way more careful
and limited usage, might have been nice for the C language,
and other languages. Then again, Rust's experiment with
Rust editions, and also how Rust uses its editions feature, is
interesting, experimental and novel as far as I can figure out.

> Within the Linux kernel, this class of pain doesn't arise: we aren't
> using other people's packaging or other people's ./configure scripts.
> We're using our own code (or we've decided we're okay acting as if we
> authored any third-party code we vendor), and we have one build system
> and one version of what's in the kernel tree.
>
> So - without denying that this was a compatibility break in a way that
> didn't live up to a natural reading of Rust's compatibility promise, and
> without denying that for many communities other than the kernel it was a
> huge pain, I think the implications for Rust in the kernel are limited.

In this specific case. But does the allowance for type
inference changes in Rust's backwards compatibility
guarantees apply only to the Rust standard library, or also
to the language?

And there are multiple parts of the Rust standard
library: "core", "alloc", "std". Can such changes also
happen to the parts of the Rust standard library that,
as I understand it, everyone necessarily uses? On the
other hand, I would assume that will not happen, since
"core" is small and fundamental as I understand it.

And it did happen with a rustc release, not a new Rust
edition.

> > Another concern I have is with Rust editions. It is
> > a well defined way of having language "versions",
> > and it does have automated conversion tools,
> > and Rust libraries choose themselves which
> > edition of Rust that they are using, independent
> > of the version of the compiler.
> >
> > However, there are still some significant changes
> > to the language between editions, and that means
> > that to determine the correctness of Rust code, you
> > must know which edition it is written for.
> >
> > For instance, does this code have a deadlock?
> >
> >     fn f(value: &RwLock<Option<bool>>) {
> >         if let Some(x) = *value.read().unwrap() {
> >             println!("value is {x}");
> >         } else {
> >             let mut v = value.write().unwrap();
> >             if v.is_none() {
> >                 *v = Some(true);
> >             }
> >         }
> >     }
> >
> > The answer is that it depends on whether it is
> > interpreted as being in Rust edition 2021 or
> > Rust edition 2024. This is not as such an
> > issue for upgrading, since there are automated
> > conversion tools. But having semantic
> > changes like this means that programmers must
> > be aware of the edition that code is written in, and
> > when applicable, know the different semantics of
> > multiple editions. Rust editions are published every 3
> > years, containing new semantic changes typically.
>
> This doesn't seem particularly different from C (or C++) language
> standard versions. The following code compiles successfully yet behaves
> differently under --std=c23 and --std=c17 or older:
>
> int x(void) {
>     auto n = 1.5;
>     return n * 2;
> }
>
> (inspired by https://stackoverflow.com/a/77383671/23392774)
>

I disagree with you 100% here regarding your example.

First off, your example does not compile like you claim it does
when I try it.

    #include <stdio.h>

    int x(void) {

        auto n = 1.5;
        return n * 2;
    }

    int main() {

        printf("%d", x());

        return 0;
    }

When I run it with GCC 14.2 --std=c17, or Clang 19.1.0 --std=c17,
I get compile-time errors, complaining about implicit int.
Why did you claim that it would compile successfully?
When I run it with GCC 5.1 or Clang 3.5, I get compile-time
warnings instead about implicit int. Only with --std=c23
does it compile and run.

Like, that example must have either given warnings or compile-time
errors for decades.

Second off, this appears to be a combination of two changes:
implicit int, and the dual meaning of `auto` as both a
storage-class specifier and a type inference keyword.

- "Implicit int", removed in C99, compile-time warning in GCC
    from perhaps 1999 to 2022, gives a compile-time error
    from perhaps 2022.
- `auto` keyword in C, used originally as a storage-class
    specifier, like in `auto double x`. Since `auto` is typically the
    default storage-class for the cases where it can apply,
    as I understand it, it was probably almost never used in
    practice. In C23, they decided to reuse it for type inference
    as well. C23 keeps it as a storage-class specifier. The reason
    for reusing it here is probably due to the desire to avoid
    collisions and to keep as much backwards compatibility
    as possible, and because there were few keywords to use.
    And to be more consistent with C++.
- C++ might never have allowed implicit int, I am not sure.
    C++ did use the `auto` keyword as a storage-class specifier,
    but removed it for that purpose in C++11, and made its
    meaning to be type inference instead. But before C++11,
    `auto n = 1.5` was not allowed, since implicit int was
    not allowed in C++, possibly never allowed.

Even though there are probably very few programs out there
that use or used `auto` as a storage-class specifier for either
C or C++, I do dislike this change in some ways, since it could
as you say change language semantics. The combination in
your example is rare, however, and there might have been
decades of compile-time warnings or errors between. I do
not know whether it occurred in practice, since using `auto`
as a storage-class specifier must have been very rare, and
when used, the proper usage would have been more akin to
`auto int x` or `auto float x`.

And with decades of compile-time warnings, and removal from
the language for decades, this example you give here honestly
seems like an example against your points, not for your points.

I do dislike this kind of keyword reuse, even when done
very carefully, since it could lead to trouble. C and C++
are heavily constrained in what they can do here,
while Rust has the option of Rust editions. But Rust editions
are used for much less careful and much deeper changes
like the one above, where the same code causes a deadlock
in one edition, and in another does not deadlock and runs.

    fn f(value: &RwLock<Option<bool>>) {
        if let Some(x) = *value.read().unwrap() {
            println!("value is {x}");
        } else {
            let mut v = value.write().unwrap();
            if v.is_none() {
                *v = Some(true);
            }
        }
    }

For details on this specific example, see:

    https://doc.rust-lang.org/edition-guide/rust-2024/temporary-if-let-scope.html
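
For comparison, here is a sketch of how I believe the same logic
can be written so that it does not depend on the edition. The
function name `get_or_init` and its `bool` return value are my own
additions for illustration. The idea is to copy the value out in a
separate `let` statement, so the temporary read guard is dropped at
the end of that statement before the write lock is requested.

```rust
use std::sync::RwLock;

// Hypothetical variant of the example above. The read guard is a
// temporary inside the `let` statement, so it is released at the end
// of that statement, regardless of the `if let` scoping rules of the
// edition in use.
fn get_or_init(value: &RwLock<Option<bool>>) -> bool {
    // Option<bool> is Copy, so this copies the value out of the guard.
    let current = *value.read().unwrap();
    if let Some(x) = current {
        x
    } else {
        // No read guard is held by this function at this point.
        let mut v = value.write().unwrap();
        // Re-check under the write lock: another thread may have
        // stored a value between the read and the write.
        if v.is_none() {
            *v = Some(true);
        }
        (*v).unwrap_or(true)
    }
}
```

The cost is that the code is slightly more verbose, and the reader
still has to know why the separate `let` is there.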

How should the issue of keywords be handled, from the perspective
of programming language design? In C and C++, the approach
appears to be to be very careful. In Rust, there are Rust
editions, which I honestly believe can be a good approach if
used in a minimal way, maybe for rare, tiny changes that do not
change semantics, like every 20 years. Rust, on the other hand,
uses Rust editions to make more frequent (every 3 years) and
much deeper changes, including to semantics.
The usage that Rust has with its editions feature reminds me
more of an experimental research language, or like Scala.
On the other hand, maybe I am wrong, and it is fine for Rust
to use its editions like this. But I am very wary of it, and it seems
experimental to me. Then there are other programming
language design approaches as well, like giving keywords their
own syntactic namespace, but that can only be done when
designing a new language.

Best, VJ.