Message-ID: <CAFJgqgS-S3ZbPfYsA-eJmCXHhMrzwaKW1-G+LegKJNqqGm31UQ@mail.gmail.com>
Date: Thu, 27 Feb 2025 10:33:51 -0700
From: Ventura Jack <venturajack85@...il.com>
To: Ralf Jung <post@...fj.de>
Cc: Kent Overstreet <kent.overstreet@...ux.dev>, 
	Miguel Ojeda <miguel.ojeda.sandonis@...il.com>, Gary Guo <gary@...yguo.net>, 
	torvalds@...ux-foundation.org, airlied@...il.com, boqun.feng@...il.com, 
	david.laight.linux@...il.com, ej@...i.de, gregkh@...uxfoundation.org, 
	hch@...radead.org, hpa@...or.com, ksummit@...ts.linux.dev, 
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: C aggregate passing (Rust kernel policy)

On Wed, Feb 26, 2025 at 3:28 PM Ralf Jung <post@...fj.de> wrote:
>
> Hi all,
>
> On 26.02.25 19:09, Ventura Jack wrote:
> > Is Miri the only one of its kind in the programming world?
> > There are not many system languages in mass use, and
> > those are the languages that first and foremost deal
> > with undefined behavior. That would make Miri extra impressive.
>
> I am not aware of a comparable tool that would be in wide-spread use, or that is
> carefully aligned with the semantics of an actual compiler.
> For C, there is Cerberus (https://www.cl.cam.ac.uk/~pes20/cerberus/) as an
> executable version of the C specification, but it can only run tiny examples.
> The verified CompCert compiler comes with a semantics one could interpret, but
> that only checks code for compatibility with CompCert C, which has a lot less
> (and a bit more) UB than real C.
> There are also two efforts that turned into commercial tools that I have not
> tried, and for which there is hardly any documentation of how they interpret the
> C standard so it's not clear what a green light from them means when compiling
> with gcc or clang. I also don't know how much real-world code they can actually run.
> - TrustInSoft/tis-interpreter, mostly gone from the web but still available in
> the wayback machine
> (https://web.archive.org/web/20200804061411/https://github.com/TrustInSoft/tis-interpreter/);
> I assume this got integrated into their "TrustInSoft Analyzer" product.
> - kcc, a K-framework based formalization of C that is executable. The public
> repo is dead (https://github.com/kframework/c-semantics) and when I tried to
> build their tool that didn't work. The people behind this have a company that
> offers "RV-Match" as a commercial product claiming to find bugs in C based on "a
> complete formal ISO C11 semantics" so I guess that is where their efforts go now.
>
> For C++ and Zig, I am not aware of anything comparable.
>
> Part of the problem is that in C, 2 people will have 3 ideas for what the
> standard means. Compiler writers and programmers regularly have wildly
> conflicting ideas of what is and is not allowed. There are many different places
> in the standard that have to be scanned to answer "is this well-defined" even
> for very simple programs. (https://godbolt.org/z/rjaWc6EzG is one of my favorite
> examples.) A tool can check a single well-defined semantics, but who gets to
> decide what exactly those semantics are?
> Formalizing the C standard requires extensive interpretation, so I am skeptical
> of everyone who claims that they "formalized the C standard" and built a tool on
> that without extensive evaluation of how their formalization compares to what
> compilers do and what programmers rely on. The Cerberus people have done that
> evaluation (see e.g. https://dl.acm.org/doi/10.1145/2980983.2908081), but none
> of the other efforts have (to my knowledge). Ideally such a formalization effort
> would be done in close collaboration with compiler authors and the committee so
> that the ambiguities in the standard can be resolved and the formalization
> becomes the one canonical interpretation. The Cerberus people are the ones that
> pushed the C provenance formalization through, so they made great progress here.
> However, many issues remain, some quite long-standing (e.g.
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm and
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm, which in my eyes
> never got properly resolved by clarifying the standard). Martin and a few others
> are slowly pushing things in the right direction, but it takes a long time.
> Rust, by having a single project in charge of the one canonical implementation
> and the specification, and having an open process that is well-suited for
> incorporating user concerns, can move a lot quicker here. C has a huge
> head-start, Rust has nothing like the C standard, but we are catching up -- and
> our goal is more ambitious than that; we are doing our best to learn from C and
> C++ and concluded that that style of specification is too prone to ambiguity, so
> we are trying to achieve a formally precise unambiguous specification. Wasm
> shows that this can be done, at industry scale, albeit for a small language --
> time we do it for a large one. :)
>
> So, yes I think Miri is fairly unique. But please let me know if I missed something!
>
> (As an aside, the above hopefully also explains why some people in Rust are
> concerned about alternative implementations. We do *not* want the current
> de-facto behavior to ossify and become the specification. We do *not* want the
> specification to just be a description of what the existing implementations at
> the time happen to do, and declare all behavior differences to be UB or
> unspecified or so just because no implementation is willing to adjust their
> behavior to match the rest. We want the specification to be prescriptive, not
> descriptive, and we will adjust the implementation as we see fit to achieve that
> -- within the scope of Rust's stability guarantees. That's also why we are so
> cagey about spelling out the aliasing rules until we are sure we have a good
> enough model.)

Very interesting, thank you for the exhaustive answer.

Might it be accurate to categorize Miri as a
"formal-semantics-based undefined-behavior-detecting interpreter"?

>https://godbolt.org/z/rjaWc6EzG

That example uses a compiler-specific attribute AFAIK, namely

    __attribute__((noinline))

With compiler-specific attributes and options, one is arguably
no longer writing the original language, depending on the
attribute. Then again, a language being inexpressive enough
to require compiler extensions for some goals, as possibly
in this C example, can be a disadvantage in itself.

> [On formalization]

I agree that Rust has some advantages with regard to
formalization, but some of the ones I can think of differ
from what you mention here. I also see some disadvantages.

C is an ancient language, and parsing and handling C is made
more complex by the preprocessor. Rust is a much younger
language that avoided all that pain, and is easier to parse
and handle. C++ is far worse, though it might come closer
to Rust with C++ modules.

Rust is more willing to break existing code, causing
previously compiling projects to stop compiling. rustc does
this rarely, but it has happened, even long after Rust 1.0.

From last year, 2024.

    https://internals.rust-lang.org/t/type-inference-breakage-in-1-80-has-not-been-handled-well/21374
        "Rust 1.80 broke builds of almost all versions of the
        very popular time crate (edit: please don't shoot the
        messenger in that GitHub thread!!!)

        Rust has left only a 4-month old version working.
        That was not enough time for the entire Rust
        ecosystem to update, especially that older
        versions of time were not yanked, and users
        had no advance warning that it will stop working.

        A crater run found a regression in over 5000 crates,
        and that has just been accepted as okay without
        any further action! This is below the level of stability
        and reliability that Rust should have."

If C were willing to break code as much as Rust is, it would be
easier to clean up C.

There is the Rust feature "editions", which is interesting,
but in my opinion also very experimental from a
programming language theory perspective. It does
help avoid breakage while letting the language's
developers clean up and improve the language, but it
has other consequences, such as source code having
different semantics in different editions. Automated
upgrade tools help with this, but do not handle all
consequences.

If C were made from scratch today, by experts in type theory,
then C would likely have a much simpler type system and type
checking than Rust, and would likely be much easier to formalize.
Programs in C would likely still often be more complex than
in C++ or Rust, however.

>[Omitted] We do *not* want the
> specification to just be a description of what the existing implementations at
> the time happen to do, and declare all behavior differences to be UB or
> unspecified or so just because no implementation is willing to adjust their
> behavior to match the rest. [Omitted]

I have seen some Rust proponents literally say that there is
a specification for Rust, and that it is called rustc/LLVM.
Though those specific individuals may not have been the
most credible.

A fear I have is that there may be hidden reliance on
LLVM, as well as on rustc, in multiple different ways.
Maybe even very deeply so. The complexity of Rust's
type system and of rustc's type checking makes me
more worried about this point. If there is such hidden
reliance, it may turn out to be very difficult to fix,
especially if it is discovered to be fundamental.
While having one compiler can be an advantage in
some ways, it can arguably be a disadvantage in
others, as you acknowledge as well, if I understand
you correctly.

You mention ossifying, but the more popular Rust becomes,
the more painful breakage will be, and the less suited
Rust will be as a research language.

Using Crater to test existing Rust projects, as you
mention later in your email, is an interesting and
possibly very valuable approach, but I do not know
its limitations and disadvantages. Some projects
are closed source, and thus will presumably not
be checked, as I understand it.

Does Crater run Rust for Linux and relevant Rust
kernel code?

I hope that any new language at least has its
developers ensure that its type system is
formalized and proven correct before the
language's 1.0 release, since fixing a type
system later can be difficult or practically
impossible. A complex type system and complex
type checking are a larger risk in this regard
than a simple type system and simple type
checking, especially as more time passes and
more code is written in the language, since
breakage then costs more, making the language
harder to fix.

Some languages that broke backwards compatibility
arguably suffered or died because of it, like Perl 6
or Scala 3. Python 2 to 3 was arguably successful, but painful.
Scala 3 even had automated conversion tools AFAIK.

> > There are some issues in Rust that I am curious as to
> > your views on. rustc or the Rust language has some type
> > system holes, which still causes problems for rustc and
> > their developers.
> >
> >      https://github.com/lcnr/solver-woes/issues/1
> >      https://github.com/rust-lang/rust/issues/75992
> >
> > Those kinds of issues seem difficult to solve.
> >
> > In your opinion, is it accurate to say that the Rust language
> > developers are working on a new type system for
> > Rust-the-language and a new solver for rustc, and that
> > they are trying to make the new type system and new solver
> > as backwards compatible as possible?
>
> It's not really a new type system. It's a new implementation for the same type
> system. But yes there is work on a new "solver" (that I am not involved in) that
> should finally fix some of the long-standing type system bugs. Specifically,
> this is a "trait solver", i.e. it is the component responsible for dealing with
> trait constraints. Due to some unfortunate corner-case behaviors of the old,
> organically grown solver, it's very hard to do this in a backwards-compatible
> way, but we have infrastructure for extensive ecosystem-wide testing to judge
> the consequences of any given potential breaking change and ensure that almost
> all existing code keeps working. In fact, Rust 1.84 already started using the
> new solver for some things
> (https://blog.rust-lang.org/2025/01/09/Rust-1.84.0.html) -- did you notice?
> Hopefully not. :)

If it is not a new type system, why then do they talk
about backwards compatibility for existing Rust
projects? If the type system is unchanged, existing
projects would still type check. And in this repository,
belonging to one of the main Rust language developers
as I understand it, several issues are labeled "S-fear".

    https://github.com/lcnr/solver-woes/issues

They have also been working on this new solver for
several years. Reading through the issues, a lot of
the problems seem very hard.

Best, VJ.
