Message-ID: <d29ebda1-e6ca-455d-af07-ac1daf84a3d2@ralfj.de>
Date: Thu, 27 Feb 2025 18:58:22 +0100
From: Ralf Jung <post@...fj.de>
To: Ventura Jack <venturajack85@...il.com>
Cc: Kent Overstreet <kent.overstreet@...ux.dev>,
Miguel Ojeda <miguel.ojeda.sandonis@...il.com>, Gary Guo <gary@...yguo.net>,
torvalds@...ux-foundation.org, airlied@...il.com, boqun.feng@...il.com,
david.laight.linux@...il.com, ej@...i.de, gregkh@...uxfoundation.org,
hch@...radead.org, hpa@...or.com, ksummit@...ts.linux.dev,
linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: C aggregate passing (Rust kernel policy)
Hi VJ,
>> I am not aware of a comparable tool that would be in wide-spread use, or that is
>> carefully aligned with the semantics of an actual compiler.
>> For C, there is Cerberus (https://www.cl.cam.ac.uk/~pes20/cerberus/) as an
>> executable version of the C specification, but it can only run tiny examples.
>> The verified CompCert compiler comes with a semantics one could interpret, but
>> that only checks code for compatibility with CompCert C, which has a lot less
>> (and a bit more) UB than real C.
>> There are also two efforts that turned into commercial tools that I have not
>> tried, and for which there is hardly any documentation of how they interpret the
>> C standard so it's not clear what a green light from them means when compiling
>> with gcc or clang. I also don't know how much real-world code they can actually run.
>> - TrustInSoft/tis-interpreter, mostly gone from the web but still available in
>> the wayback machine
>> (https://web.archive.org/web/20200804061411/https://github.com/TrustInSoft/tis-interpreter/);
>> I assume this got integrated into their "TrustInSoft Analyzer" product.
>> - kcc, a K-framework based formalization of C that is executable. The public
>> repo is dead (https://github.com/kframework/c-semantics) and when I tried to
>> build their tool that didn't work. The people behind this have a company that
>> offers "RV-Match" as a commercial product claiming to find bugs in C based on "a
>> complete formal ISO C11 semantics" so I guess that is where their efforts go now.
>>
>> For C++ and Zig, I am not aware of anything comparable.
>>
>> Part of the problem is that in C, 2 people will have 3 ideas for what the
>> standard means. Compiler writers and programmers regularly have wildly
>> conflicting ideas of what is and is not allowed. There are many different places
>> in the standard that have to be scanned to answer "is this well-defined" even
>> for very simple programs. (https://godbolt.org/z/rjaWc6EzG is one of my favorite
>> examples.) A tool can check a single well-defined semantics, but who gets to
>> decide what exactly those semantics are?
>> Formalizing the C standard requires extensive interpretation, so I am skeptical
>> of everyone who claims that they "formalized the C standard" and built a tool on
>> that without extensive evaluation of how their formalization compares to what
>> compilers do and what programmers rely on. The Cerberus people have done that
>> evaluation (see e.g. https://dl.acm.org/doi/10.1145/2980983.2908081), but none
>> of the other efforts have (to my knowledge). Ideally such a formalization effort
>> would be done in close collaboration with compiler authors and the committee so
>> that the ambiguities in the standard can be resolved and the formalization
>> becomes the one canonical interpretation. The Cerberus people are the ones that
>> pushed the C provenance formalization through, so they made great progress here.
>> However, many issues remain, some quite long-standing (e.g.
>> https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm and
>> https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm, which in my eyes
>> never got properly resolved by clarifying the standard). Martin and a few others
>> are slowly pushing things in the right direction, but it takes a long time.
>> Rust, by having a single project in charge of the one canonical implementation
>> and the specification, and having an open process that is well-suited for
>> incorporating user concerns, can move a lot quicker here. C has a huge
>> head-start; Rust has nothing like the C standard, but we are catching up -- and
>> our goal is more ambitious than that; we are doing our best to learn from C and
>> C++ and concluded that that style of specification is too prone to ambiguity, so
>> we are trying to achieve a formally precise unambiguous specification. Wasm
>> shows that this can be done, at industry scale, albeit for a small language --
>> time we do it for a large one. :)
>>
>> So, yes I think Miri is fairly unique. But please let me know if I missed something!
>>
>> (As an aside, the above hopefully also explains why some people in Rust are
>> concerned about alternative implementations. We do *not* want the current
>> de-facto behavior to ossify and become the specification. We do *not* want the
>> specification to just be a description of what the existing implementations at
>> the time happen to do, and declare all behavior differences to be UB or
>> unspecified or so just because no implementation is willing to adjust their
>> behavior to match the rest. We want the specification to be prescriptive, not
>> descriptive, and we will adjust the implementation as we see fit to achieve that
>> -- within the scope of Rust's stability guarantees. That's also why we are so
>> cagey about spelling out the aliasing rules until we are sure we have a good
>> enough model.)
>
> Very interesting, thank you for the exhaustive answer.
>
> Might it be accurate to categorize Miri as a
> "formal-semantics-based undefined-behavior-detecting interpreter"?
Sure, why not. :)
>
>> https://godbolt.org/z/rjaWc6EzG
>
> That example uses a compiler-specific attribute AFAIK, namely
>
> __attribute__((noinline))
>
> When using compiler-specific attributes and options, the
> original language is arguably no longer being used, depending
> on the attribute. Though a language being inexpressive and
> possibly requiring compiler extensions to achieve some goals,
> possibly like in this C example, can be a disadvantage in itself.
That attribute just exists to make the example small and fit in a single file.
If you use multiple translation units, you can achieve the same effect without
the attribute. Anyway, compilers promise (I hope^^) that that particular
attribute has no bearing on whether the code has UB. So, the question of whether
the program without the attribute has UB is still a very interesting one.
At least clang treats this code as having UB, and one can construct a similar
example for gcc. IMO this is not backed by the standard itself, though it can be
considered backed by some defect reports -- but those were for earlier versions
of the standard so technically, they do not apply to C23.
>> [On formalization]
>
> I agree that Rust has some advantages in regards to formalization,
> but some of them that I think of, are different from what you
> mention here. And I also see some disadvantages.
>
> C is an ancient language, and parsing and handling C is made
> more complex by the preprocessor. Rust is a much younger
> language that avoided all that pain, and is easier to parse
> and handle. C++ is way worse, though might become closer
> to Rust with C++ modules.
>
> Rust is more willing to break existing code in projects, causing
> previously compiling projects to no longer compile. rustc does this
> rarely, but it has happened, also long after Rust 1.0.
>
> From last year, 2024.
>
> https://internals.rust-lang.org/t/type-inference-breakage-in-1-80-has-not-been-handled-well/21374
> "Rust 1.80 broke builds of almost all versions of the
> very popular time crate (edit: please don't shoot the
> messenger in that GitHub thread!!!)
>
> Rust has left only a 4-month old version working.
> That was not enough time for the entire Rust
> ecosystem to update, especially that older
> versions of time were not yanked, and users
> had no advance warning that it will stop working.
>
> A crater run found a regression in over 5000 crates,
> and that has just been accepted as okay without
> any further action! This is below the level of stability
> and reliability that Rust should have."
>
> If C was willing to break code as much as Rust, it would be easier to
> clean up C.
Is that true? GCC updates do break code.
>> [Omitted] We do *not* want the
>> specification to just be a description of what the existing implementations at
>> the time happen to do, and declare all behavior differences to be UB or
>> unspecified or so just because no implementation is willing to adjust their
>> behavior to match the rest. [Omitted]
>
> I have seen some Rust proponents literally say that there is
> a specification for Rust, and that it is called rustc/LLVM.
> Though those specific individuals may not have been the
> most credible individuals.
Maybe don't take the word of random Rust proponents on the internet as anything
more than that. :) I can't speak for the entire Rust project, but I can speak
as lead of the operational semantics team of the Rust project -- no, we do not
consider rustc/LLVM to be a satisfying spec. Producing a proper spec is on the
project agenda.
> A fear I have is that there may be hidden reliance in
> multiple different ways on LLVM, as well as on rustc.
> Maybe even very deeply so. The complexity of Rust's
> type system and rustc's type system checking makes
> me more worried about this point. If there are hidden
> elements, they may turn out to be very difficult to fix,
> especially if they are discovered to be fundamental.
> While having one compiler can be an advantage in
> some ways, it can arguably be a disadvantage
> in some other ways, as you acknowledge as well
> if I understand you correctly.
The Rust type system has absolutely nothing to do with LLVM. Those are
completely separate parts of the compiler. So I don't see any way that LLVM
could possibly influence our type system.
We already discussed previously that indeed, the Rust operational semantics has
a risk of overfitting to LLVM. I acknowledge that.
> You mention ossifying, but the more popular Rust becomes,
> the more painful breakage will be, and the less suited
> Rust will be as a research language.
I do not consider Rust a research language. :)
> Does Crater run Rust for Linux and relevant Rust
> kernel code?
Even better: every single change that lands in Rust is checked against
Rust-for-Linux as part of our CI.
> I hope that any new language at least has its
> language developers ensure that they have a type
> system that is formalized and proven correct
> before that language's 1.0 release.
> Since fixing a type system later can be difficult or
> practically impossible. A complex type system
> and complex type checking can be a larger risk in this
> regard relative to a simple type system and simple
> type checking, especially the more time passes and
> the more the language is used and the more code is
> written in it, making it more difficult to fix the language
> due to code breakage costing more.
Uff, that's a very high bar to pass.^^ I think there's maybe two languages ever
that meet this bar? SML and wasm.
>>> There are some issues in Rust that I am curious as to
>>> your views on. rustc or the Rust language has some type
>>> system holes, which still causes problems for rustc and
>>> their developers.
>>>
>>> https://github.com/lcnr/solver-woes/issues/1
>>> https://github.com/rust-lang/rust/issues/75992
>>>
>>> Those kinds of issues seem difficult to solve.
>>>
>>> In your opinion, is it accurate to say that the Rust language
>>> developers are working on a new type system for
>>> Rust-the-language and a new solver for rustc, and that
>>> they are trying to make the new type system and new solver
>>> as backwards compatible as possible?
>>
>> It's not really a new type system. It's a new implementation for the same type
>> system. But yes there is work on a new "solver" (that I am not involved in) that
>> should finally fix some of the long-standing type system bugs. Specifically,
>> this is a "trait solver", i.e. it is the component responsible for dealing with
>> trait constraints. Due to some unfortunate corner-case behaviors of the old,
>> organically grown solver, it's very hard to do this in a backwards-compatible
>> way, but we have infrastructure for extensive ecosystem-wide testing to judge
>> the consequences of any given potential breaking change and ensure that almost
>> all existing code keeps working. In fact, Rust 1.84 already started using the
>> new solver for some things
>> (https://blog.rust-lang.org/2025/01/09/Rust-1.84.0.html) -- did you notice?
>> Hopefully not. :)
>
> If it is not a new type system, why then do they talk about
> backwards compatibility for existing Rust projects?
If you make a tiny change to a type system, is it a "new type system"? "new type
system" sounds like "from-scratch redesign". That's not what is happening here.
> If the type system is not changed, existing projects would
> still type check. And in this repository of one of the main
> Rust language developers as I understand it, several
> issues are labeled with "S-fear".
>
> https://github.com/lcnr/solver-woes/issues
>
> They have also been working on this new solver for
> several years. Reading through the issues, a lot of
> the problems seem very hard.
It is hard, indeed. But last I knew, the types team is confident that they can
pull it off, and I have confidence in them.
Kind regards,
Ralf