Message-ID: <5d7363b0-785c-4101-8047-27cb7afb0364@ralfj.de>
Date: Wed, 26 Feb 2025 14:54:07 +0100
From: Ralf Jung <post@...fj.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Alice Ryhl <aliceryhl@...gle.com>
Cc: Ventura Jack <venturajack85@...il.com>,
Kent Overstreet <kent.overstreet@...ux.dev>, Gary Guo <gary@...yguo.net>,
airlied@...il.com, boqun.feng@...il.com, david.laight.linux@...il.com,
ej@...i.de, gregkh@...uxfoundation.org, hch@...radead.org, hpa@...or.com,
ksummit@...ts.linux.dev, linux-kernel@...r.kernel.org,
miguel.ojeda.sandonis@...il.com, rust-for-linux@...r.kernel.org
Subject: Re: C aggregate passing (Rust kernel policy)

Hi all,

>> I think all of this worrying about Rust not having defined its
>> aliasing model is way overblown. Ultimately, the status quo is that
>> each unsafe operation that has to do with aliasing falls into one of
>> three categories:
>>
>> * This is definitely allowed.
>> * This is definitely UB.
>> * We don't know whether we want to allow this yet.
>
> Side note: can I please ask that the Rust people avoid the "UD" model
> as much as humanly possible?
>
> In particular, if there is something that is undefined behavior - even
> if it's in some "unsafe" mode, please please please make the rule be
> that
>
> (a) either the compiler ends up being constrained to doing things in
> some "naive" code generation
>
> or it's a clear UB situation, and
>
> (b) the compiler will warn about it

That would be lovely, wouldn't it?

Sadly, if you try to apply this principle at scale in a compiler that does
non-trivial optimizations, it is very unclear what this would even mean. I am
not aware of any systematic/rigorous description of compiler correctness in the
terms you are suggesting here. The only approach we know that we can actually
pull through systematically (in the sense of "at least in principle, we can
formally prove this correct") is to define the "visible behavior" of the source
program, the "visible behavior" of the generated assembly, and promise that they
are the same. (Or, more precisely, that the latter is a refinement of the
former.) So the Rust compiler promises nothing about the shape of the assembly
you will get, only about its "visible" behavior (and which exact memory access
occurs when is generally not considered "visible").

There is a *long* list of caveats here for things like FFI, volatile accesses,
and inline assembly. It is possible to deal with them systematically in this
framework, but spelling this out here would take too long. ;)

Once you are at a level of "visible behavior", there are a bunch of cases where
UB is the only option. The most obvious ones are out-of-bounds writes, and
calling a function pointer that doesn't point to valid code with the right ABI
and signature. There's just no way to constrain the effect on program behavior
that such an operation can have.
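
To make the first case concrete, here is a tiny (made-up) example; Miri will
flag it as UB, but there is simply nothing a specification could promise about
what an optimized build does with it:

    fn main() {
        let mut a = [0u8; 4];
        let p = a.as_mut_ptr();
        unsafe {
            // Computing the one-past-the-end pointer is fine; writing
            // through it is an out-of-bounds write, i.e. UB.
            *p.add(4) = 1;
        }
        println!("{:?}", a);
    }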

We also *do* want to let programmers explicitly tell the compiler "this code
path is unreachable, please just trust me on this and use that information for
your optimizations". This is a pretty powerful and useful primitive and gives
rise to things like unwrap_unchecked in Rust.
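
For instance, something along these lines (just a sketch, the function and its
invariant are made up):

    /// # Safety
    /// `xs` must contain at least one even number.
    unsafe fn first_even(xs: &[u32]) -> u32 {
        // SAFETY: guaranteed by this function's contract. The compiler may
        // treat the `None` path as unreachable and drop the check.
        unsafe { xs.iter().copied().find(|&x| x % 2 == 0).unwrap_unchecked() }
    }

    fn main() {
        let v = [3, 5, 8, 9];
        // SAFETY: 8 is even.
        println!("{}", unsafe { first_even(&v) });
    }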

So our general stance in Rust is that we minimize the cases where there is UB as
much as we can. We avoid gratuitous UB, e.g. for integer overflow or sequence
point violations. We guarantee there is no UB in entirely safe code. We provide
tooling, documentation, and diagnostics to inform programmers about UB and help
them understand what is and is not UB. (We're always open to suggestions for
better diagnostics.)
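
To spell out the integer overflow case: overflow on plain `+` is a defined
error in Rust -- a panic when overflow checks are enabled (the default in debug
builds), two's-complement wrapping otherwise -- never UB, and the standard
library offers explicit alternatives that state the intent:

    fn main() {
        let x = i32::MAX;
        assert_eq!(x.wrapping_add(1), i32::MIN);   // defined wrap-around
        assert_eq!(x.checked_add(1), None);        // overflow reported, not UB
        assert_eq!(x.saturating_add(1), i32::MAX); // clamp to the maximum
        // `x + 1` here would panic in a build with overflow checks enabled.
    }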

But if a program does have UB, then all bets are indeed off. We see UB as a
binding contract between programmer and compiler: the programmer promises to
never cause UB, the compiler in return promises to generate code whose "visible
behavior" matches that of the source program. There's a very pragmatic reason
for that (it's how LLVM works, and Rust wouldn't be where it is without LLVM
proving that it can compete with C/C++ on performance), but there's also the
reason mentioned above that it is not at all clear what the alternative would
actually look like, once you dig into it systematically (short of "don't
optimize unsafe code", which most people using unsafe for better performance
would dislike very much -- and "better performance" is one of the primary
reasons people reach for unsafe Rust).

In other words, in my view it's not the "unconstrained UB" model that is wrong
with C, it is *how easy* it is to accidentally make a promise to the compiler
that you cannot actually uphold. Having every single (signed) addition be a
binding promise is a disaster; of course nobody can keep up with all those
promises. Having an explicit "add_unchecked" be a promise is entirely fine, and
there are cases where this can help generate a lot better code.
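
(In today's Rust this exists as the unsafe `unchecked_add` method on the
integer types; a minimal sketch of what such an explicit promise looks like,
with a made-up wrapper:)

    /// # Safety
    /// `a + b` must not overflow `u32`.
    unsafe fn add_unchecked(a: u32, b: u32) -> u32 {
        // SAFETY: guaranteed by this function's contract. Overflow here would
        // be UB, so the compiler may assume it cannot happen -- but only
        // because the caller explicitly promised as much.
        unsafe { a.unchecked_add(b) }
    }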

Having the use of an "&mut T" or "&T" reference be a promise is certainly more
subtle, and maybe too subtle, but my understanding is that the performance wins
from those assumptions even just on the Rust compiler itself are substantial.
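
To illustrate the kind of assumption involved, a minimal sketch: since `total`
is a `&mut` and `x` a shared reference, the compiler may assume they do not
overlap and keep `*x` in a register instead of reloading it after every store.

    pub fn accumulate_twice(total: &mut u64, x: &u64) {
        *total += *x;
        // `*x` need not be reloaded here: the aliasing promise says that
        // writing through `total` cannot have changed it.
        *total += *x;
    }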

Kind regards,
Ralf

>
> IOW, *please* avoid the C model of "Oh, I'll generate code that
> silently takes advantage of the fact that if I'm wrong, this case is
> undefined".
>
> And BTW, I think this is _particularly_ true for unsafe rust. Yes,
> it's "unsafe", but at the same time, the unsafe parts are the fragile
> parts and hopefully not _so_ hugely performance-critical that you need
> to do wild optimizations.
>
> So the cases I'm talking about is literally re-ordering accesses past
> each other ("Hey, I don't know if these alias or not, but based on
> some paper standard - rather than the source code - I will assume they
> do not"), and things like integer overflow behavior ("Oh, maybe this
> overflows and gives a different answer than the naive case that the
> source code implies, but overflow is undefined so I can screw it up").
>
> I'd just like to point to one case where the C standards body seems to
> have actually at least consider improving on undefined behavior (so
> credit where credit is due, since I often complain about the C
> standards body):
>
> https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n3203.htm
>
> where the original "this is undefined" came from the fact that
> compilers were simple and restricting things like evaluation order
> caused lots of problems. These days, a weak ordering definition causes
> *many* more problems, and compilers are much smarter, and just saying
> that the code has to act as if there was a strict ordering of
> operations still allows almost all the normal optimizations in
> practice.
>
> This is just a general "please avoid the idiocies of the past". The
> potential code generation improvements are not worth the pain.
>
> Linus
>