Message-ID: <Z8YMTiKS4T9wC4t_@boqun-archlinux>
Date: Mon, 3 Mar 2025 12:08:46 -0800
From: Boqun Feng <boqun.feng@...il.com>
To: Andreas Hindborg <a.hindborg@...nel.org>
Cc: Alice Ryhl <aliceryhl@...gle.com>,
Daniel Almeida <daniel.almeida@...labora.com>,
Benno Lossin <benno.lossin@...ton.me>,
Abdiel Janulgue <abdiel.janulgue@...il.com>, dakr@...nel.org,
robin.murphy@....com, rust-for-linux@...r.kernel.org,
Miguel Ojeda <ojeda@...nel.org>,
Alex Gaynor <alex.gaynor@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Trevor Gross <tmgross@...ch.edu>,
Valentin Obst <kernel@...entinobst.de>,
linux-kernel@...r.kernel.org, Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>, airlied@...hat.com,
iommu@...ts.linux.dev, Ralf Jung <post@...fj.de>,
comex <comexk@...il.com>, lkmm@...ts.linux.dev
Subject: Re: Allow data races on some read/write operations
On Mon, Mar 03, 2025 at 08:00:03PM +0100, Andreas Hindborg wrote:
>
> [New subject, was: Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction]
>
> "Alice Ryhl" <aliceryhl@...gle.com> writes:
>
> > On Mon, Mar 3, 2025 at 4:21 PM Andreas Hindborg <a.hindborg@...nel.org> wrote:
> >>
> >> "Alice Ryhl" <aliceryhl@...gle.com> writes:
> >>
> >> > On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@...nel.org> wrote:
> >> >>
> >> >> "Daniel Almeida" <daniel.almeida@...labora.com> writes:
> >> >>
> >> >> > Hi Benno,
> >> >> >
> >> >>
> >> >> [...]
> >> >>
> >> >> >>> + /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
> >> >> >>> + /// number of bytes.
> >> >> >>> + ///
> >> >> >>> + /// # Examples
> >> >> >>> + ///
> >> >> >>> + /// ```
> >> >> >>> + /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
> >> >> >>> + /// let somedata: [u8; 4] = [0xf; 4];
> >> >> >>> + /// let buf: &[u8] = &somedata;
> >> >> >>> + /// alloc.write(buf, 0)?;
> >> >> >>> + /// # Ok::<(), Error>(()) }
> >> >> >>> + /// ```
> >> >> >>> + pub fn write(&self, src: &[T], offset: usize) -> Result {
> >> >> >>> + let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
> >> >> >>> + if end > self.count {
> >> >> >>> + return Err(EINVAL);
> >> >> >>> + }
> >> >> >>> + // SAFETY:
> >> >> >>> + // - The pointer is valid due to type invariant on `CoherentAllocation`
> >> >> >>> + // and we've just checked that the range and index are within bounds.
> >> >> >>> + // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> >> >> >>> + // that `self.count` won't overflow early in the constructor.
> >> >> >>> + unsafe {
> >> >> >>> + core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> >> >> >>
> >> >> >> Why are there no concurrent write or read operations on `cpu_addr`?
> >> >> >
> >> >> > Sorry, can you rephrase this question?
> >> >>
> >> >> This write is suffering the same complications as discussed here [1].
> >> >> There are multiple issues with this implementation.
> >> >>
> >> >> 1) `write` takes a shared reference and thus may be called concurrently.
> >> >> There is no synchronization, so `copy_nonoverlapping` could be called
> >> >> concurrently on the same address. The safety requirements for
> >> >> `copy_nonoverlapping` state that the destination must be valid for
> >> >> write. Alice claims in [1] that any memory area that experiences data
> >> >> races is not valid for writes. So the safety requirement of
> >> >> `copy_nonoverlapping` is violated and this call is potential UB.
> >> >>
> >> >> 2) The destination of this write is DMA memory. It could be concurrently
> >> >> modified by hardware, leading to the same issues as 1). Thus the
> >> >> function cannot be safe if we cannot guarantee hardware will not write
> >> >> to the region while this function is executing.
> >> >>
> >> >> Now, I don't think that these _should_ be issues, but according to our
> >> >> Rust language experts they _are_.
> >> >>
> >> >> I really think that copying data through a raw pointer to or from a
> >> >> place that experiences data races should _not_ be UB if the data is not
> >> >> interpreted in any way, other than moving it.
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Andreas Hindborg
> >> >
> >> > We need to make progress on this series, and it's starting to get late
> >> > in the cycle. I suggest we:
> >>
> >> There is always another cycle.
> >>
> >> >
> >> > 1. Delete as_slice, as_slice_mut, write, and skip_drop.
> >> > 2. Change field_read/field_write to use a volatile read/write.
> >>
> >> Volatile reads/writes that race are OK?
> >
> > I will not give a blanket yes to that. If you read their docs, you
> > will find that they claim to not allow it. But they are the correct
> > choice for DMA memory, and there's no way in practice to get
> > miscompilations on memory locations that are only accessed with
> > volatile operations, and never have references to them created.
> >
> > In general, this will fall into the exception that we've been given
> > from the Rust people. In cases such as this where the Rust language
> > does not give us the operation we want, do it like you do in C. Since
> > Rust uses LLVM which does not miscompile the C part of the kernel, it
> > should not miscompile the Rust part either.
>
> This exception we got for `core::ptr::{read,write}_volatile`, did we
> document that somewhere?
>
[Cc Ralf, comex and LKMM list]
Some related discussions:
* https://github.com/rust-lang/unsafe-code-guidelines/issues/476
* https://github.com/rust-lang/unsafe-code-guidelines/issues/348#issuecomment-1221376388
particularly Ralf's comment on comex's message:
"""
@comex
> First, keep in mind that you could simply transliterate the C
> versions of READ_ONCE/WRITE_ONCE, barriers, etc. directly to Rust,
> using ptr::read_volatile/ptr::write_volatile in place of C volatile
> loads and stores, and asm! in place of C asm blocks. If you do,
> you'll end up with the same LLVM IR instructions (or GCC equivalent
> with rustc_codegen_gcc), which will get passed to the same
> optimizer, and which ultimately will work or not work to the same
> extent as the C versions.
Indeed I think that is probably the best approach.
"""
* A LONG thread of the discussion:
https://rust-lang.zulipchat.com/#narrow/channel/136281-t-opsem/topic/UB.20caused.20by.20races.20on.20.60.7Bread.2Cwrite.7D_volatile.60/near/399343771
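As a rough illustration of comex's suggestion quoted above, a direct transliteration of C's READ_ONCE()/WRITE_ONCE() onto `core::ptr::read_volatile`/`write_volatile` could look like the minimal sketch below. `read_once`, `write_once` and `once_example` are hypothetical names, not the kernel's actual API, and the sketch assumes the location is only ever accessed through raw pointers, with no `&`/`&mut` to it alive while it may be raced on. Volatile only guarantees a single access that the compiler will not remove or merge; by itself it provides no atomicity or ordering.

```rust
use core::ptr;

/// Hypothetical Rust counterpart of C's `READ_ONCE()`: one volatile load
/// that the compiler may not elide, duplicate or fuse with other accesses.
///
/// # Safety
///
/// `p` must be non-null, properly aligned and valid for reads of `T`.
pub unsafe fn read_once<T: Copy>(p: *const T) -> T {
    // SAFETY: the caller upholds the requirements of `read_volatile`.
    unsafe { ptr::read_volatile(p) }
}

/// Hypothetical Rust counterpart of C's `WRITE_ONCE()`: one volatile store.
///
/// # Safety
///
/// `p` must be non-null, properly aligned and valid for writes of `T`.
pub unsafe fn write_once<T: Copy>(p: *mut T, val: T) {
    // SAFETY: the caller upholds the requirements of `write_volatile`.
    unsafe { ptr::write_volatile(p, val) }
}

/// Usage: the location is touched only through a raw pointer; no
/// reference to it is held while the volatile accesses happen.
pub fn once_example() -> u32 {
    let mut flag: u32 = 0;
    let p = ptr::addr_of_mut!(flag);
    // SAFETY: `p` points to a live, properly aligned local.
    unsafe {
        write_once(p, 42);
        read_once(p.cast_const())
    }
}
```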
In general, the rationale is: if Rust code generates the same LLVM IR
as the equivalent C code, then an access that is not a data race per
LKMM is not treated as a data race in Rust either. But this is not a
"get-out-of-UB-free" card IMO:
* If both sides of the race are Rust code, we should avoid using
{read,write}_volatile(), and use proper synchronization instead.
* If atomicity is also required, we should use Atomic::from_ptr()
instead of {read,write}_volatile().
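For the second point, a minimal sketch is below. It uses the standard library's `AtomicU32::from_ptr` (stable since Rust 1.75) as a stand-in; the kernel's own `Atomic::from_ptr()` mentioned above may differ in shape, and `shared_counter_inc`/`atomic_example` are hypothetical helpers. The assumption that makes this sound is that every concurrent access to the location, including on the C side, is atomic.

```rust
use core::ptr;
use core::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: atomically increments a `u32` that is shared with
/// other code and is only reachable through a raw pointer.
///
/// # Safety
///
/// `p` must be non-null, aligned to `align_of::<AtomicU32>()`, valid for
/// reads and writes for the duration of the call, and every concurrent
/// access to `*p` must also be atomic.
pub unsafe fn shared_counter_inc(p: *mut u32) -> u32 {
    // SAFETY: the caller guarantees the requirements of `from_ptr`.
    let counter: &AtomicU32 = unsafe { AtomicU32::from_ptr(p) };
    // Returns the previous value; the addition wraps on overflow.
    counter.fetch_add(1, Ordering::Relaxed)
}

/// Usage: returns (value before the increment, value after it).
pub fn atomic_example() -> (u32, u32) {
    let mut raw: u32 = 5;
    let p = ptr::addr_of_mut!(raw);
    // SAFETY: `p` points to a live, aligned local with no other accessors.
    let old = unsafe { shared_counter_inc(p) };
    (old, raw)
}
```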
> I feel slightly lost when trying to figure out what fits under this
> exception and what is UB. I think the first step to making this more
> straightforward is having clear documentation.
>
I agree, and I'm happy to help on this.
> For cases where we need to do the equivalent of `memmove`/`memcpy`, what
> are our options?
>
Seems we need "volatile" memmove and memcpy in Rust?
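Stable `core::ptr` has `read_volatile`/`write_volatile` for single values but no volatile counterpart of `copy`/`copy_nonoverlapping`. A minimal sketch of what such helpers could look like, built from element-wise volatile accesses, is below; `volatile_copy_nonoverlapping`, `volatile_copy` and `memmove_example` are hypothetical names, not an existing kernel or core API. Unlike a real memcpy/memmove it makes one access per element, and the copy as a whole is not atomic, so whether this pattern is covered by the exception discussed above is an open question.

```rust
use core::ptr;

/// Hypothetical volatile `memcpy`: copies `count` elements of `T` from `src`
/// to `dst`, one element at a time, using only volatile loads and stores.
///
/// # Safety
///
/// `src` must be valid for reads and `dst` valid for writes of `count`
/// properly aligned `T`s, and the two ranges must not overlap.
pub unsafe fn volatile_copy_nonoverlapping<T: Copy>(src: *const T, dst: *mut T, count: usize) {
    for i in 0..count {
        // SAFETY: `i < count`, so both accesses are in bounds by the
        // caller's guarantees.
        unsafe { ptr::write_volatile(dst.add(i), ptr::read_volatile(src.add(i))) };
    }
}

/// Hypothetical volatile `memmove`: the ranges may overlap. The copy
/// direction is chosen so every source element is read before it can be
/// overwritten.
///
/// # Safety
///
/// `src` must be valid for reads and `dst` valid for writes of `count`
/// properly aligned `T`s.
pub unsafe fn volatile_copy<T: Copy>(src: *const T, dst: *mut T, count: usize) {
    if dst.cast_const() <= src {
        // Destination starts at or below the source: copy forwards.
        for i in 0..count {
            // SAFETY: in bounds, see the safety section above.
            unsafe { ptr::write_volatile(dst.add(i), ptr::read_volatile(src.add(i))) };
        }
    } else {
        // Destination starts above the source: copy backwards.
        for i in (0..count).rev() {
            // SAFETY: in bounds, see the safety section above.
            unsafe { ptr::write_volatile(dst.add(i), ptr::read_volatile(src.add(i))) };
        }
    }
}

/// Usage: an overlapping move inside one buffer, like
/// `memmove(buf + 2, buf, 4)` in C.
pub fn memmove_example() -> [u8; 6] {
    let mut buf: [u8; 6] = [1, 2, 3, 4, 5, 6];
    let base = buf.as_mut_ptr();
    // SAFETY: the source range [0, 4) and the destination range [2, 6)
    // both lie within `buf`.
    unsafe { volatile_copy(base.cast_const(), base.add(2), 4) };
    buf
}
```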
> In case we have no options, do you know who would be the right people on
> the Rust Project side to contact about getting an exception for this
> case?
>
I would say it'll be t-opsem.
Regards,
Boqun
>
> Best regards,
> Andreas Hindborg
>
>