[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <DG6B5Q4FJTYT.32ZVBMHJ6OR3G@garyguo.net>
Date: Wed, 04 Feb 2026 16:12:05 +0000
From: "Gary Guo" <gary@...yguo.net>
To: "Andreas Hindborg" <a.hindborg@...nel.org>, "Boqun Feng"
<boqun@...nel.org>
Cc: "Gary Guo" <gary@...yguo.net>, "Alice Ryhl" <aliceryhl@...gle.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@...cle.com>, "Liam R. Howlett"
<Liam.Howlett@...cle.com>, "Miguel Ojeda" <ojeda@...nel.org>, "Boqun Feng"
<boqun.feng@...il.com>, Björn Roy Baron
<bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Trevor
Gross" <tmgross@...ch.edu>, "Danilo Krummrich" <dakr@...nel.org>,
<linux-mm@...ck.org>, <rust-for-linux@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] rust: page: add volatile memory copy methods
On Wed Feb 4, 2026 at 1:16 PM GMT, Andreas Hindborg wrote:
> Boqun Feng <boqun@...nel.org> writes:
>
>> On Sat, Jan 31, 2026 at 10:31:13PM +0100, Andreas Hindborg wrote:
>> [...]
>>> >>>>
>>> >>>> For __user memory, because kernel is only given a userspace address, and
>>> >>>> userspace can lie or unmap the address while kernel accessing it,
>>> >>>> copy_{from,to}_user() is needed to handle page faults.
>>> >>>
>>> >>> Just to clarify, for my use case, the page is already mapped to kernel
>>> >>> space, and it is guaranteed to be mapped for the duration of the call
>>> >>> where I do the copy. Also, it _may_ be a user page, but it might not
>>> >>> always be the case.
>>> >>
>>> >> In that case you should also assume there might be other kernel-space users.
>>> >> Byte-wise atomic memcpy would be best tool.
>>> >
>>> > Other concurrent kernel readers/writers would be a kernel bug in my use
>>> > case. We could add this to the safety requirements.
>>> >
>>>
>>> Actually, one case just crossed my mind. I think nothing will prevent a
>>> user space process from concurrently submitting multiple reads to the
>>> same user page. It would not make sense, but it can be done.
>>>
>>> If the reads are issued to different null block devices, the null block
>>> driver might concurrently write the user page when servicing each IO
>>> request concurrently.
>>>
>>> The same situation would happen in real block device drivers, except the
>>> writes would be done by dma engines rather than kernel threads.
>>>
>>
>> Then we better use byte-wise atomic memcpy, and I think for all the
>> architectures that Linux kernel support, memcpy() is in fact byte-wise
>> atomic if it's volatile. Because down the actual instructions, either a
>> byte-size read/write is used, or a larger-size read/write is used but
>> they are guaranteed to be byte-wise atomic even for unaligned read or
>> write. So "volatile memcpy" and "volatile byte-wise atomic memcpy" have
>> the same implementation.
>>
>> (The C++ paper [1] also says: "In fact, we expect that existing assembly
>> memcpy implementations will suffice when suffixed with the required
>> fence.")
>>
>> So to make thing move forward, do you mind to introduce a
>> `atomic_per_byte_memcpy()` in rust::sync::atomic based on
>> bindings::memcpy(), and cc linux-arch and all the archs that support
>> Rust for some confirmation? Thanks!
>
> There is a few things I do not fully understand:
>
> - Does the operation need to be both atomic and volatile, or is atomic enough on its
> own (why)?
In theory, C11 atomic (without using volatile keyword) and Rust atomic are not
volatile, so compiler can optimize them, e.g. coalesce two relaxed read of the
same address into one. In practice, no compiler is doing this. LKMM atomics are
always volatile.
> - The article you reference has separate `atomic_load_per_byte_memcpy`
> and `atomic_store_per_byte_memcpy` that allows inserting an acquire
> fence before the load and a release fence after the store. Do we not
> need that?
It's distinct so that the semantics on the ordering is clear, as the "acquire"
or "release" order is for the atomic argument, and there's no ordering for the
other argument.
Another thing is that without two methods, you need an extra conversion to
convert a slice to non-atomic slice, which is not generally sound. (I.e. you
cannot turn &[u8] to &[Atomic<u8>], as doing so would give you the ability to
write to immutable memory.
> - It is unclear to me how to formulate the safety requirements for
> `atomic_per_byte_memcpy`. In this series, one end of the operation is
> the potential racy area. For `atomic_per_byte_memcpy` it could be
> either end (or both?). Do we even mention an area being "outside the
> Rust AM"?
No, atomics are inside the AM. A piece of memory is either in AM or outside. For
a page that both kernel and userspace access, we should just treat it as other
memory and treat userspace as an always-atomic user.
>
> First attempt below. I am quite uncertain about this. I feel like we
> have two things going on: Potential races with other kernel threads,
> which we solve by saying all accesses are byte-wise atomic, and reaces
> with user space processes, which we solve with volatile semantics?
>
> Should the functin name be `volatile_atomic_per_byte_memcpy`?
>
> /// Copy `len` bytes from `src` to `dst` using byte-wise atomic operations.
> ///
> /// This copy operation is volatile.
> ///
> /// # Safety
> ///
> /// Callers must ensure that:
> ///
> /// * The source memory region is readable and reading from the region will not trap.
We should just use standard terminology here, similar to Atomic::from_ptr.
> /// * The destination memory region is writable and writing to the region will not trap.
> /// * No references exist to the source or destination regions.
> /// * If the source or destination region is within the Rust AM, any concurrent reads or writes to
> /// source or destination memory regions by the Rust AM must use byte-wise atomic operations.
This should be dropped.
> pub unsafe fn atomic_per_byte_memcpy(src: *const u8, dst: *mut u8, len: usize) {
> // SAFETY: By the safety requirements of this function, the following operation will not:
> // - Trap.
> // - Invalidate any reference invariants.
> // - Race with any operation by the Rust AM, as `bindings::memcpy` is a byte-wise atomic
> // operation and all operations by the Rust AM use byte-wise atomic semantics.
> //
> // Further, as `bindings::memcpy` is a volatile operation, the operation will not race with any
> // read or write operation to the source or destination area if the area can be considered to
> // be outside the Rust AM.
> unsafe { bindings::memcpy(dst.cast::<kernel::ffi::c_void>(), src.cast::<kernel::ffi::c_void>(), len) };
The `cast()` don't need explicit types I think?
Best,
Gary
> }
>
>
> Best regards,
> Andreas Hindborg
Powered by blists - more mailing lists