Message-ID: <87ecuplgqy.fsf@kernel.org>
Date: Wed, 09 Jul 2025 13:56:37 +0200
From: Andreas Hindborg <a.hindborg@...nel.org>
To: "Alice Ryhl" <aliceryhl@...gle.com>
Cc: "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>, "Alexander Viro"
<viro@...iv.linux.org.uk>, "Arnd Bergmann" <arnd@...db.de>, "Miguel
Ojeda" <ojeda@...nel.org>, "Boqun Feng" <boqun.feng@...il.com>, "Gary
Guo" <gary@...yguo.net>, Björn Roy Baron
<bjorn3_gh@...tonmail.com>,
"Trevor Gross" <tmgross@...ch.edu>, "Danilo Krummrich"
<dakr@...nel.org>, "Matthew Maurer" <mmaurer@...gle.com>, "Lee Jones"
<lee@...nel.org>, <linux-kernel@...r.kernel.org>,
<rust-for-linux@...r.kernel.org>, "Benno Lossin" <lossin@...nel.org>
Subject: Re: [PATCH v2 1/4] rust: iov: add iov_iter abstractions for
ITER_SOURCE
"Alice Ryhl" <aliceryhl@...gle.com> writes:
> On Tue, Jul 08, 2025 at 04:45:14PM +0200, Andreas Hindborg wrote:
>> "Alice Ryhl" <aliceryhl@...gle.com> writes:
>> > +/// # Invariants
>> > +///
>> > +/// Must hold a valid `struct iov_iter` with `data_source` set to `ITER_SOURCE`. For the duration
>> > +/// of `'data`, it must be safe to read the data in this IO vector.
>>
>> In my opinion, the phrasing you had in v1 was better:
>>
>> The buffers referenced by the IO vector must be valid for reading for
>> the duration of `'data`.
>>
>> That is, I would prefer "must be valid for reading" over "it must be
>> safe to read ...".
>
> If it's backed by userspace data, then technically there aren't any
> buffers that are valid for reading in the usual sense. We need to call
> into special assembly to read it, and a normal pointer dereference would
> be illegal.

If you go with "safe to read" for this reason, I think you should expand
the statement along the lines you used here.

What is the special assembly that is used to read this data? From a
quick scan, it looks like a regular `memcpy` call is used if
`CONFIG_UACCESS_MEMCPY` is enabled.

>
>> > + /// Returns the number of bytes available in this IO vector.
>> > + ///
>> > + /// Note that this may overestimate the number of bytes. For example, reading from userspace
>> > + /// memory could fail with `EFAULT`, which will be treated as the end of the IO vector.
>> > + #[inline]
>> > + pub fn len(&self) -> usize {
>> > + // SAFETY: It is safe to access the `count` field.
>>
>> Reiterating my comment from v1: Why?
>
> It's the same reason as why this is safe:
>
> struct HasLength {
> length: usize,
> }
> impl HasLength {
> fn len(&self) -> usize {
> // why is this safe?
> self.length
> }
> }
>
> I'm not sure how to say it concisely. I guess it's because all access to
> the iov_iter goes through the &IovIterSource.
So something like "By the existence of a shared reference to `self`,
`count` is valid for reads."?
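
For illustration, here is a minimal userspace sketch of that argument. It
is not kernel code: `RawIter` and `IterSource` are hypothetical stand-ins
for the bindgen-generated `struct iov_iter` and for `IovIterSource`, and
the sketch assumes that nothing writes to the inner struct except through
`&mut self`.

```rust
use core::cell::UnsafeCell;

// Hypothetical stand-in for the bindgen-generated `struct iov_iter`.
#[repr(C)]
struct RawIter {
    count: usize,
}

// Hypothetical wrapper, loosely modelled on `IovIterSource`.
#[repr(transparent)]
struct IterSource {
    iov: UnsafeCell<RawIter>,
}

impl IterSource {
    fn new(count: usize) -> Self {
        Self {
            iov: UnsafeCell::new(RawIter { count }),
        }
    }

    fn len(&self) -> usize {
        // SAFETY: By the existence of a shared reference to `self`, no
        // `&mut self` method can be running concurrently, and this type
        // never writes through the cell from a `&self` method, so `count`
        // is valid for reads.
        unsafe { (*self.iov.get()).count }
    }

    fn advance(&mut self, bytes: usize) {
        // Exclusive access: a plain mutable borrow of the inner value.
        let raw = self.iov.get_mut();
        // Clamp so that advancing past the end stops at the end.
        raw.count -= bytes.min(raw.count);
    }
}
```

The point of the sketch is only that the safety comment names the reason
(all writes require `&mut self`) instead of restating that the access is
safe.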
>
>> > + unsafe {
>> > + (*self.iov.get())
>> > + .__bindgen_anon_1
>> > + .__bindgen_anon_1
>> > + .as_ref()
>> > + .count
>> > + }
>> > + }
>> > +
>> > + /// Returns whether there are any bytes left in this IO vector.
>> > + ///
>> > + /// This may return `true` even if there are no more bytes available. For example, reading from
>> > + /// userspace memory could fail with `EFAULT`, which will be treated as the end of the IO vector.
>> > + #[inline]
>> > + pub fn is_empty(&self) -> bool {
>> > + self.len() == 0
>> > + }
>> > +
>> > + /// Advance this IO vector by `bytes` bytes.
>> > + ///
>> > + /// If `bytes` is larger than the size of this IO vector, it is advanced to the end.
>> > + #[inline]
>> > + pub fn advance(&mut self, bytes: usize) {
>> > + // SAFETY: `self.iov` is a valid IO vector.
>> > + unsafe { bindings::iov_iter_advance(self.as_raw(), bytes) };
>> > + }
>> > +
>> > + /// Advance this IO vector backwards by `bytes` bytes.
>> > + ///
>> > + /// # Safety
>> > + ///
>> > + /// The IO vector must not be reverted to before its beginning.
>> > + #[inline]
>> > + pub unsafe fn revert(&mut self, bytes: usize) {
>> > + // SAFETY: `self.iov` is a valid IO vector, and `bytes` is in bounds.
>> > + unsafe { bindings::iov_iter_revert(self.as_raw(), bytes) };
>> > + }
>> > +
>> > + /// Read data from this IO vector.
>> > + ///
>> > + /// Returns the number of bytes that have been copied.
>> > + #[inline]
>> > + pub fn copy_from_iter(&mut self, out: &mut [u8]) -> usize {
>> > + // SAFETY: We will not write uninitialized bytes to `out`.
>>
>> Can you provide something to back this claim?
>
> I guess the logic could go along these lines:
>
> * If the iov_iter reads from userspace, then it's because we always
> consider such reads to produce initialized data.
I don't think it is enough to just state that we consider the reads to
produce initialized data.
> * If the iov_iter reads from a kernel buffer, then the creator of the
> iov_iter must provide an initialized buffer.
>
> Ultimately, if we don't know that the bytes are initialized, then it's
> impossible to use the API correctly because you can never inspect the
> bytes in any way. I.e., any implementation of copy_from_iter that
> produces uninit data is necessarily buggy.

I would agree. How do we fix that? You are more knowledgeable than me in
this field, so you probably have a better shot than me at finding a
solution.

As far as I can tell, we need to read from a place unknown to the Rust
abstract machine, and we need the abstract machine to consider the data
initialized after the read.

Is this volatile memcpy [1], or would that only solve the data race
problem and not the uninitialized data problem?
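
To make the requirement concrete, here is a minimal userspace sketch of
one possible API shape. It does not answer the volatile memcpy question;
it only confines the "these bytes are initialized" claim to a single
place. `raw_copy` and `copy_into` are hypothetical names, and `raw_copy`
is a plain Rust stand-in for the C copy routine, so the justification in
the safety comment holds here by construction.

```rust
use core::mem::MaybeUninit;

/// Hypothetical stand-in for the C copy routine: copies up to `dst.len()`
/// bytes from `src` into `dst` and returns how many bytes it wrote.
fn raw_copy(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> usize {
    let n = src.len().min(dst.len());
    for (d, s) in dst[..n].iter_mut().zip(src) {
        d.write(*s);
    }
    n
}

/// Copies into a possibly uninitialized buffer and hands back only the
/// prefix that was actually written.
fn copy_into<'a>(src: &[u8], dst: &'a mut [MaybeUninit<u8>]) -> &'a mut [u8] {
    let n = raw_copy(src, dst);
    let written = &mut dst[..n];
    // SAFETY: `raw_copy` wrote initialized bytes to exactly the first `n`
    // elements of `dst`, and `MaybeUninit<u8>` has the same layout as `u8`.
    unsafe { core::slice::from_raw_parts_mut(written.as_mut_ptr().cast::<u8>(), n) }
}
```

With this shape the caller never sees bytes beyond what was copied, but
the safety comment still has to be backed by whatever guarantee the real
copy routine provides.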

Best regards,
Andreas Hindborg

[1] https://lore.kernel.org/all/25e7e425-ae72-4370-ae95-958882a07df9@ralfj.de