[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aHT1etvG0R648EB9@google.com>
Date: Mon, 14 Jul 2025 12:18:02 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: Andreas Hindborg <a.hindborg@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Alexander Viro <viro@...iv.linux.org.uk>,
Arnd Bergmann <arnd@...db.de>, Miguel Ojeda <ojeda@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Gary Guo <gary@...yguo.net>,
"Björn Roy Baron" <bjorn3_gh@...tonmail.com>, Trevor Gross <tmgross@...ch.edu>,
Danilo Krummrich <dakr@...nel.org>, Matthew Maurer <mmaurer@...gle.com>, Lee Jones <lee@...nel.org>,
linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
Benno Lossin <lossin@...nel.org>
Subject: Re: [PATCH v2 1/4] rust: iov: add iov_iter abstractions for ITER_SOURCE
On Wed, Jul 09, 2025 at 07:05:01PM +0200, Andreas Hindborg wrote:
> "Alice Ryhl" <aliceryhl@...gle.com> writes:
>
> > On Wed, Jul 09, 2025 at 01:56:37PM +0200, Andreas Hindborg wrote:
> >> "Alice Ryhl" <aliceryhl@...gle.com> writes:
> >>
> >> > On Tue, Jul 08, 2025 at 04:45:14PM +0200, Andreas Hindborg wrote:
> >> >> "Alice Ryhl" <aliceryhl@...gle.com> writes:
> >> >> > +/// # Invariants
> >> >> > +///
> >> >> > +/// Must hold a valid `struct iov_iter` with `data_source` set to `ITER_SOURCE`. For the duration
> >> >> > +/// of `'data`, it must be safe to read the data in this IO vector.
> >> >>
> >> >> In my opinion, the phrasing you had in v1 was better:
> >> >>
> >> >> The buffers referenced by the IO vector must be valid for reading for
> >> >> the duration of `'data`.
> >> >>
> >> >> That is, I would prefer "must be valid for reading" over "it must be
> >> >> safe to read ...".
> >> >
> >> > If it's backed by userspace data, then technically there aren't any
> >> > buffers that are valid for reading in the usual sense. We need to call
> >> > into special assembly to read it, and a normal pointer dereference would
> >> > be illegal.
> >>
> >> If you go with "safe to read" for this reason, I think you should expand
> >> the statement along the lines you used here.
> >>
> >> What is the special assembly that is used to read this data? From a
> >> quick scan it looks like that if `CONFIG_UACCESS_MEMCPY` is enabled, a
> >> regular `memcpy` call is used.
> >
> > When reading from userspace, you're given an arbitrary untrusted address
> > that could point anywhere. The memory could be swapped out and need to
> > be loaded back from disk. The memory could correspond to an mmap region
> > for a file on a NFS mount and reading it could involve a network call.
> > The address could be dangling, which must be properly handled and turned
> > into an EFAULT error instead of UB. Every architecture has its own asm
> > for handling all of this safely so that behavior is safe no matter what
> > pointer we are given from userspace.
>
> I don't think that is relevant. My point is, you can't reference
> "special assemby" without detailing what that means.
>
> You have a safety requirement in `from_raw`:
>
> /// * For the duration of `'data`, the buffers backing this IO vector must be valid for
> /// reading.
>
> This should probably be promoted to invariant for the type, since
> `from_raw` is the only way to construct the type?
Sure, let's get the wording consistent, but that was the purpose of this
line in the invariants:
For the duration of `'data`, it must be safe to read the data in this IO vector.
> But are you saying that the referenced buffers need not be mapped and
> readable while this type exists? The mapping happens as part of
> `bindings::_copy_to_iter`?
Ultimately, it's an implementation detail.
In our "# Invariants" section, we tend to "expand" the underlying C
types and describe exactly what it means for that C type to be valid,
even if those details are implementation details that nobody outside
that C file should think about. Usually that's fine, but in this case,
I don't think it is feasible.
The iov_iter type is like a giant enum with a bunch of different
implementations. Some implementations just read from a simple kernel
buffer that must, of course, be mapped. Some implementations traverse
complex data structures and stitch the data together from multiple
buffers. Other implementations map the data into memory on-demand inside
the copy_from_iter call, without requiring it to be mapped at other
times. And finally, some implementations perform IO by reading from
userspace, in which case it's valid for the userspace pointer to be
*literally any 64-bit integer*. If the address is dangling, that's
caught inside the call to copy_from_iter and is not a safety issue.
I just want the type invariant to say that reading from it is valid, as
long as the read happens before a certain lifetime expires, without
elaborating on precisely what that means.
> >> > * If the iov_iter reads from a kernel buffer, then the creator of the
> >> > iov_iter must provide an initialized buffer.
> >> >
> >> > Ultimately, if we don't know that the bytes are initialized, then it's
> >> > impossible to use the API correctly because you can never inspect the
> >> > bytes in any way. I.e., any implementation of copy_from_iter that
> >> > produces uninit data is necessarily buggy.
> >>
> >> I would agree. How do we fix that? You are more knowledgeable than me in
> >> this field, so you probably have a better shot than me, at finding a
> >> solution.
> >
> > I think there is nothing to fix. If there exists a callsite on the C
> > side that creates an iov_iter that reads from an uninitialized kernel
> > buffer, then we can fix that specific call-site. I don't think anything
> > else needs to be done.
>
> If soundness of this code hinges on specific call site behavior, this
> should be a safety requirement.
Yes, when we add Rust constructors for this type, they will need
appropriate soundness checks or safety requirements to verify that the
provided buffer is valid for the chosen iter_type.
For now, it is constructed in C and we usually don't have safety
comments in C code.
> >> As far as I can tell, we need to read from a place unknown to the rust
> >> abstract machine, and we need to be able to have the abstract machine
> >> consider the data initialized after the read.
> >>
> >> Is this volatile memcpy [1], or would that only solve the data race
> >> problem, not uninitialized data problem?
> >>
> >> [1] https://lore.kernel.org/all/25e7e425-ae72-4370-ae95-958882a07df9@ralfj.de
> >
> > Volatile memcpy deals with data races.
> >
> > In general, we can argue all we want about wording of these safety
> > comments, but calling copy_from_iter is the right way to read from an
> > iov_iter. If there is a problem, the problem is specific call-sites that
> > construct an iov_iter with an uninit buffer. I don't know whether such
> > call-sites exist.
>
> I am not saying it is the wrong way. I am asking that we detail in the
> safety requirements _why_ it is the right way.
>
> You have a type invariant
>
> For the duration of `'data`, it must be safe to read the data in this IO vector.
>
> that says "safe to read" instead of "valid for read" because "special
> assembly" is used to read the data, and that somehow makes it OK. We
> should be more specific.
>
> How about making the invariant:
>
> For the duration of `'data`, it must be safe to read the data in this
> IO vector with the C API `_copy_from_iter`.
>
> And then your safety comment regarding uninit bytes can be:
>
> We write `out` with `copy_from_iter_raw`, which transitively writes
> `out` using `_copy_from_iter`. By C API contract, `_copy_from_iter`
> does not write uninitialized bytes to `out`.
>
> In this way we can defer to the implementation of `_copy_from_user`,
> which is what I think you want?
Yes, this is pretty much what I want except that _copy_from_user isn't
the only C function you could call to read from an iov_iter.
Alice
Powered by blists - more mailing lists