[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z0WBXhyyE4ilLlCE@phenom.ffwll.local>
Date: Tue, 26 Nov 2024 09:05:50 +0100
From: Simona Vetter <simona.vetter@...ll.ch>
To: Benno Lossin <benno.lossin@...ton.me>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Andreas Hindborg <a.hindborg@...nel.org>,
Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
Greg KH <gregkh@...uxfoundation.org>, linux-kernel@...r.kernel.org,
rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v2 1/2] rust: add untrusted data abstraction
Sorry for the super late reply, was absorbed too much with life stuff :-/
On Mon, Sep 30, 2024 at 02:04:06PM +0000, Benno Lossin wrote:
> On 26.09.24 12:40, Simona Vetter wrote:
> > On Wed, Sep 25, 2024 at 08:53:05PM +0000, Benno Lossin wrote:
> >> + /// Sets the underlying untrusted value.
> >> + ///
> >> + /// # Examples
> >> + ///
> >> + /// ```
> >> + /// use kernel::validate::Untrusted;
> >> + ///
> >> + /// let mut untrusted = Untrusted::new(42);
> >> + /// untrusted.write(24);
> >> + /// ```
> >> + pub fn write(&mut self, value: impl Init<T>) {
> >> + let ptr: *mut T = &mut self.0 .0;
> >> + // SAFETY: `ptr` came from a mutable reference and the value is overwritten before it is
> >> + // read.
> >> + unsafe { ptr::drop_in_place(ptr) };
> >> + // SAFETY: `ptr` came from a mutable reference and the initializer cannot error.
> >> + match unsafe { value.__init(ptr) } {
> >> + Ok(()) => {}
> >> + }
> >> + }
> >
> > So I think this doesn't work for a few reasons:
> >
> > - there's a lot of unsafe both here and when using this (even without
> > MaybeUninit), at least if I understand things correctly. And forcing
> > unsafe for something that inheritedly does not break any of the rust
> > memory safety rules feels silly.
> >
> > - it also looks awkward, because you have the special versions
> > write_uninit and write_uninit_slice, and imo if our write isn't good
> > enought to just work with abritrary box/container types, it's not good.
> >
> > - it doesn't actually work for one of the main uaccess.rs use-cases beyond
> > just enabling the current interface. Example in pseudo-rust:
> >
> > struct IoctlParams {
> > input: u32,
> > ouptut: u32,
> > }
> >
> > The thing is that ioctl that use the struct approach like drm does, use
> > the same struct if there's both input and output paramterers, and
> > furthermore we are not allowed to overwrite the entire struct because
> > that breaks ioctl restarting. So the flow is roughly
> >
> > let userptr : UserSlice;
> > let params : Untrusted<IoctlParams>;
> >
> > userptr.read(params));
> >
> > // validate params, do something interesting with it params.input
> >
> > // this is _not_ allowed to overwrite params.input but must leave it
> > // unchanged
> >
> > params.write(|x| { x.output = 42; });
> >
> > userptr.write(params);
> >
> > Your current write doesn't allow this case, and I think that's not good
> > enough.
>
> I see. One thing that makes this API work better is field projections
> (it's a rust language feature for which we do have a PoC macro
> implementation and which I am working on getting into the language):
> To only overwrite a part of a struct you would do
>
> params->output.write(42);
>
> The `->` syntax "projects" the `output` field, turning
> `Untrusted<IoctlParams>` into `Untrusted<u32>`.
>
> I think that this would solve your problem with the API. (we probably
> need a stop gap solution though until field projections are part of the
> language).
>
> > The one I propsed in private does:
> >
> > Untrusted<T>::write(&mut self, Fn(&mut T)->())
> >
> > It's not perfect because it allows you to do other stuff than writing,
> > but it's still pretty good by limiting any potentially dangerous code to
> > the provided closure. And with these of protection apis we need to
> > strike the right balance between as much protection as possible while
> > still allowing users to get the job done.
>
> Ideally I would like to avoid exposing the value.
>
> > Now I guess you can do this with your write too using a copy constructor
> > approach, but that will result in less efficient code and it's probably
> > also too much functional programming design for kernel developers to
> > cope with.
>
> Agreed.
>
> >> +
> >> + /// Turns a slice of untrusted values into an untrusted slice of values.
> >> + pub fn transpose_slice(slice: &[Untrusted<T>]) -> &Untrusted<[T]>
> >> + where
> >> + T: Sized,
> >> + {
> >> + let ptr = slice.as_ptr().cast::<T>();
> >> + // SAFETY: `ptr` and `len` come from the same slice reference.
> >> + let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
> >> + Untrusted::new_ref(slice)
> >> + }
> >> +
> >> + /// Turns a slice of uninitialized, untrusted values into an untrusted slice of uninitialized
> >> + /// values.
> >> + pub fn transpose_slice_uninit(
> >> + slice: &[MaybeUninit<Untrusted<T>>],
> >> + ) -> &Untrusted<[MaybeUninit<T>]>
> >
> > So these and some of the related functions to handle slice of box vs box
> > of slice feel a bit awkward. I think we can do better if we rename
> > Untrusted and Unvalidated to UntrustedBox and UnvalidatedBox, and then
> > make Untrusted and Unvalidated traits, with implementations for
> > UntrustedBox<T>, [Untrusted<T>], and all the others we want.
>
> Hmm that might work for `Untrusted`, I will see how that looks. For
> `Unvalidated` I am less confident.
>
> > I expect that in the future we'll get more boxes with special semantics
> > that we want to use together with Untrusted, not just [] and MaybeUninit.
> > One example would be when the Untrusted data is shared with userspace
> > (mmap()ed into both kernel and userspace for example for a ringbuffer), in
> > which case the data is both Untrusted but also SharedUnsafe or whatever
> > we'll call memory that's fundamentally breaking the rust memory model
> > because there's no exclusive access (userspace can always do whatever it
> > feels like), and all access has to go through at least
> > READ_ONCE/WRITE_ONCE from the C side, or often even full blown atomics.
>
> I don't think that we need to wrap that in `Untrusted`, instead the type
> representing shared data between kernel and userspace should return
> `Untrusted<u8>` instead of `u8`.
>
> > And that memory is not MaybeUninit, because we have to initialize it
> > before userspace can mmap it, for otherwise it's an information leak and
> > hence security issue.
> >
> > tldr; I think going with traits for Untrusted and Unvalidated here will be
> > worth it, even if a bit more pain at first. Plus cleaner interfaces.
> >
> >> + where
> >> + T: Sized,
> >> + {
> >> + let ptr = slice.as_ptr().cast::<MaybeUninit<T>>();
> >> + // SAFETY: `ptr` and `len` come from the same mutable slice reference.
> >> + let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
> >> + Untrusted::new_ref(slice)
> >> + }
>
> [...]
>
> >> +/// Validates untrusted data.
> >> +///
> >> +/// # Examples
> >> +///
> >> +/// The simplest way to validate data is to just implement `Validate<&Unvalidated<[u8]>>` for the
> >> +/// type that you wish to validate:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +/// error::{code::EINVAL, Error},
> >> +/// str::{CStr, CString},
> >> +/// validate::{Unvalidated, Validate},
> >> +/// };
> >> +///
> >> +/// struct Data {
> >> +/// flags: u8,
> >> +/// name: CString,
> >> +/// }
> >> +///
> >> +/// impl Validate<&Unvalidated<[u8]>> for Data {
> >> +/// type Err = Error;
> >> +///
> >> +/// fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +/// let raw = unvalidated.raw();
> >> +/// let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
> >> +/// let name = CStr::from_bytes_with_nul(name)?.to_cstring()?;
> >> +/// Ok(Data { flags, name })
> >> +/// }
> >> +/// }
> >> +/// ```
> >> +///
> >> +/// This approach copies the data and requires allocation. If you want to avoid the allocation and
> >> +/// copying the data, you can borrow from the input like this:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +/// error::{code::EINVAL, Error},
> >> +/// str::CStr,
> >> +/// validate::{Unvalidated, Validate},
> >> +/// };
> >> +///
> >> +/// struct Data<'a> {
> >> +/// flags: u8,
> >> +/// name: &'a CStr,
> >> +/// }
> >> +///
> >> +/// impl<'a> Validate<&'a Unvalidated<[u8]>> for Data<'a> {
> >> +/// type Err = Error;
> >> +///
> >> +/// fn validate(unvalidated: &'a Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +/// let raw = unvalidated.raw();
> >> +/// let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
> >> +/// let name = CStr::from_bytes_with_nul(name)?;
> >> +/// Ok(Data { flags, name })
> >> +/// }
> >> +/// }
> >> +/// ```
> >> +///
> >> +/// If you need to in-place validate your data, you currently need to resort to `unsafe`:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +/// error::{code::EINVAL, Error},
> >> +/// str::CStr,
> >> +/// validate::{Unvalidated, Validate},
> >> +/// };
> >> +/// use core::mem;
> >> +///
> >> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
> >> +/// #[repr(C)]
> >> +/// struct Data {
> >> +/// version: u8,
> >> +/// flags: u8,
> >> +/// _reserved: [u8; 2],
> >> +/// count: u64,
> >> +/// // lots of other fields...
> >> +/// }
> >> +///
> >> +/// impl Validate<&Unvalidated<[u8]>> for &Data {
> >> +/// type Err = Error;
> >> +///
> >> +/// fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +/// let raw = unvalidated.raw();
> >> +/// if raw.len() < mem::size_of::<Data>() {
> >> +/// return Err(EINVAL);
> >> +/// }
> >> +/// // can only handle version 0
> >> +/// if raw[0] != 0 {
> >> +/// return Err(EINVAL);
> >> +/// }
> >> +/// // version 0 only uses the lower 4 bits of flags
> >> +/// if raw[1] & 0xf0 != 0 {
> >> +/// return Err(EINVAL);
> >> +/// }
> >> +/// let ptr = raw.as_ptr();
> >> +/// // CAST: `Data` only contains integers and has `repr(C)`.
> >> +/// let ptr = ptr.cast::<Data>();
> >> +/// // SAFETY: `ptr` came from a reference and the cast above is valid.
> >> +/// Ok(unsafe { &*ptr })
> >> +/// }
> >> +/// }
> >> +/// ```
> >> +///
> >> +/// To be able to modify the parsed data, while still supporting zero-copy, you can implement
> >> +/// `Validate<&mut Unvalidated<[u8]>>`:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +/// error::{code::EINVAL, Error},
> >> +/// str::CStr,
> >> +/// validate::{Unvalidated, Validate},
> >> +/// };
> >> +/// use core::mem;
> >> +///
> >> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
> >> +/// #[repr(C)]
> >> +/// struct Data {
> >> +/// version: u8,
> >> +/// flags: u8,
> >> +/// _reserved: [u8; 2],
> >> +/// count: u64,
> >> +/// // lots of other fields...
> >> +/// }
> >> +///
> >> +/// impl Validate<&mut Unvalidated<[u8]>> for &Data {
> >> +/// type Err = Error;
> >
> > I think we should make that one the default, but not sure that's doable
> > with associated types instead of generics.
>
> There is the `associated_type_defaults` unstable feature:
> https://github.com/rust-lang/rust/issues/29661
>
> But I actually think that we should get away from making the `Error`
> type the return error of Rust functions. It's much better to have
> function-specific errors, since they can be more descriptive. In cases
> where we have to return to userspace or C, sure use the existing
> `Error`.
Do you mean a area-specific Error code that only returns the values that
are actually possible? I fear a bit that that could be a maintenance pain,
when the C side suddenly returns a new errno value. Or do you mean
something else.
> > The input parameter should definitely default to the output paramter,
> > because often they're just exactly the same e.g. when validating ioctl
> > input structures.
>
> I don't understand what you mean by this? The input parameter has to be
> some sort of `..Untrusted..` and the output shouldn't have that.
Yeah it needs to be wrapped, but I was wondering whether there's a way to
make Untrusted<T> default to T as the output. If I remember all my
thoughts correctly, it's been a month.
> > On this version you've also left out the in-place validation (originally
> > validate_bytes), but we can add that later on when a need arises I guess.
> > Or do you think it's just not a good idea in general?
>
> This version supports in-place validation, this example and the one
> above showcase how you would implement `Validate` in that case.
>
> You can then call `.vaildate()` on an `&[mut ]Unvalidated<[u8]>` and you
> get a `&Data` that borrows from the unvalidated data without copying the
> data.
Ah I missed that, thanks for explaining.
Cheers, Sima
>
> ---
> Cheers,
> Benno
>
> >> +///
> >> +/// fn validate(unvalidated: &mut Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +/// let raw = unvalidated.raw_mut();
> >> +/// if raw.len() < mem::size_of::<Data>() {
> >> +/// return Err(EINVAL);
> >> +/// }
> >> +/// match raw[0] {
> >> +/// 0 => {},
> >> +/// 1 => {
> >> +/// // version 1 implicitly sets the first bit.
> >> +/// raw[1] |= 1;
> >> +/// },
> >> +/// // can only handle version 0 and 1
> >> +/// _ => return Err(EINVAL),
> >> +/// }
> >> +/// // version 0 and 1 only use the lower 4 bits of flags
> >> +/// if raw[1] & 0xf0 != 0 {
> >> +/// return Err(EINVAL);
> >> +/// }
> >> +/// if raw[1] == 0 {}
> >> +/// let ptr = raw.as_ptr();
> >> +/// // CAST: `Data` only contains integers and has `repr(C)`.
> >> +/// let ptr = ptr.cast::<Data>();
> >> +/// // SAFETY: `ptr` came from a reference and the cast above is valid.
> >> +/// Ok(unsafe { &*ptr })
> >> +/// }
> >> +/// }
> >> +/// ```
> >> +pub trait Validate<I: ValidateInput>: Sized {
> >> + /// Validation error.
> >> + type Err;
> >> +
> >> + /// Validate the given untrusted data and parse it into the output type.
> >> + fn validate(unvalidated: I) -> Result<Self, Self::Err>;
> >> +}
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Powered by blists - more mailing lists