linux-kernel - Re: [PATCH v2 1/2] rust: add untrusted data abstraction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z0WBXhyyE4ilLlCE@phenom.ffwll.local>
Date: Tue, 26 Nov 2024 09:05:50 +0100
From: Simona Vetter <simona.vetter@...ll.ch>
To: Benno Lossin <benno.lossin@...ton.me>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
	Greg KH <gregkh@...uxfoundation.org>, linux-kernel@...r.kernel.org,
	rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v2 1/2] rust: add untrusted data abstraction

Sorry for the super late reply, was absorbed too much with life stuff :-/

On Mon, Sep 30, 2024 at 02:04:06PM +0000, Benno Lossin wrote:
> On 26.09.24 12:40, Simona Vetter wrote:
> > On Wed, Sep 25, 2024 at 08:53:05PM +0000, Benno Lossin wrote:
> >> +    /// Sets the underlying untrusted value.
> >> +    ///
> >> +    /// # Examples
> >> +    ///
> >> +    /// ```
> >> +    /// use kernel::validate::Untrusted;
> >> +    ///
> >> +    /// let mut untrusted = Untrusted::new(42);
> >> +    /// untrusted.write(24);
> >> +    /// ```
> >> +    pub fn write(&mut self, value: impl Init<T>) {
> >> +        let ptr: *mut T = &mut self.0 .0;
> >> +        // SAFETY: `ptr` came from a mutable reference and the value is overwritten before it is
> >> +        // read.
> >> +        unsafe { ptr::drop_in_place(ptr) };
> >> +        // SAFETY: `ptr` came from a mutable reference and the initializer cannot error.
> >> +        match unsafe { value.__init(ptr) } {
> >> +            Ok(()) => {}
> >> +        }
> >> +    }
> > 
> > So I think this doesn't work for a few reasons:
> > 
> > - there's a lot of unsafe both here and when using this (even without
> >   MaybeUninit), at least if I understand things correctly. And forcing
> >   unsafe for something that inheritedly does not break any of the rust
> >   memory safety rules feels silly.
> > 
> > - it also looks awkward, because you have the special versions
> >   write_uninit and write_uninit_slice, and imo if our write isn't good
> >   enought to just work with abritrary box/container types, it's not good.
> > 
> > - it doesn't actually work for one of the main uaccess.rs use-cases beyond
> >   just enabling the current interface. Example in pseudo-rust:
> > 
> >   struct IoctlParams {
> >       input: u32,
> >       ouptut: u32,
> >   }
> > 
> >   The thing is that ioctl that use the struct approach like drm does, use
> >   the same struct if there's both input and output paramterers, and
> >   furthermore we are not allowed to overwrite the entire struct because
> >   that breaks ioctl restarting. So the flow is roughly
> > 
> >   let userptr : UserSlice;
> >   let params : Untrusted<IoctlParams>;
> > 
> >   userptr.read(params));
> > 
> >   // validate params, do something interesting with it params.input
> > 
> >   // this is _not_ allowed to overwrite params.input but must leave it
> >   // unchanged
> > 
> >   params.write(|x| { x.output = 42; });
> > 
> >   userptr.write(params);
> > 
> >   Your current write doesn't allow this case, and I think that's not good
> >   enough.
> 
> I see. One thing that makes this API work better is field projections
> (it's a rust language feature for which we do have a PoC macro
> implementation and which I am working on getting into the language):
> To only overwrite a part of a struct you would do
> 
>     params->output.write(42);
> 
> The `->` syntax "projects" the `output` field, turning
> `Untrusted<IoctlParams>` into `Untrusted<u32>`.
> 
> I think that this would solve your problem with the API. (we probably
> need a stop gap solution though until field projections are part of the
> language).
> 
> >   The one I propsed in private does:
> > 
> >   Untrusted<T>::write(&mut self, Fn(&mut T)->())
> > 
> >   It's not perfect because it allows you to do other stuff than writing,
> >   but it's still pretty good by limiting any potentially dangerous code to
> >   the provided closure. And with these of protection apis we need to
> >   strike the right balance between as much protection as possible while
> >   still allowing users to get the job done.
> 
> Ideally I would like to avoid exposing the value.
> 
> >   Now I guess you can do this with your write too using a copy constructor
> >   approach, but that will result in less efficient code and it's probably
> >   also too much functional programming design for kernel developers to
> >   cope with.
> 
> Agreed.
> 
> >> +
> >> +    /// Turns a slice of untrusted values into an untrusted slice of values.
> >> +    pub fn transpose_slice(slice: &[Untrusted<T>]) -> &Untrusted<[T]>
> >> +    where
> >> +        T: Sized,
> >> +    {
> >> +        let ptr = slice.as_ptr().cast::<T>();
> >> +        // SAFETY: `ptr` and `len` come from the same slice reference.
> >> +        let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
> >> +        Untrusted::new_ref(slice)
> >> +    }
> >> +
> >> +    /// Turns a slice of uninitialized, untrusted values into an untrusted slice of uninitialized
> >> +    /// values.
> >> +    pub fn transpose_slice_uninit(
> >> +        slice: &[MaybeUninit<Untrusted<T>>],
> >> +    ) -> &Untrusted<[MaybeUninit<T>]>
> > 
> > So these and some of the related functions to handle slice of box vs box
> > of slice feel a bit awkward. I think we can do better if we rename
> > Untrusted and Unvalidated to UntrustedBox and UnvalidatedBox, and then
> > make Untrusted and Unvalidated traits, with implementations for
> > UntrustedBox<T>, [Untrusted<T>], and all the others we want.
> 
> Hmm that might work for `Untrusted`, I will see how that looks. For
> `Unvalidated` I am less confident.
> 
> > I expect that in the future we'll get more boxes with special semantics
> > that we want to use together with Untrusted, not just [] and MaybeUninit.
> > One example would be when the Untrusted data is shared with userspace
> > (mmap()ed into both kernel and userspace for example for a ringbuffer), in
> > which case the data is both Untrusted but also SharedUnsafe or whatever
> > we'll call memory that's fundamentally breaking the rust memory model
> > because there's no exclusive access (userspace can always do whatever it
> > feels like), and all access has to go through at least
> > READ_ONCE/WRITE_ONCE from the C side, or often even full blown atomics.
> 
> I don't think that we need to wrap that in `Untrusted`, instead the type
> representing shared data between kernel and userspace should return
> `Untrusted<u8>` instead of `u8`.
> 
> > And that memory is not MaybeUninit, because we have to initialize it
> > before userspace can mmap it, for otherwise it's an information leak and
> > hence security issue.
> > 
> > tldr; I think going with traits for Untrusted and Unvalidated here will be
> > worth it, even if a bit more pain at first. Plus cleaner interfaces.
> > 
> >> +    where
> >> +        T: Sized,
> >> +    {
> >> +        let ptr = slice.as_ptr().cast::<MaybeUninit<T>>();
> >> +        // SAFETY: `ptr` and `len` come from the same mutable slice reference.
> >> +        let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
> >> +        Untrusted::new_ref(slice)
> >> +    }
> 
> [...]
> 
> >> +/// Validates untrusted data.
> >> +///
> >> +/// # Examples
> >> +///
> >> +/// The simplest way to validate data is to just implement `Validate<&Unvalidated<[u8]>>` for the
> >> +/// type that you wish to validate:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +///     error::{code::EINVAL, Error},
> >> +///     str::{CStr, CString},
> >> +///     validate::{Unvalidated, Validate},
> >> +/// };
> >> +///
> >> +/// struct Data {
> >> +///     flags: u8,
> >> +///     name: CString,
> >> +/// }
> >> +///
> >> +/// impl Validate<&Unvalidated<[u8]>> for Data {
> >> +///     type Err = Error;
> >> +///
> >> +///     fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +///         let raw = unvalidated.raw();
> >> +///         let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
> >> +///         let name = CStr::from_bytes_with_nul(name)?.to_cstring()?;
> >> +///         Ok(Data { flags, name })
> >> +///     }
> >> +/// }
> >> +/// ```
> >> +///
> >> +/// This approach copies the data and requires allocation. If you want to avoid the allocation and
> >> +/// copying the data, you can borrow from the input like this:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +///     error::{code::EINVAL, Error},
> >> +///     str::CStr,
> >> +///     validate::{Unvalidated, Validate},
> >> +/// };
> >> +///
> >> +/// struct Data<'a> {
> >> +///     flags: u8,
> >> +///     name: &'a CStr,
> >> +/// }
> >> +///
> >> +/// impl<'a> Validate<&'a Unvalidated<[u8]>> for Data<'a> {
> >> +///     type Err = Error;
> >> +///
> >> +///     fn validate(unvalidated: &'a Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +///         let raw = unvalidated.raw();
> >> +///         let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
> >> +///         let name = CStr::from_bytes_with_nul(name)?;
> >> +///         Ok(Data { flags, name })
> >> +///     }
> >> +/// }
> >> +/// ```
> >> +///
> >> +/// If you need to in-place validate your data, you currently need to resort to `unsafe`:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +///     error::{code::EINVAL, Error},
> >> +///     str::CStr,
> >> +///     validate::{Unvalidated, Validate},
> >> +/// };
> >> +/// use core::mem;
> >> +///
> >> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
> >> +/// #[repr(C)]
> >> +/// struct Data {
> >> +///     version: u8,
> >> +///     flags: u8,
> >> +///     _reserved: [u8; 2],
> >> +///     count: u64,
> >> +///     // lots of other fields...
> >> +/// }
> >> +///
> >> +/// impl Validate<&Unvalidated<[u8]>> for &Data {
> >> +///     type Err = Error;
> >> +///
> >> +///     fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +///         let raw = unvalidated.raw();
> >> +///         if raw.len() < mem::size_of::<Data>() {
> >> +///             return Err(EINVAL);
> >> +///         }
> >> +///         // can only handle version 0
> >> +///         if raw[0] != 0 {
> >> +///             return Err(EINVAL);
> >> +///         }
> >> +///         // version 0 only uses the lower 4 bits of flags
> >> +///         if raw[1] & 0xf0 != 0 {
> >> +///             return Err(EINVAL);
> >> +///         }
> >> +///         let ptr = raw.as_ptr();
> >> +///         // CAST: `Data` only contains integers and has `repr(C)`.
> >> +///         let ptr = ptr.cast::<Data>();
> >> +///         // SAFETY: `ptr` came from a reference and the cast above is valid.
> >> +///         Ok(unsafe { &*ptr })
> >> +///     }
> >> +/// }
> >> +/// ```
> >> +///
> >> +/// To be able to modify the parsed data, while still supporting zero-copy, you can implement
> >> +/// `Validate<&mut Unvalidated<[u8]>>`:
> >> +///
> >> +/// ```
> >> +/// use kernel::{
> >> +///     error::{code::EINVAL, Error},
> >> +///     str::CStr,
> >> +///     validate::{Unvalidated, Validate},
> >> +/// };
> >> +/// use core::mem;
> >> +///
> >> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
> >> +/// #[repr(C)]
> >> +/// struct Data {
> >> +///     version: u8,
> >> +///     flags: u8,
> >> +///     _reserved: [u8; 2],
> >> +///     count: u64,
> >> +///     // lots of other fields...
> >> +/// }
> >> +///
> >> +/// impl Validate<&mut Unvalidated<[u8]>> for &Data {
> >> +///     type Err = Error;
> > 
> > I think we should make that one the default, but not sure that's doable
> > with associated types instead of generics.
> 
> There is the `associated_type_defaults` unstable feature: 
> https://github.com/rust-lang/rust/issues/29661
> 
> But I actually think that we should get away from making the `Error`
> type the return error of Rust functions. It's much better to have
> function-specific errors, since they can be more descriptive. In cases
> where we have to return to userspace or C, sure use the existing
> `Error`.

Do you mean a area-specific Error code that only returns the values that
are actually possible? I fear a bit that that could be a maintenance pain,
when the C side suddenly returns a new errno value. Or do you mean
something else.

> > The input parameter should definitely default to the output paramter,
> > because often they're just exactly the same e.g. when validating ioctl
> > input structures.
> 
> I don't understand what you mean by this? The input parameter has to be
> some sort of `..Untrusted..` and the output shouldn't have that.

Yeah it needs to be wrapped, but I was wondering whether there's a way to
make Untrusted<T> default to T as the output. If I remember all my
thoughts correctly, it's been a month.

> > On this version you've also left out the in-place validation (originally
> > validate_bytes), but we can add that later on when a need arises I guess.
> > Or do you think it's just not a good idea in general?
> 
> This version supports in-place validation, this example and the one
> above showcase how you would implement `Validate` in that case.
> 
> You can then call `.vaildate()` on an `&[mut ]Unvalidated<[u8]>` and you
> get a `&Data` that borrows from the unvalidated data without copying the
> data.

Ah I missed that, thanks for explaining.

Cheers, Sima

> 
> ---
> Cheers,
> Benno
> 
> >> +///
> >> +///     fn validate(unvalidated: &mut Unvalidated<[u8]>) -> Result<Self, Self::Err> {
> >> +///         let raw = unvalidated.raw_mut();
> >> +///         if raw.len() < mem::size_of::<Data>() {
> >> +///             return Err(EINVAL);
> >> +///         }
> >> +///         match raw[0] {
> >> +///             0 => {},
> >> +///             1 => {
> >> +///                 // version 1 implicitly sets the first bit.
> >> +///                 raw[1] |= 1;
> >> +///             },
> >> +///             // can only handle version 0 and 1
> >> +///             _ => return Err(EINVAL),
> >> +///         }
> >> +///         // version 0 and 1 only use the lower 4 bits of flags
> >> +///         if raw[1] & 0xf0 != 0 {
> >> +///             return Err(EINVAL);
> >> +///         }
> >> +///         if raw[1] == 0 {}
> >> +///         let ptr = raw.as_ptr();
> >> +///         // CAST: `Data` only contains integers and has `repr(C)`.
> >> +///         let ptr = ptr.cast::<Data>();
> >> +///         // SAFETY: `ptr` came from a reference and the cast above is valid.
> >> +///         Ok(unsafe { &*ptr })
> >> +///     }
> >> +/// }
> >> +/// ```
> >> +pub trait Validate<I: ValidateInput>: Sized {
> >> +    /// Validation error.
> >> +    type Err;
> >> +
> >> +    /// Validate the given untrusted data and parse it into the output type.
> >> +    fn validate(unvalidated: I) -> Result<Self, Self::Err>;
> >> +}
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch