linux-kernel - Re: [PATCH v2 1/2] rust: add untrusted data abstraction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <254496ac-bc5f-4622-99c4-9e2ada27e069@proton.me>
Date: Mon, 30 Sep 2024 14:04:06 +0000
From: Benno Lossin <benno.lossin@...ton.me>
To: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, Björn Roy Baron <bjorn3_gh@...tonmail.com>, Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>, Greg KH <gregkh@...uxfoundation.org>, linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v2 1/2] rust: add untrusted data abstraction

On 26.09.24 12:40, Simona Vetter wrote:
> On Wed, Sep 25, 2024 at 08:53:05PM +0000, Benno Lossin wrote:
>> +    /// Sets the underlying untrusted value.
>> +    ///
>> +    /// # Examples
>> +    ///
>> +    /// ```
>> +    /// use kernel::validate::Untrusted;
>> +    ///
>> +    /// let mut untrusted = Untrusted::new(42);
>> +    /// untrusted.write(24);
>> +    /// ```
>> +    pub fn write(&mut self, value: impl Init<T>) {
>> +        let ptr: *mut T = &mut self.0 .0;
>> +        // SAFETY: `ptr` came from a mutable reference and the value is overwritten before it is
>> +        // read.
>> +        unsafe { ptr::drop_in_place(ptr) };
>> +        // SAFETY: `ptr` came from a mutable reference and the initializer cannot error.
>> +        match unsafe { value.__init(ptr) } {
>> +            Ok(()) => {}
>> +        }
>> +    }
> 
> So I think this doesn't work for a few reasons:
> 
> - there's a lot of unsafe both here and when using this (even without
>   MaybeUninit), at least if I understand things correctly. And forcing
>   unsafe for something that inheritedly does not break any of the rust
>   memory safety rules feels silly.
> 
> - it also looks awkward, because you have the special versions
>   write_uninit and write_uninit_slice, and imo if our write isn't good
>   enought to just work with abritrary box/container types, it's not good.
> 
> - it doesn't actually work for one of the main uaccess.rs use-cases beyond
>   just enabling the current interface. Example in pseudo-rust:
> 
>   struct IoctlParams {
>       input: u32,
>       ouptut: u32,
>   }
> 
>   The thing is that ioctl that use the struct approach like drm does, use
>   the same struct if there's both input and output paramterers, and
>   furthermore we are not allowed to overwrite the entire struct because
>   that breaks ioctl restarting. So the flow is roughly
> 
>   let userptr : UserSlice;
>   let params : Untrusted<IoctlParams>;
> 
>   userptr.read(params));
> 
>   // validate params, do something interesting with it params.input
> 
>   // this is _not_ allowed to overwrite params.input but must leave it
>   // unchanged
> 
>   params.write(|x| { x.output = 42; });
> 
>   userptr.write(params);
> 
>   Your current write doesn't allow this case, and I think that's not good
>   enough.

I see. One thing that makes this API work better is field projections
(it's a rust language feature for which we do have a PoC macro
implementation and which I am working on getting into the language):
To only overwrite a part of a struct you would do

    params->output.write(42);

The `->` syntax "projects" the `output` field, turning
`Untrusted<IoctlParams>` into `Untrusted<u32>`.

I think that this would solve your problem with the API. (we probably
need a stop gap solution though until field projections are part of the
language).

>   The one I propsed in private does:
> 
>   Untrusted<T>::write(&mut self, Fn(&mut T)->())
> 
>   It's not perfect because it allows you to do other stuff than writing,
>   but it's still pretty good by limiting any potentially dangerous code to
>   the provided closure. And with these of protection apis we need to
>   strike the right balance between as much protection as possible while
>   still allowing users to get the job done.

Ideally I would like to avoid exposing the value.

>   Now I guess you can do this with your write too using a copy constructor
>   approach, but that will result in less efficient code and it's probably
>   also too much functional programming design for kernel developers to
>   cope with.

Agreed.

>> +
>> +    /// Turns a slice of untrusted values into an untrusted slice of values.
>> +    pub fn transpose_slice(slice: &[Untrusted<T>]) -> &Untrusted<[T]>
>> +    where
>> +        T: Sized,
>> +    {
>> +        let ptr = slice.as_ptr().cast::<T>();
>> +        // SAFETY: `ptr` and `len` come from the same slice reference.
>> +        let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
>> +        Untrusted::new_ref(slice)
>> +    }
>> +
>> +    /// Turns a slice of uninitialized, untrusted values into an untrusted slice of uninitialized
>> +    /// values.
>> +    pub fn transpose_slice_uninit(
>> +        slice: &[MaybeUninit<Untrusted<T>>],
>> +    ) -> &Untrusted<[MaybeUninit<T>]>
> 
> So these and some of the related functions to handle slice of box vs box
> of slice feel a bit awkward. I think we can do better if we rename
> Untrusted and Unvalidated to UntrustedBox and UnvalidatedBox, and then
> make Untrusted and Unvalidated traits, with implementations for
> UntrustedBox<T>, [Untrusted<T>], and all the others we want.

Hmm that might work for `Untrusted`, I will see how that looks. For
`Unvalidated` I am less confident.

> I expect that in the future we'll get more boxes with special semantics
> that we want to use together with Untrusted, not just [] and MaybeUninit.
> One example would be when the Untrusted data is shared with userspace
> (mmap()ed into both kernel and userspace for example for a ringbuffer), in
> which case the data is both Untrusted but also SharedUnsafe or whatever
> we'll call memory that's fundamentally breaking the rust memory model
> because there's no exclusive access (userspace can always do whatever it
> feels like), and all access has to go through at least
> READ_ONCE/WRITE_ONCE from the C side, or often even full blown atomics.

I don't think that we need to wrap that in `Untrusted`, instead the type
representing shared data between kernel and userspace should return
`Untrusted<u8>` instead of `u8`.

> And that memory is not MaybeUninit, because we have to initialize it
> before userspace can mmap it, for otherwise it's an information leak and
> hence security issue.
> 
> tldr; I think going with traits for Untrusted and Unvalidated here will be
> worth it, even if a bit more pain at first. Plus cleaner interfaces.
> 
>> +    where
>> +        T: Sized,
>> +    {
>> +        let ptr = slice.as_ptr().cast::<MaybeUninit<T>>();
>> +        // SAFETY: `ptr` and `len` come from the same mutable slice reference.
>> +        let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
>> +        Untrusted::new_ref(slice)
>> +    }

[...]

>> +/// Validates untrusted data.
>> +///
>> +/// # Examples
>> +///
>> +/// The simplest way to validate data is to just implement `Validate<&Unvalidated<[u8]>>` for the
>> +/// type that you wish to validate:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +///     error::{code::EINVAL, Error},
>> +///     str::{CStr, CString},
>> +///     validate::{Unvalidated, Validate},
>> +/// };
>> +///
>> +/// struct Data {
>> +///     flags: u8,
>> +///     name: CString,
>> +/// }
>> +///
>> +/// impl Validate<&Unvalidated<[u8]>> for Data {
>> +///     type Err = Error;
>> +///
>> +///     fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +///         let raw = unvalidated.raw();
>> +///         let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
>> +///         let name = CStr::from_bytes_with_nul(name)?.to_cstring()?;
>> +///         Ok(Data { flags, name })
>> +///     }
>> +/// }
>> +/// ```
>> +///
>> +/// This approach copies the data and requires allocation. If you want to avoid the allocation and
>> +/// copying the data, you can borrow from the input like this:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +///     error::{code::EINVAL, Error},
>> +///     str::CStr,
>> +///     validate::{Unvalidated, Validate},
>> +/// };
>> +///
>> +/// struct Data<'a> {
>> +///     flags: u8,
>> +///     name: &'a CStr,
>> +/// }
>> +///
>> +/// impl<'a> Validate<&'a Unvalidated<[u8]>> for Data<'a> {
>> +///     type Err = Error;
>> +///
>> +///     fn validate(unvalidated: &'a Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +///         let raw = unvalidated.raw();
>> +///         let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
>> +///         let name = CStr::from_bytes_with_nul(name)?;
>> +///         Ok(Data { flags, name })
>> +///     }
>> +/// }
>> +/// ```
>> +///
>> +/// If you need to in-place validate your data, you currently need to resort to `unsafe`:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +///     error::{code::EINVAL, Error},
>> +///     str::CStr,
>> +///     validate::{Unvalidated, Validate},
>> +/// };
>> +/// use core::mem;
>> +///
>> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
>> +/// #[repr(C)]
>> +/// struct Data {
>> +///     version: u8,
>> +///     flags: u8,
>> +///     _reserved: [u8; 2],
>> +///     count: u64,
>> +///     // lots of other fields...
>> +/// }
>> +///
>> +/// impl Validate<&Unvalidated<[u8]>> for &Data {
>> +///     type Err = Error;
>> +///
>> +///     fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +///         let raw = unvalidated.raw();
>> +///         if raw.len() < mem::size_of::<Data>() {
>> +///             return Err(EINVAL);
>> +///         }
>> +///         // can only handle version 0
>> +///         if raw[0] != 0 {
>> +///             return Err(EINVAL);
>> +///         }
>> +///         // version 0 only uses the lower 4 bits of flags
>> +///         if raw[1] & 0xf0 != 0 {
>> +///             return Err(EINVAL);
>> +///         }
>> +///         let ptr = raw.as_ptr();
>> +///         // CAST: `Data` only contains integers and has `repr(C)`.
>> +///         let ptr = ptr.cast::<Data>();
>> +///         // SAFETY: `ptr` came from a reference and the cast above is valid.
>> +///         Ok(unsafe { &*ptr })
>> +///     }
>> +/// }
>> +/// ```
>> +///
>> +/// To be able to modify the parsed data, while still supporting zero-copy, you can implement
>> +/// `Validate<&mut Unvalidated<[u8]>>`:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +///     error::{code::EINVAL, Error},
>> +///     str::CStr,
>> +///     validate::{Unvalidated, Validate},
>> +/// };
>> +/// use core::mem;
>> +///
>> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
>> +/// #[repr(C)]
>> +/// struct Data {
>> +///     version: u8,
>> +///     flags: u8,
>> +///     _reserved: [u8; 2],
>> +///     count: u64,
>> +///     // lots of other fields...
>> +/// }
>> +///
>> +/// impl Validate<&mut Unvalidated<[u8]>> for &Data {
>> +///     type Err = Error;
> 
> I think we should make that one the default, but not sure that's doable
> with associated types instead of generics.

There is the `associated_type_defaults` unstable feature: 
https://github.com/rust-lang/rust/issues/29661

But I actually think that we should get away from making the `Error`
type the return error of Rust functions. It's much better to have
function-specific errors, since they can be more descriptive. In cases
where we have to return to userspace or C, sure use the existing
`Error`.

> The input parameter should definitely default to the output paramter,
> because often they're just exactly the same e.g. when validating ioctl
> input structures.

I don't understand what you mean by this? The input parameter has to be
some sort of `..Untrusted..` and the output shouldn't have that.

> On this version you've also left out the in-place validation (originally
> validate_bytes), but we can add that later on when a need arises I guess.
> Or do you think it's just not a good idea in general?

This version supports in-place validation, this example and the one
above showcase how you would implement `Validate` in that case.

You can then call `.vaildate()` on an `&[mut ]Unvalidated<[u8]>` and you
get a `&Data` that borrows from the unvalidated data without copying the
data.

---
Cheers,
Benno

>> +///
>> +///     fn validate(unvalidated: &mut Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +///         let raw = unvalidated.raw_mut();
>> +///         if raw.len() < mem::size_of::<Data>() {
>> +///             return Err(EINVAL);
>> +///         }
>> +///         match raw[0] {
>> +///             0 => {},
>> +///             1 => {
>> +///                 // version 1 implicitly sets the first bit.
>> +///                 raw[1] |= 1;
>> +///             },
>> +///             // can only handle version 0 and 1
>> +///             _ => return Err(EINVAL),
>> +///         }
>> +///         // version 0 and 1 only use the lower 4 bits of flags
>> +///         if raw[1] & 0xf0 != 0 {
>> +///             return Err(EINVAL);
>> +///         }
>> +///         if raw[1] == 0 {}
>> +///         let ptr = raw.as_ptr();
>> +///         // CAST: `Data` only contains integers and has `repr(C)`.
>> +///         let ptr = ptr.cast::<Data>();
>> +///         // SAFETY: `ptr` came from a reference and the cast above is valid.
>> +///         Ok(unsafe { &*ptr })
>> +///     }
>> +/// }
>> +/// ```
>> +pub trait Validate<I: ValidateInput>: Sized {
>> +    /// Validation error.
>> +    type Err;
>> +
>> +    /// Validate the given untrusted data and parse it into the output type.
>> +    fn validate(unvalidated: I) -> Result<Self, Self::Err>;
>> +}