[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <254496ac-bc5f-4622-99c4-9e2ada27e069@proton.me>
Date: Mon, 30 Sep 2024 14:04:06 +0000
From: Benno Lossin <benno.lossin@...ton.me>
To: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, Björn Roy Baron <bjorn3_gh@...tonmail.com>, Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>, Greg KH <gregkh@...uxfoundation.org>, linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v2 1/2] rust: add untrusted data abstraction
On 26.09.24 12:40, Simona Vetter wrote:
> On Wed, Sep 25, 2024 at 08:53:05PM +0000, Benno Lossin wrote:
>> + /// Sets the underlying untrusted value.
>> + ///
>> + /// # Examples
>> + ///
>> + /// ```
>> + /// use kernel::validate::Untrusted;
>> + ///
>> + /// let mut untrusted = Untrusted::new(42);
>> + /// untrusted.write(24);
>> + /// ```
>> + pub fn write(&mut self, value: impl Init<T>) {
>> + let ptr: *mut T = &mut self.0 .0;
>> + // SAFETY: `ptr` came from a mutable reference and the value is overwritten before it is
>> + // read.
>> + unsafe { ptr::drop_in_place(ptr) };
>> + // SAFETY: `ptr` came from a mutable reference and the initializer cannot error.
>> + match unsafe { value.__init(ptr) } {
>> + Ok(()) => {}
>> + }
>> + }
>
> So I think this doesn't work for a few reasons:
>
> - there's a lot of unsafe both here and when using this (even without
> MaybeUninit), at least if I understand things correctly. And forcing
> unsafe for something that inheritedly does not break any of the rust
> memory safety rules feels silly.
>
> - it also looks awkward, because you have the special versions
> write_uninit and write_uninit_slice, and imo if our write isn't good
> enought to just work with abritrary box/container types, it's not good.
>
> - it doesn't actually work for one of the main uaccess.rs use-cases beyond
> just enabling the current interface. Example in pseudo-rust:
>
> struct IoctlParams {
> input: u32,
> ouptut: u32,
> }
>
> The thing is that ioctl that use the struct approach like drm does, use
> the same struct if there's both input and output paramterers, and
> furthermore we are not allowed to overwrite the entire struct because
> that breaks ioctl restarting. So the flow is roughly
>
> let userptr : UserSlice;
> let params : Untrusted<IoctlParams>;
>
> userptr.read(params));
>
> // validate params, do something interesting with it params.input
>
> // this is _not_ allowed to overwrite params.input but must leave it
> // unchanged
>
> params.write(|x| { x.output = 42; });
>
> userptr.write(params);
>
> Your current write doesn't allow this case, and I think that's not good
> enough.
I see. One thing that makes this API work better is field projections
(it's a rust language feature for which we do have a PoC macro
implementation and which I am working on getting into the language):
To only overwrite a part of a struct you would do
params->output.write(42);
The `->` syntax "projects" the `output` field, turning
`Untrusted<IoctlParams>` into `Untrusted<u32>`.
I think that this would solve your problem with the API. (we probably
need a stop gap solution though until field projections are part of the
language).
> The one I propsed in private does:
>
> Untrusted<T>::write(&mut self, Fn(&mut T)->())
>
> It's not perfect because it allows you to do other stuff than writing,
> but it's still pretty good by limiting any potentially dangerous code to
> the provided closure. And with these of protection apis we need to
> strike the right balance between as much protection as possible while
> still allowing users to get the job done.
Ideally I would like to avoid exposing the value.
> Now I guess you can do this with your write too using a copy constructor
> approach, but that will result in less efficient code and it's probably
> also too much functional programming design for kernel developers to
> cope with.
Agreed.
>> +
>> + /// Turns a slice of untrusted values into an untrusted slice of values.
>> + pub fn transpose_slice(slice: &[Untrusted<T>]) -> &Untrusted<[T]>
>> + where
>> + T: Sized,
>> + {
>> + let ptr = slice.as_ptr().cast::<T>();
>> + // SAFETY: `ptr` and `len` come from the same slice reference.
>> + let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
>> + Untrusted::new_ref(slice)
>> + }
>> +
>> + /// Turns a slice of uninitialized, untrusted values into an untrusted slice of uninitialized
>> + /// values.
>> + pub fn transpose_slice_uninit(
>> + slice: &[MaybeUninit<Untrusted<T>>],
>> + ) -> &Untrusted<[MaybeUninit<T>]>
>
> So these and some of the related functions to handle slice of box vs box
> of slice feel a bit awkward. I think we can do better if we rename
> Untrusted and Unvalidated to UntrustedBox and UnvalidatedBox, and then
> make Untrusted and Unvalidated traits, with implementations for
> UntrustedBox<T>, [Untrusted<T>], and all the others we want.
Hmm that might work for `Untrusted`, I will see how that looks. For
`Unvalidated` I am less confident.
> I expect that in the future we'll get more boxes with special semantics
> that we want to use together with Untrusted, not just [] and MaybeUninit.
> One example would be when the Untrusted data is shared with userspace
> (mmap()ed into both kernel and userspace for example for a ringbuffer), in
> which case the data is both Untrusted but also SharedUnsafe or whatever
> we'll call memory that's fundamentally breaking the rust memory model
> because there's no exclusive access (userspace can always do whatever it
> feels like), and all access has to go through at least
> READ_ONCE/WRITE_ONCE from the C side, or often even full blown atomics.
I don't think that we need to wrap that in `Untrusted`, instead the type
representing shared data between kernel and userspace should return
`Untrusted<u8>` instead of `u8`.
> And that memory is not MaybeUninit, because we have to initialize it
> before userspace can mmap it, for otherwise it's an information leak and
> hence security issue.
>
> tldr; I think going with traits for Untrusted and Unvalidated here will be
> worth it, even if a bit more pain at first. Plus cleaner interfaces.
>
>> + where
>> + T: Sized,
>> + {
>> + let ptr = slice.as_ptr().cast::<MaybeUninit<T>>();
>> + // SAFETY: `ptr` and `len` come from the same mutable slice reference.
>> + let slice = unsafe { slice::from_raw_parts(ptr, slice.len()) };
>> + Untrusted::new_ref(slice)
>> + }
[...]
>> +/// Validates untrusted data.
>> +///
>> +/// # Examples
>> +///
>> +/// The simplest way to validate data is to just implement `Validate<&Unvalidated<[u8]>>` for the
>> +/// type that you wish to validate:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +/// error::{code::EINVAL, Error},
>> +/// str::{CStr, CString},
>> +/// validate::{Unvalidated, Validate},
>> +/// };
>> +///
>> +/// struct Data {
>> +/// flags: u8,
>> +/// name: CString,
>> +/// }
>> +///
>> +/// impl Validate<&Unvalidated<[u8]>> for Data {
>> +/// type Err = Error;
>> +///
>> +/// fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +/// let raw = unvalidated.raw();
>> +/// let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
>> +/// let name = CStr::from_bytes_with_nul(name)?.to_cstring()?;
>> +/// Ok(Data { flags, name })
>> +/// }
>> +/// }
>> +/// ```
>> +///
>> +/// This approach copies the data and requires allocation. If you want to avoid the allocation and
>> +/// copying the data, you can borrow from the input like this:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +/// error::{code::EINVAL, Error},
>> +/// str::CStr,
>> +/// validate::{Unvalidated, Validate},
>> +/// };
>> +///
>> +/// struct Data<'a> {
>> +/// flags: u8,
>> +/// name: &'a CStr,
>> +/// }
>> +///
>> +/// impl<'a> Validate<&'a Unvalidated<[u8]>> for Data<'a> {
>> +/// type Err = Error;
>> +///
>> +/// fn validate(unvalidated: &'a Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +/// let raw = unvalidated.raw();
>> +/// let (&flags, name) = raw.split_first().ok_or(EINVAL)?;
>> +/// let name = CStr::from_bytes_with_nul(name)?;
>> +/// Ok(Data { flags, name })
>> +/// }
>> +/// }
>> +/// ```
>> +///
>> +/// If you need to in-place validate your data, you currently need to resort to `unsafe`:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +/// error::{code::EINVAL, Error},
>> +/// str::CStr,
>> +/// validate::{Unvalidated, Validate},
>> +/// };
>> +/// use core::mem;
>> +///
>> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
>> +/// #[repr(C)]
>> +/// struct Data {
>> +/// version: u8,
>> +/// flags: u8,
>> +/// _reserved: [u8; 2],
>> +/// count: u64,
>> +/// // lots of other fields...
>> +/// }
>> +///
>> +/// impl Validate<&Unvalidated<[u8]>> for &Data {
>> +/// type Err = Error;
>> +///
>> +/// fn validate(unvalidated: &Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +/// let raw = unvalidated.raw();
>> +/// if raw.len() < mem::size_of::<Data>() {
>> +/// return Err(EINVAL);
>> +/// }
>> +/// // can only handle version 0
>> +/// if raw[0] != 0 {
>> +/// return Err(EINVAL);
>> +/// }
>> +/// // version 0 only uses the lower 4 bits of flags
>> +/// if raw[1] & 0xf0 != 0 {
>> +/// return Err(EINVAL);
>> +/// }
>> +/// let ptr = raw.as_ptr();
>> +/// // CAST: `Data` only contains integers and has `repr(C)`.
>> +/// let ptr = ptr.cast::<Data>();
>> +/// // SAFETY: `ptr` came from a reference and the cast above is valid.
>> +/// Ok(unsafe { &*ptr })
>> +/// }
>> +/// }
>> +/// ```
>> +///
>> +/// To be able to modify the parsed data, while still supporting zero-copy, you can implement
>> +/// `Validate<&mut Unvalidated<[u8]>>`:
>> +///
>> +/// ```
>> +/// use kernel::{
>> +/// error::{code::EINVAL, Error},
>> +/// str::CStr,
>> +/// validate::{Unvalidated, Validate},
>> +/// };
>> +/// use core::mem;
>> +///
>> +/// // Important: use `repr(C)`, this ensures a linear layout of this type.
>> +/// #[repr(C)]
>> +/// struct Data {
>> +/// version: u8,
>> +/// flags: u8,
>> +/// _reserved: [u8; 2],
>> +/// count: u64,
>> +/// // lots of other fields...
>> +/// }
>> +///
>> +/// impl Validate<&mut Unvalidated<[u8]>> for &Data {
>> +/// type Err = Error;
>
> I think we should make that one the default, but not sure that's doable
> with associated types instead of generics.
There is the `associated_type_defaults` unstable feature:
https://github.com/rust-lang/rust/issues/29661
But I actually think that we should get away from making the `Error`
type the return error of Rust functions. It's much better to have
function-specific errors, since they can be more descriptive. In cases
where we have to return to userspace or C, sure use the existing
`Error`.
> The input parameter should definitely default to the output paramter,
> because often they're just exactly the same e.g. when validating ioctl
> input structures.
I don't understand what you mean by this? The input parameter has to be
some sort of `..Untrusted..` and the output shouldn't have that.
> On this version you've also left out the in-place validation (originally
> validate_bytes), but we can add that later on when a need arises I guess.
> Or do you think it's just not a good idea in general?
This version supports in-place validation, this example and the one
above showcase how you would implement `Validate` in that case.
You can then call `.vaildate()` on an `&[mut ]Unvalidated<[u8]>` and you
get a `&Data` that borrows from the unvalidated data without copying the
data.
---
Cheers,
Benno
>> +///
>> +/// fn validate(unvalidated: &mut Unvalidated<[u8]>) -> Result<Self, Self::Err> {
>> +/// let raw = unvalidated.raw_mut();
>> +/// if raw.len() < mem::size_of::<Data>() {
>> +/// return Err(EINVAL);
>> +/// }
>> +/// match raw[0] {
>> +/// 0 => {},
>> +/// 1 => {
>> +/// // version 1 implicitly sets the first bit.
>> +/// raw[1] |= 1;
>> +/// },
>> +/// // can only handle version 0 and 1
>> +/// _ => return Err(EINVAL),
>> +/// }
>> +/// // version 0 and 1 only use the lower 4 bits of flags
>> +/// if raw[1] & 0xf0 != 0 {
>> +/// return Err(EINVAL);
>> +/// }
>> +/// if raw[1] == 0 {}
>> +/// let ptr = raw.as_ptr();
>> +/// // CAST: `Data` only contains integers and has `repr(C)`.
>> +/// let ptr = ptr.cast::<Data>();
>> +/// // SAFETY: `ptr` came from a reference and the cast above is valid.
>> +/// Ok(unsafe { &*ptr })
>> +/// }
>> +/// }
>> +/// ```
>> +pub trait Validate<I: ValidateInput>: Sized {
>> + /// Validation error.
>> + type Err;
>> +
>> + /// Validate the given untrusted data and parse it into the output type.
>> + fn validate(unvalidated: I) -> Result<Self, Self::Err>;
>> +}
Powered by blists - more mailing lists