linux-kernel - Re: [PATCH 1/3] rust: add untrusted data abstraction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d3b44a97-83db-4bea-b89e-9d631c98ed1e@proton.me>
Date: Mon, 23 Sep 2024 16:56:08 +0000
From: Benno Lossin <benno.lossin@...ton.me>
To: Greg KH <gregkh@...uxfoundation.org>, Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, Wedson Almeida Filho <wedsonaf@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, Björn Roy Baron <bjorn3_gh@...tonmail.com>, Andreas Hindborg <a.hindborg@...sung.com>, Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>, rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] rust: add untrusted data abstraction

On 23.09.24 18:08, Simona Vetter wrote:
> On Sat, Sep 21, 2024 at 07:45:44AM +0000, Benno Lossin wrote:
>> On 16.09.24 17:49, Simona Vetter wrote:
>>> On Fri, Sep 13, 2024 at 04:49:29PM +0000, Benno Lossin wrote:
>>>> On 13.09.24 17:33, Simona Vetter wrote:
>>>>> On Fri, Sep 13, 2024 at 11:26:57AM +0000, Benno Lossin wrote:
>>>>>>  /// Used to transfer ownership to and from foreign (non-Rust) languages.
>>>>>> @@ -532,3 +533,248 @@ unsafe impl AsBytes for str {}
>>>>>>  // does not have any uninitialized portions either.
>>>>>>  unsafe impl<T: AsBytes> AsBytes for [T] {}
>>>>>>  unsafe impl<T: AsBytes, const N: usize> AsBytes for [T; N] {}
>>>>>> +
>>>>>> +/// Untrusted data of type `T`.
>>>>>> +///
>>>>>> +/// When reading data from userspace, hardware or other external untrusted sources, the data must
>>>>>> +/// be validated before it is used for logic within the kernel. To do so, the [`validate()`]
>>>>>> +/// function exists and uses the [`Validator`] trait. For raw bytes validation there also is the
>>>>>> +/// [`validate_bytes()`] function.
>>>>>> +///
>>>>>> +///
>>>>>> +/// [`validate()`]: Self::validate
>>>>>> +/// [`validate_bytes()`]: Self::validate_bytes
>>>>>> +#[repr(transparent)]
>>>>>> +pub struct Untrusted<T: ?Sized>(T);
>>>>>
>>>>> I think we could make this a bit more strict and instead of encouraging
>>>>> people to put all their validation code into a Validator<T>, force
>>>>> them to. Which means assuming apis wrap all untrusted stuff in
>>>>> Untrusted<T> reviewers only need to look at validate implementations and
>>>>> nothing else.
>>>>>
>>>>> If I'm not too wrong with my rust I think this would work with slitting
>>>>> Untrusted<T> into two types:
>>>>>
>>>>> - Untrusted<T> is just a box you can't access, and the type returned by
>>>>>   apis for untrusted data. It doesn't have any of the functions except
>>>>>   validate and new_untrusted so that you can't get at the data at all.
>>>>>
>>>>> - UntrustedUnsafe<T> does have all the accessors needed for validation,
>>>>>   but you can't construct that outside of this module. Maybe simplest to
>>>>>   just nest this within the Untrusted<T> box.
>>>>>
>>>>> - Untrusted::validate does the unboxing and passes a reference to
>>>>>   UntrustedUnsafe<T> to Validator::validate, to guarantee that all the
>>>>>   validation code is in there and nowhere else. Similar for
>>>>>   validate_bytes.
>>>>>
>>>>> Of course people can still bypass this easily, but it gets a bit harder to
>>>>> do so, and easier for reviewers to verify everything.
>>>>
>>>> This is a great idea!
>>>> I think we should also remove `validate_bytes`, then you really must
>>>> implement `Validator`. If we figure out later that people need it, we
>>>> can add it later again. I added it because I thought that implementing
>>>> `Validator` for something very small might be very annoying, but it
>>>> seems like you actually want to force people to implement it :)
>>>
>>> See further down, I think there's a real use-case for validate_bytes, or
>>> something really close to it at least.
>>
>> Could you point me to it? I wasn't able to find the use-case you are
>> referring to here.
>>
>> If I remove the `untrusted()` function, then I would think that I also
>> should remove the `validate_bytes` function, since you could just do:
>>
>>     untrusted.validate_bytes(|_| true)
>>
>> Which would give you access to the untrusted data without validation.
>> (of course one could also do this in the `Validate` trait, but I feel
>> like if we have exactly one place where this can happen, it will be a
>> lot easier to catch. We could even have some tool that looks for
>> `Validate` implementations that just return `Ok(untrusted)`)
> 
> So the use-case I've meant is the in-place validation, without copying
> stuff around. So it wouldn't be the specific validate_bytes, but a
> validate_inplace. In my original email I did directly call it that, but it
> took me a w/e to realize that validate_bytes is really just a special-case
> of validate_inplace when T = [u8], and so really we only need
> validate_inplace for every case where we do some in-place validation.

Ahh I see, then we don't need `validate_bytes`.

> And yes validate_inplace would be a very easy minimal escape hatch to just
> unbox the unvalidated data. But fundamentally we can't prevent that, all
> we can do is make sure that the validation code is concentrated in very
> specific places reviewers can easily catch, and |_| true probably not
> being correct validation code is rather easy to spot.

I don't think that it necessarily has to be an escape hatch. I think I
can design an API where you can choose if it will be done in-place vs
copied, while still using the `Validate` trait.

> I do think we need inplace validation for the fairly common case where the
> copy_from_user is split from the validation, e.g. when you copy a big
> chunk of memory which is composed itself of variable-length objects that
> reside within: You'd do one big copy_from_user and then a pile of inplace
> validations for each thing that's in there, maybe with the assistance of
> alice's Range module to make sure you don't split up the overall memory
> area badly into individual pieces.

Yes, will definitely add it.

---
Cheers,
Benno