linux-kernel - Re: [PATCH 1/3] rust: add untrusted data abstraction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e9de0ead-2ab4-44c6-b7f5-8bf416799664@proton.me>
Date: Fri, 20 Sep 2024 15:28:47 +0000
From: Benno Lossin <benno.lossin@...ton.me>
To: Simona Vetter <simona.vetter@...ll.ch>, Greg KH <gregkh@...uxfoundation.org>, Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, Wedson Almeida Filho <wedsonaf@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, Björn Roy Baron <bjorn3_gh@...tonmail.com>, Andreas Hindborg <a.hindborg@...sung.com>, Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>, rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] rust: add untrusted data abstraction

On 20.09.24 16:29, Simona Vetter wrote:
> On Wed, Sep 18, 2024 at 03:40:54PM +0000, Benno Lossin wrote:
>> On 16.09.24 17:49, Simona Vetter wrote:
>>> On Fri, Sep 13, 2024 at 04:49:29PM +0000, Benno Lossin wrote:
>>>> What I think we should do instead is make our APIs that return untrusted
>>>> data just return `Untrusted<Folio>` and implement the following method:
>>>>
>>>>     impl Folio {
>>>>         pub fn read(self: &Untrusted<Self>) -> &Untrusted<[u8]>;
>>>>     }
>>>>
>>>> I think that is the best of both worlds: we don't need to do excessive
>>>> type shenanigans for every type carrying potentially untrusted data and
>>>> we get to have methods specific to untrusted data.
>>>>
>>>> However, I think implementing this method will be a bit difficult with
>>>> the `Untrusted`/`Unvalidated` split. Maybe we can have some `pub(crate)`
>>>> methods on `Unvalidated` to perform some mappings?
>>>
>>> The thing is, folios are just a pile of contig pages, and there's nothing
>>> in the rules that they only contain untrusted data. Currently in rust code
>>> we have that's the case, but not in general. So we need that associated
>>> type.
>>>
>>> But I also think Folio here is special, a lot of the other places where I
>>> want this annotation it's the case that the data returned is _always_
>>> untrusted. So we don't need to place associated types all over the
>>> codebase to make this work, it's just that the rfc example you've picked
>>> needs it.
>>
>> I think we should try to make just wrapping stuff in `Untrusted` work. I
>> don't see how the associated types would help you any more than just
>> implementing stuff on `&Untrusted<Self`.
> 
> I guess you could wrap it as Untrusted in each use site when you get the
> data out of the Folio, but that makes the guarantees we get out of these
> annotations much less stringent. Which is why I think for Folio<> (well
> really for Pagecache) we need to go with the associated type or it's a bit
> self-defeating.

Let's just implement both ways and see which one is better.

[...]

>>>>>> +pub trait Validator {
>>>>>> +    /// Type of the input data that is untrusted.
>>>>>> +    type Input: ?Sized;
>>>>>> +    /// Type of the validated data.
>>>>>> +    type Output;
>>>>>
>>>>> So I think the explicit Output makes sense if you have multiple different
>>>>> untrusted input that validate to the same trusted structure, but I'm not
>>>>> sure this makes sense as associated types. Instead I'd go with generics
>>>>> and somethign like this:
>>>>>
>>>>> pub trait Validator<Input: ?Sized> {
>>>>>     type Err;
>>>>>
>>>>>     fn validate(untrusted: &Untrusted<Input>) -> Result<Self, Self::Err>;
>>>>> }
>>>>>
>>>>> That means you can't implement validate for types from other modules
>>>>> directly but need a newtype (I think at least, not sure). But I think
>>>>> that's actually a good thing, since often that means you're validating
>>>>> some generic state plus whatever your own code needs (like the
>>>>> inode::Params<tarfs::INodeData> in your example), and both pieces need to
>>>>> be consisted overall and not just individually (otherwise why does the
>>>>> that other module not do the parsing for you). And so explicitly treating
>>>>> the validated output as an explicit new type just makes sense to me. Plus
>>>>> with derive(Deref) it's trivial to unbox after validation.
>>>>
>>>> There might be the need to validate the same piece of data with
>>>> different ways and I am not convinced adding a newtype for every single
>>>> case is a good way to achieve it.
>>>> Although it would simplify the `Validator` trait... I will think a bit
>>>> about this.
>>>
>>> Hm, but unless I misunderstand you already need a random type to attach
>>> your current trait too? So not worse if we require that for the
>>> less-common type of multiple ways to validate the same, and simpler for
>>> the common one.
>>
>> Yes, but you wouldn't have to unwrap the return type. For example with
>> your proposal we have:
>>
>>     struct MyINodeParams(INodeParams);
>>
>>     impl Validator<[u8]> for MyINodeParams {
>>         type Err = Error;
>>
>>         fn validate(untrusted: &Untrusted<[u8]>) -> Result<Self> {
>>             /*...*/
>>             Ok(Self(params))
>>         }
>>     }
>>
>>     impl MyINodeParams {
>>         fn into_inner(self) -> INodeParams {
>>             self.0
>>         }
>>     }
>>
>> And then you would do:
>>
>>     let params = untrusted.validate::<MyINodeParams>().into_inner();
>>
>> I find the `into_inner()` a bit annoying (one could just use `.0`
>> instead, but I also don't like that). I find specifying the `Output` a
>> bit cleaner.
> 
> Hm right. But I guess with your new plan to only support validate, which
> gets the inner passed in explicitly and returns whatever the closure
> returns?

The only thing that changes with my suggestion is the parameter type of
`validate` (and the names):


    struct MyINodeParams(INodeParams);

    impl Validate<[u8]> for MyINodeParams {
        type Err = Error;

        fn validate(untrusted: &[u8]) -> Result<Self> {
            /* ... */
            Ok(Self(params))
        }
    }

    let params = untrusted.validate::<MyINodeParams>().into_inner();

And with the `Output` type on the trait we would have:

    struct MyINodeParams;

    impl Validate<[u8]> for MyINodeParams {
        type Err = Error;
        type Output = INodeParams;

        fn validate(untrusted: &[u8]) -> Result<Self::Output> {
            // ...
        }
    }

    let params = untrusted.validate::<MyINodeParams>();

I don't think that it's a huge difference, but nonetheless it is
probably useful.

But, I just remembered a probably more important thing: returning
`Result<Self>` will make it possible to use type inference in places
wehre you *do* want your custom type, so

    struct Foo { /* ... */ }
    impl Validate for Foo { /* ... */ }
    fn use_my_foo(foo: Foo) { /* ... */ }
    
    use_my_foo(untrusted.validate()?)

Should work (ie the `.validate()` call doesn't need the generic
argument).

So I think I will go for no `Output` type in the next version.

---
Cheers,
Benno