[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e9de0ead-2ab4-44c6-b7f5-8bf416799664@proton.me>
Date: Fri, 20 Sep 2024 15:28:47 +0000
From: Benno Lossin <benno.lossin@...ton.me>
To: Simona Vetter <simona.vetter@...ll.ch>, Greg KH <gregkh@...uxfoundation.org>, Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, Wedson Almeida Filho <wedsonaf@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, Björn Roy Baron <bjorn3_gh@...tonmail.com>, Andreas Hindborg <a.hindborg@...sung.com>, Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>, rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] rust: add untrusted data abstraction
On 20.09.24 16:29, Simona Vetter wrote:
> On Wed, Sep 18, 2024 at 03:40:54PM +0000, Benno Lossin wrote:
>> On 16.09.24 17:49, Simona Vetter wrote:
>>> On Fri, Sep 13, 2024 at 04:49:29PM +0000, Benno Lossin wrote:
>>>> What I think we should do instead is make our APIs that return untrusted
>>>> data just return `Untrusted<Folio>` and implement the following method:
>>>>
>>>> impl Folio {
>>>> pub fn read(self: &Untrusted<Self>) -> &Untrusted<[u8]>;
>>>> }
>>>>
>>>> I think that is the best of both worlds: we don't need to do excessive
>>>> type shenanigans for every type carrying potentially untrusted data and
>>>> we get to have methods specific to untrusted data.
>>>>
>>>> However, I think implementing this method will be a bit difficult with
>>>> the `Untrusted`/`Unvalidated` split. Maybe we can have some `pub(crate)`
>>>> methods on `Unvalidated` to perform some mappings?
>>>
>>> The thing is, folios are just a pile of contig pages, and there's nothing
>>> in the rules that they only contain untrusted data. Currently in rust code
>>> we have that's the case, but not in general. So we need that associated
>>> type.
>>>
>>> But I also think Folio here is special, a lot of the other places where I
>>> want this annotation it's the case that the data returned is _always_
>>> untrusted. So we don't need to place associated types all over the
>>> codebase to make this work, it's just that the rfc example you've picked
>>> needs it.
>>
>> I think we should try to make just wrapping stuff in `Untrusted` work. I
>> don't see how the associated types would help you any more than just
>> implementing stuff on `&Untrusted<Self`.
>
> I guess you could wrap it as Untrusted in each use site when you get the
> data out of the Folio, but that makes the guarantees we get out of these
> annotations much less stringent. Which is why I think for Folio<> (well
> really for Pagecache) we need to go with the associated type or it's a bit
> self-defeating.
Let's just implement both ways and see which one is better.
[...]
>>>>>> +pub trait Validator {
>>>>>> + /// Type of the input data that is untrusted.
>>>>>> + type Input: ?Sized;
>>>>>> + /// Type of the validated data.
>>>>>> + type Output;
>>>>>
>>>>> So I think the explicit Output makes sense if you have multiple different
>>>>> untrusted input that validate to the same trusted structure, but I'm not
>>>>> sure this makes sense as associated types. Instead I'd go with generics
>>>>> and somethign like this:
>>>>>
>>>>> pub trait Validator<Input: ?Sized> {
>>>>> type Err;
>>>>>
>>>>> fn validate(untrusted: &Untrusted<Input>) -> Result<Self, Self::Err>;
>>>>> }
>>>>>
>>>>> That means you can't implement validate for types from other modules
>>>>> directly but need a newtype (I think at least, not sure). But I think
>>>>> that's actually a good thing, since often that means you're validating
>>>>> some generic state plus whatever your own code needs (like the
>>>>> inode::Params<tarfs::INodeData> in your example), and both pieces need to
>>>>> be consisted overall and not just individually (otherwise why does the
>>>>> that other module not do the parsing for you). And so explicitly treating
>>>>> the validated output as an explicit new type just makes sense to me. Plus
>>>>> with derive(Deref) it's trivial to unbox after validation.
>>>>
>>>> There might be the need to validate the same piece of data with
>>>> different ways and I am not convinced adding a newtype for every single
>>>> case is a good way to achieve it.
>>>> Although it would simplify the `Validator` trait... I will think a bit
>>>> about this.
>>>
>>> Hm, but unless I misunderstand you already need a random type to attach
>>> your current trait too? So not worse if we require that for the
>>> less-common type of multiple ways to validate the same, and simpler for
>>> the common one.
>>
>> Yes, but you wouldn't have to unwrap the return type. For example with
>> your proposal we have:
>>
>> struct MyINodeParams(INodeParams);
>>
>> impl Validator<[u8]> for MyINodeParams {
>> type Err = Error;
>>
>> fn validate(untrusted: &Untrusted<[u8]>) -> Result<Self> {
>> /*...*/
>> Ok(Self(params))
>> }
>> }
>>
>> impl MyINodeParams {
>> fn into_inner(self) -> INodeParams {
>> self.0
>> }
>> }
>>
>> And then you would do:
>>
>> let params = untrusted.validate::<MyINodeParams>().into_inner();
>>
>> I find the `into_inner()` a bit annoying (one could just use `.0`
>> instead, but I also don't like that). I find specifying the `Output` a
>> bit cleaner.
>
> Hm right. But I guess with your new plan to only support validate, which
> gets the inner passed in explicitly and returns whatever the closure
> returns?
The only thing that changes with my suggestion is the parameter type of
`validate` (and the names):
struct MyINodeParams(INodeParams);
impl Validate<[u8]> for MyINodeParams {
type Err = Error;
fn validate(untrusted: &[u8]) -> Result<Self> {
/* ... */
Ok(Self(params))
}
}
let params = untrusted.validate::<MyINodeParams>().into_inner();
And with the `Output` type on the trait we would have:
struct MyINodeParams;
impl Validate<[u8]> for MyINodeParams {
type Err = Error;
type Output = INodeParams;
fn validate(untrusted: &[u8]) -> Result<Self::Output> {
// ...
}
}
let params = untrusted.validate::<MyINodeParams>();
I don't think that it's a huge difference, but nonetheless it is
probably useful.
But, I just remembered a probably more important thing: returning
`Result<Self>` will make it possible to use type inference in places
wehre you *do* want your custom type, so
struct Foo { /* ... */ }
impl Validate for Foo { /* ... */ }
fn use_my_foo(foo: Foo) { /* ... */ }
use_my_foo(untrusted.validate()?)
Should work (ie the `.validate()` call doesn't need the generic
argument).
So I think I will go for no `Output` type in the next version.
---
Cheers,
Benno
Powered by blists - more mailing lists