linux-kernel - Re: [PATCH v3] rust: alloc: implement `extend` for `Vec`

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <D9C61DDI99JX.31T59XPQGYBB1@nvidia.com>
Date: Mon, 21 Apr 2025 17:15:29 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Alexandre Courbot" <acourbot@...dia.com>, "Danilo Krummrich"
 <dakr@...nel.org>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
 <alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
 <gary@...yguo.net>, Björn Roy Baron
 <bjorn3_gh@...tonmail.com>, "Benno Lossin" <benno.lossin@...ton.me>,
 "Andreas Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl"
 <aliceryhl@...gle.com>, "Trevor Gross" <tmgross@...ch.edu>, "Joel
 Fernandes" <joelagnelf@...dia.com>, "John Hubbard" <jhubbard@...dia.com>,
 <rust-for-linux@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] rust: alloc: implement `extend` for `Vec`

On Tue Apr 8, 2025 at 10:34 PM JST, Alexandre Courbot wrote:
> On Mon Apr 7, 2025 at 8:01 PM JST, Danilo Krummrich wrote:
>>> +    /// Extends the vector by the elements of `iter`.
>>> +    ///
>>> +    /// This uses [`Iterator::size_hint`] to optimize memory reallocations, but will work even with
>>> +    /// imprecise implementations - albeit in a non-optimal way.
>>> +    ///
>>> +    /// This method returns an error if a memory reallocation required to accommodate the new items
>>> +    /// failed. In this case, callers must assume that some (but not all) elements of `iter` might
>>> +    /// have been added to the vector.
>>> +    ///
>>> +    /// # Note on optimal behavior and correctness
>>> +    ///
>>> +    /// The efficiency of this method depends on how reliable the [`Iterator::size_hint`]
>>> +    /// implementation of the `iter` is.
>>> +    ///
>>> +    /// It performs optimally with at most a single memory reallocation if the lower bound of
>>> +    /// `size_hint` is the exact number of items actually yielded.
>>> +    ///
>>> +    /// If `size_hint` is more vague, there may be as many memory reallocations as necessary to
>>> +    /// cover the whole iterator from the successive lower bounds returned by `size_hint`.
>>> +    ///
>>> +    /// If `size_hint` signals more items than actually yielded by the iterator, some unused memory
>>> +    /// might be reserved.
>>> +    ///
>>> +    /// Finally, whenever `size_hint` returns `(0, Some(0))`, the method assumes that no more items
>>> +    /// are yielded by the iterator and returns. This may result in some items not being added if
>>> +    /// there were still some remaining.
>>> +    ///
>>> +    /// In the kernel most iterators are expected to have a precise and correct `size_hint`
>>> +    /// implementation, so this should nicely optimize out for these cases.
>>
>> I agree, hence I think we should enforce to be provided with a guaranteed
>> correct size hint and simplify the code. I think we should extend the signature.
>>
>>      pub fn extend<I>(&mut self, iter: I, flags: Flags) -> Result<(), AllocError>
>>      where
>>          I: IntoIterator<Item = T>,
>>          I::IntoIter: ExactSizeIterator,
>>
>> And implement ExactSizeIterator for IntoIter.
>>
>> The only thing that bothers me a bit is that the documentation [1] of
>> ExactSizeIterator sounds a bit ambiguous.
>>
>> It says: "When implementing an ExactSizeIterator, you must also implement
>> Iterator. When doing so, the implementation of Iterator::size_hint *must*
>> return the exact size of the iterator."
>>
>> But it also says: "Note that this trait is a safe trait and as such does not and
>> cannot guarantee that the returned length is correct. This means that unsafe
>> code must not rely on the correctness of Iterator::size_hint. The unstable and
>> unsafe TrustedLen trait gives this additional guarantee."
>
> Yeah ExactSizeIterator is not the solution to this, since it can be
> implemented without an unsafe block and the implementer is perfectly
> free to provide an incorrect value - so we cannot trust its result.
>
>>
>> Acknowledging the latter, I think we should implement our own trait for this
>> instead. Our own version of TrustedLen seems reasonable to me.
>
> That sounds reasonable and would greatly simplify the code (and remove
> most of my fears about its optimization). Let me explore that direction.

Well, that turned out to be an interesting rabbit hole.

Leveraging the existing traits seems a bit difficult:

- `ExactSizeIterator` cannot be implemented for adapters that increase the
  length of their iterators, because if one of them is already `usize::MAX` long
  then the size wouldn't be exact anymore. [1]

- And `TrustedLen` cannot be implemented for adapters that make an iterator
  shorter, because if the iterator returns more than `usize::MAX` items (i.e.
  has an upper bound set to `None`) then the adapter can't predict the actual
  length. [2]

So in both cases, the model breaks at the limit. OTOH, in our case we want to
gather items into some collection, meaning that we are quite unlikely to ever
reach that limit, as doing so would likely trigger an OOM anyway.

Which means that we need to come with our own unsafe trait
(`ExactSizeCollectible`?), which will have its own limits. It shall only be
used to collect things (because we are unlikely to reach a size of `usize::MAX`
in that context), and will take the lower bound of `size_hint` at face value,
meaning it might collect less than the whole collection if the lower bound of
the iterator or one of its adapters ever reaches `usize::MAX`. Again in the
context of a collection this should never happen, but it's still a limitation.

If we can live with this, then with a bit of code (because the new trait would
need to be implemented for every iterator and adapter we want to collect out
there) we should be able to provide an efficient, one-pass `collect()` method.

Thoughts?

[1] https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#when-shouldnt-an-adapter-be-exactsizeiterator
[2] https://doc.rust-lang.org/core/iter/trait.TrustedLen.html#when-shouldnt-an-adapter-be-trustedlen