Message-ID: <aAfL-e6qA9oBce5t@cassiopeiae>
Date: Tue, 22 Apr 2025 19:03:53 +0200
From: Danilo Krummrich <dakr@...nel.org>
To: Alexandre Courbot <acourbot@...dia.com>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
	Joel Fernandes <joelagnelf@...dia.com>,
	John Hubbard <jhubbard@...dia.com>, rust-for-linux@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] rust: alloc: implement `extend` for `Vec`

On Mon, Apr 21, 2025 at 05:15:29PM +0900, Alexandre Courbot wrote:
> On Tue Apr 8, 2025 at 10:34 PM JST, Alexandre Courbot wrote:
> > On Mon Apr 7, 2025 at 8:01 PM JST, Danilo Krummrich wrote:
> >>> +    /// Extends the vector by the elements of `iter`.
> >>> +    ///
> >>> +    /// This uses [`Iterator::size_hint`] to optimize memory reallocations, but will work even with
> >>> +    /// imprecise implementations - albeit in a non-optimal way.
> >>> +    ///
> >>> +    /// This method returns an error if a memory reallocation required to accommodate the new items
> >>> +    /// failed. In this case, callers must assume that some (but not all) elements of `iter` might
> >>> +    /// have been added to the vector.
> >>> +    ///
> >>> +    /// # Note on optimal behavior and correctness
> >>> +    ///
> >>> +    /// The efficiency of this method depends on how reliable the [`Iterator::size_hint`]
> >>> +    /// implementation of the `iter` is.
> >>> +    ///
> >>> +    /// It performs optimally with at most a single memory reallocation if the lower bound of
> >>> +    /// `size_hint` is the exact number of items actually yielded.
> >>> +    ///
> >>> +    /// If `size_hint` is more vague, there may be as many memory reallocations as necessary to
> >>> +    /// cover the whole iterator from the successive lower bounds returned by `size_hint`.
> >>> +    ///
> >>> +    /// If `size_hint` signals more items than actually yielded by the iterator, some unused memory
> >>> +    /// might be reserved.
> >>> +    ///
> >>> +    /// Finally, whenever `size_hint` returns `(0, Some(0))`, the method assumes that no more items
> >>> +    /// are yielded by the iterator and returns. This may result in some items not being added if
> >>> +    /// there were still some remaining.
> >>> +    ///
> >>> +    /// In the kernel most iterators are expected to have a precise and correct `size_hint`
> >>> +    /// implementation, so this should nicely optimize out for these cases.
> >>
> >> I agree, hence I think we should enforce to be provided with a guaranteed
> >> correct size hint and simplify the code. I think we should extend the signature.
> >>
> >>      pub fn extend<I>(&mut self, iter: I, flags: Flags) -> Result<(), AllocError>
> >>      where
> >>          I: IntoIterator<Item = T>,
> >>          I::IntoIter: ExactSizeIterator,
> >>
> >> And implement ExactSizeIterator for IntoIter.
> >>
> >> The only thing that bothers me a bit is that the documentation [1] of
> >> ExactSizeIterator sounds a bit ambiguous.
> >>
> >> It says: "When implementing an ExactSizeIterator, you must also implement
> >> Iterator. When doing so, the implementation of Iterator::size_hint *must*
> >> return the exact size of the iterator."
> >>
> >> But it also says: "Note that this trait is a safe trait and as such does not and
> >> cannot guarantee that the returned length is correct. This means that unsafe
> >> code must not rely on the correctness of Iterator::size_hint. The unstable and
> >> unsafe TrustedLen trait gives this additional guarantee."
> >
> > Yeah ExactSizeIterator is not the solution to this, since it can be
> > implemented without an unsafe block and the implementer is perfectly
> > free to provide an incorrect value - so we cannot trust its result.
> >
> >>
> >> Acknowledging the latter, I think we should implement our own trait for this
> >> instead. Our own version of TrustedLen seems reasonable to me.
> >
> > That sounds reasonable and would greatly simplify the code (and remove
> > most of my fears about its optimization). Let me explore that direction.
> 
> Well, that turned out to be an interesting rabbit hole.
> 
> Leveraging the existing traits seems a bit difficult:
> 
> - `ExactSizeIterator` cannot be implemented for adapters that increase the
>   length of their iterators, because if one of them is already `usize::MAX` long
>   then the size wouldn't be exact anymore. [1]
> 
> - And `TrustedLen` cannot be implemented for adapters that make an iterator
>   shorter, because if the iterator returns more than `usize::MAX` items (i.e.
>   has an upper bound set to `None`) then the adapter can't predict the actual
>   length. [2]

Why is this a problem for the above implementation of Vec::extend()?

I just looked it up, and it seems that std [1] does the same thing. Am I missing
anything?

[1] https://github.com/rust-lang/rust/blob/master/library/alloc/src/vec/spec_extend.rs#L25
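
For illustration, here is a minimal userspace sketch of the approach I understand
the linked std code to take for `TrustedLen` iterators: reserve once based on the
upper bound of `size_hint()`, and treat an upper bound of `None` (more than
`usize::MAX` items) as a capacity overflow. `TrustedLenLike`, `ExtendError` and
`extend_trusted` are hypothetical names made up for this example (the real
`TrustedLen` is unstable), it uses std's `Vec::try_reserve` rather than the kernel
`Vec`, and it returns an error where std would panic. It is a sketch under those
assumptions, not the actual implementation.

```rust
use std::collections::TryReserveError;

/// Hypothetical stand-in for the unstable `core::iter::TrustedLen`.
///
/// # Safety
///
/// `size_hint()` must return `(n, Some(n))` when exactly `n` items remain,
/// and an upper bound of `None` only if more than `usize::MAX` items remain.
pub unsafe trait TrustedLenLike: Iterator {}

// SAFETY: both iterators report their exact remaining length.
unsafe impl<T> TrustedLenLike for std::vec::IntoIter<T> {}
unsafe impl TrustedLenLike for std::ops::Range<usize> {}

/// Hypothetical error type for this sketch.
#[derive(Debug)]
pub enum ExtendError {
    /// The iterator reported more than `usize::MAX` items.
    CapacityOverflow,
    /// The single up-front reservation failed.
    Alloc(TryReserveError),
}

pub fn extend_trusted<T, I>(v: &mut Vec<T>, iter: I) -> Result<(), ExtendError>
where
    I: IntoIterator<Item = T>,
    I::IntoIter: TrustedLenLike,
{
    let iter = iter.into_iter();
    let (_low, high) = iter.size_hint();

    // An upper bound of `None` means the length does not fit in `usize`, so
    // it can never fit in a vector either: fail instead of truncating.
    let additional = high.ok_or(ExtendError::CapacityOverflow)?;

    // One reservation covers all items if the size hint is trustworthy.
    v.try_reserve(additional).map_err(ExtendError::Alloc)?;

    // With a correct hint, none of these pushes reallocates.
    for item in iter {
        v.push(item);
    }
    Ok(())
}
```

With this shape, the "shortening adapter" case from [2] only matters when the
upper bound is `None`, and that case is rejected before anything is pushed.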

> 
> So in both cases, the model breaks at the limit. OTOH, in our case we want to
> gather items into some collection, meaning that we are quite unlikely to ever
> reach that limit, as doing so would likely trigger an OOM anyway.
> 
> Which means that we need to come up with our own unsafe trait
> (`ExactSizeCollectible`?), which will have its own limits. It shall only be
> used to collect things (because we are unlikely to reach a size of `usize::MAX`
> in that context), and will take the lower bound of `size_hint` at face value,
> meaning it might collect less than the whole collection if the lower bound of
> the iterator or one of its adapters ever reaches `usize::MAX`. Again in the
> context of a collection this should never happen, but it's still a limitation.
> 
> If we can live with this, then with a bit of code (because the new trait would
> need to be implemented for every iterator and adapter we want to collect out
> there) we should be able to provide an efficient, one-pass `collect()` method.
> 
> Thoughts?
> 
> [1] https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#when-shouldnt-an-adapter-be-exactsizeiterator
> [2] https://doc.rust-lang.org/core/iter/trait.TrustedLen.html#when-shouldnt-an-adapter-be-trustedlen
> 
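
To make the proposal quoted above a bit more concrete, a rough userspace sketch
of what such an unsafe trait and a one-pass collect could look like follows.
Only the name `ExactSizeCollectible` comes from the mail above; the safety
contract wording, the chosen impls and `collect_exact` are assumptions made for
this example, and it uses std's `Vec` instead of the kernel one.

```rust
use std::collections::TryReserveError;
use std::iter::{Chain, Take};

/// Sketch of the proposed trait; the contract wording is an assumption.
///
/// # Safety
///
/// The lower bound of `size_hint()` must equal the number of items the
/// iterator still yields, unless that number exceeds `usize::MAX`, in which
/// case the lower bound saturates at `usize::MAX`.
pub unsafe trait ExactSizeCollectible: Iterator {}

// SAFETY: these base iterators report their exact remaining length.
unsafe impl<T> ExactSizeCollectible for std::vec::IntoIter<T> {}
unsafe impl<'a, T> ExactSizeCollectible for std::slice::Iter<'a, T> {}

// SAFETY: `Chain` adds both lower bounds with saturation, which matches the
// contract above for a lengthening adapter.
unsafe impl<A, B> ExactSizeCollectible for Chain<A, B>
where
    A: ExactSizeCollectible,
    B: ExactSizeCollectible<Item = A::Item>,
{
}

// SAFETY: `Take` reports `min(inner lower bound, n)`, which is exact as long
// as the inner lower bound is exact.
unsafe impl<I: ExactSizeCollectible> ExactSizeCollectible for Take<I> {}

/// Hypothetical one-pass collect that takes the lower bound at face value.
pub fn collect_exact<I>(iter: I) -> Result<Vec<I::Item>, TryReserveError>
where
    I: IntoIterator,
    I::IntoIter: ExactSizeCollectible,
{
    let iter = iter.into_iter();
    let (lower, _) = iter.size_hint();

    let mut v: Vec<I::Item> = Vec::new();
    // Single allocation sized from the lower bound.
    v.try_reserve_exact(lower)?;

    // Never push more than what was reserved, so nothing reallocates. If the
    // lower bound saturated, the remaining items are silently left out, which
    // is the limitation described in the mail.
    for item in iter.take(lower) {
        v.push(item);
    }
    Ok(v)
}
```

As the mail notes, every iterator and adapter that should be collectible needs
its own impl, so the list above would have to grow considerably in practice.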
