Message-ID: <aSZlZu0SNycNvk4q@yury>
Date: Tue, 25 Nov 2025 21:26:46 -0500
From: Yury Norov <yury.norov@...il.com>
To: Alice Ryhl <aliceryhl@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
Joel Fernandes <joelagnelf@...dia.com>,
Christian Brauner <brauner@...nel.org>,
Carlos Llamas <cmllamas@...gle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Burak Emir <bqe@...gle.com>,
Miguel Ojeda <ojeda@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <lossin@...nel.org>,
Andreas Hindborg <a.hindborg@...nel.org>,
Trevor Gross <tmgross@...ch.edu>,
Danilo Krummrich <dakr@...nel.org>, rust-for-linux@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
On Tue, Nov 25, 2025 at 01:59:42PM +0000, Alice Ryhl wrote:
> To find an unused Binder handle, Rust Binder currently iterates the
> red/black tree from the beginning until it finds a gap in the keys. This
> is extremely slow.
>
> To improve the performance, add a bitmap that keeps track of which
> indices are actually in use. This allows us to quickly find an unused
> key in the red/black tree.
>
> For a benchmark, please see the below numbers that were obtained from
> modifying binderThroughputTest to send a node with each transaction and
> stashing it in the server. This results in the number of nodes
> increasing by one for every transaction sent. I got the following table
> of roundtrip latencies (in µs):
>
> Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
>      0 -  10,000  │          176.88 │         92.93 │          99.41
> 10,000 -  20,000  │          437.37 │         87.74 │          98.55
> 20,000 -  30,000  │          677.49 │         76.24 │          96.37
> 30,000 -  40,000  │          901.76 │         83.39 │          96.73
> 40,000 -  50,000  │         1126.62 │        100.44 │          94.57
> 50,000 -  60,000  │         1288.98 │         94.38 │          96.64
> 60,000 -  70,000  │         1588.74 │         88.27 │          96.36
> 70,000 -  80,000  │         1812.97 │         93.97 │          91.24
> 80,000 -  90,000  │         2062.95 │         92.22 │         102.01
> 90,000 - 100,000  │         2330.03 │         97.18 │         100.31
>
> It should be clear that the current Rust code becomes linearly slower
> per insertion as the number of calls to rb_next() per transaction
> increases. After this change, the time to find an ID number appears
> constant. (Technically it is not constant-time, as both insertion and
> removal scan the entire bitmap. However, quick napkin math shows that
> scanning the entire bitmap with N=100k takes ~1.5µs, which is negligible
> in a benchmark where the roundtrip latency is 100µs.)
>
> I've included a comparison to the C driver, which uses the same bitmap
> algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
> for faster descriptor lookup").

Thanks for the solid numbers!
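
The napkin math checks out, by the way, assuming a scan touches one
64-bit word per nanosecond or so:

    100,000 bits / 64 bits per word ~= 1,563 words
    1,563 words * ~1 ns/word        ~= 1.6 µs

which is indeed lost in the noise of a ~100µs roundtrip.
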
> This currently checks if the bitmap should be shrunk after every
> removal. One potential future change is introducing a shrinker to make
> this operation O(1), but based on the benchmark above this does not seem
> required at this time.
>
> Reviewed-by: Burak Emir <bqe@...gle.com>
> Acked-by: Carlos Llamas <cmllamas@...gle.com>
> Signed-off-by: Alice Ryhl <aliceryhl@...gle.com>
> ---
> drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
> 1 file changed, 47 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
> index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
> --- a/drivers/android/binder/process.rs
> +++ b/drivers/android/binder/process.rs
> @@ -19,6 +19,7 @@
> cred::Credential,
> error::Error,
> fs::file::{self, File},
> + id_pool::IdPool,
> list::{List, ListArc, ListArcField, ListLinks},
> mm,
> prelude::*,
> @@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
> struct ProcessNodeRefs {
> /// Used to look up nodes using the 32-bit id that this process knows it by.
> by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
> + /// Used to quickly find unused ids in `by_handle`.
> + handle_is_present: IdPool,
> /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
> /// the underlying `Node` struct as returned by `Node::global_id`.
> by_node: RBTree<usize, u32>,
> @@ -381,6 +384,7 @@ impl ProcessNodeRefs {
> fn new() -> Self {
> Self {
> by_handle: RBTree::new(),
> + handle_is_present: IdPool::new(),
> by_node: RBTree::new(),
> freeze_listeners: RBTree::new(),
> }
> @@ -775,7 +779,7 @@ pub(crate) fn get_node(
> pub(crate) fn insert_or_update_handle(
> self: ArcBorrow<'_, Process>,
> node_ref: NodeRef,
> - is_mananger: bool,
> + is_manager: bool,
> ) -> Result<u32> {
> {
> let mut refs = self.node_refs.lock();
> @@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
> let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
> let info = UniqueArc::new_uninit(GFP_KERNEL)?;
>
> - let mut refs = self.node_refs.lock();
> + let mut refs_lock = self.node_refs.lock();
> + let mut refs = &mut *refs_lock;
> +
> + let (unused_id, by_handle_slot) = loop {
> + // ID 0 may only be used by the manager.
> + let start = if is_manager { 0 } else { 1 };
> +
> + if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> + match refs.by_handle.entry(res.as_u32()) {
> + rbtree::Entry::Vacant(entry) => break (res, entry),
> + rbtree::Entry::Occupied(_) => {
> + pr_err!("Detected mismatch between handle_is_present and by_handle");
> + res.acquire();
> + kernel::warn_on!(true);
> + return Err(EINVAL);

EINVAL means that the user passed a wrong parameter. What's detected
here is internal state corruption. Maybe EFAULT?
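I.e., something like this (sketch; it's the hunk above with only the
error code changed):

	rbtree::Entry::Occupied(_) => {
		pr_err!("Detected mismatch between handle_is_present and by_handle");
		res.acquire();
		kernel::warn_on!(true);
		// Internal inconsistency, not a bad user parameter.
		return Err(EFAULT);
	}
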
> + }
> + }
> + }
> +
> + let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> + drop(refs_lock);
> + let resizer = grow_request.realloc(GFP_KERNEL)?;
> + refs_lock = self.node_refs.lock();
> + refs = &mut *refs_lock;
> + refs.handle_is_present.grow(resizer);

This continues to puzzle me. refs_lock protects refs, and the spec
says:

    a reference's scope starts from where it is introduced and
    continues through the last time that reference is used.

https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html

The last use of refs is on the .grow_request() line, because it is
later reassigned a new value. If my reading of the spec is correct
(this is the non-lexical-lifetimes rule), then once refs_lock is
dropped you may get rescheduled, and another thread may follow the
same path. Because refs_lock is dropped explicitly and refs
implicitly, the concurrent thread can take both and go on to resize
the id map. When your first thread gets back, it ends up resizing the
already-resized map.

I asked your AI, and it says that this race is indeed possible for
exactly that reason. But it doesn't break memory safety, so the
compiler is happy about it...
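
If that race matters beyond a wasted realloc, one way to close it
would be to re-check after retaking the lock and discard a stale
resizer. Rough sketch against this patch; the capacity() accessors on
the pool and the resizer are hypothetical, I don't know what IdPool
actually exposes:

	let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
	drop(refs_lock);
	let resizer = grow_request.realloc(GFP_KERNEL)?;
	refs_lock = self.node_refs.lock();
	refs = &mut *refs_lock;
	// A concurrent thread may have grown the pool while the lock
	// was dropped. Only install our reallocation if it still
	// enlarges the pool; otherwise retry the loop against the
	// other thread's map.
	if refs.handle_is_present.capacity() < resizer.capacity() {
		refs.handle_is_present.grow(resizer);
	}
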
> + };
> + let handle = unused_id.as_u32();
>
> // Do a lookup again as node may have been inserted before the lock was reacquired.
> if let Some(handle_ref) = refs.by_node.get(&node_ref.node.global_id()) {
> @@ -804,20 +834,9 @@ pub(crate) fn insert_or_update_handle(
> return Ok(handle);
> }
>
> - // Find id.
> - let mut target: u32 = if is_mananger { 0 } else { 1 };
> - for handle in refs.by_handle.keys() {
> - if *handle > target {
> - break;
> - }
> - if *handle == target {
> - target = target.checked_add(1).ok_or(ENOMEM)?;
> - }
> - }
> -
> let gid = node_ref.node.global_id();
> let (info_proc, info_node) = {
> - let info_init = NodeRefInfo::new(node_ref, target, self.into());
> + let info_init = NodeRefInfo::new(node_ref, handle, self.into());
> match info.pin_init_with(info_init) {
> Ok(info) => ListArc::pair_from_pin_unique(info),
> // error is infallible
> @@ -838,9 +857,10 @@ pub(crate) fn insert_or_update_handle(
> // `info_node` into the right node's `refs` list.
> unsafe { info_proc.node_ref2().node.insert_node_info(info_node) };
>
> - refs.by_node.insert(reserve1.into_node(gid, target));
> - refs.by_handle.insert(reserve2.into_node(target, info_proc));
> - Ok(target)
> + refs.by_node.insert(reserve1.into_node(gid, handle));
> + by_handle_slot.insert(info_proc, reserve2);
> + unused_id.acquire();
> + Ok(handle)
> }
>
> pub(crate) fn get_transaction_node(&self, handle: u32) -> BinderResult<NodeRef> {
> @@ -905,6 +925,16 @@ pub(crate) fn update_ref(
> let id = info.node_ref().node.global_id();
> refs.by_handle.remove(&handle);
> refs.by_node.remove(&id);
> + refs.handle_is_present.release_id(handle as usize);
> +
> + if let Some(shrink) = refs.handle_is_present.shrink_request() {
> + drop(refs);
> + // This intentionally ignores allocation failures.
> + if let Ok(new_bitmap) = shrink.realloc(GFP_KERNEL) {
> + refs = self.node_refs.lock();
> + refs.handle_is_present.shrink(new_bitmap);
> + }
> + }
> }
> } else {
> // All refs are cleared in process exit, so this warning is expected in that case.
>
> --
> 2.52.0.460.gd25c4c69ec-goog