Message-ID: <aRUZq0Fo6T1f3lOD@google.com>
Date: Wed, 12 Nov 2025 23:35:07 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: Yury Norov <yury.norov@...il.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, 
	"Arve Hjønnevåg" <arve@...roid.com>, Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>, 
	Joel Fernandes <joelagnelf@...dia.com>, Christian Brauner <brauner@...nel.org>, 
	Carlos Llamas <cmllamas@...gle.com>, Suren Baghdasaryan <surenb@...gle.com>, Burak Emir <bqe@...gle.com>, 
	Miguel Ojeda <ojeda@...nel.org>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, 
	"Björn Roy Baron" <bjorn3_gh@...tonmail.com>, Benno Lossin <lossin@...nel.org>, 
	Andreas Hindborg <a.hindborg@...nel.org>, Trevor Gross <tmgross@...ch.edu>, 
	Danilo Krummrich <dakr@...nel.org>, rust-for-linux@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 6/6] rust_binder: use bitmap for allocation of handles

On Wed, Nov 12, 2025 at 02:09:19PM -0500, Yury Norov wrote:
> On Wed, Nov 12, 2025 at 12:47:24PM +0000, Alice Ryhl wrote:
> > To find an unused Binder handle, Rust Binder currently iterates the
> > red/black tree from the beginning until it finds a gap in the keys. This
> > is extremely slow.
> 
> Can you share performance numbers? 

I have not benchmarked it in the Rust driver, but it replaces
potentially thousands of calls to rb_next() with a single call to
find_unused_id(), so I'm feeling good about the performance. And the
equivalent change in the C driver was done because this particular piece
of code was causing contention issues by holding the spinlock for a long
time.
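
To make that concrete, here is a rough userspace sketch of the two
search strategies (std::collections::BTreeMap standing in for the
kernel rbtree and a plain Vec<u64> for the bitmap; none of this is the
driver's actual code):

    use std::collections::BTreeMap;

    /// Old approach: walk the sorted keys from 0 until the first gap.
    /// Worst case this visits every existing handle.
    fn first_gap_by_iteration(by_handle: &BTreeMap<u32, ()>) -> u32 {
        let mut expected = 0;
        for &handle in by_handle.keys() {
            if handle != expected {
                break;
            }
            expected += 1;
        }
        expected
    }

    /// New approach: a bitmap of used handles; the first zero bit is
    /// found by scanning whole 64-bit words.
    fn first_gap_by_bitmap(used: &[u64]) -> u32 {
        for (i, &word) in used.iter().enumerate() {
            if word != u64::MAX {
                return (i as u32) * 64 + word.trailing_ones();
            }
        }
        (used.len() as u32) * 64
    }

    fn main() {
        let by_handle: BTreeMap<u32, ()> = (0..1000).map(|h| (h, ())).collect();
        let mut used = vec![u64::MAX; 15];   // bits 0..960 set
        used.push((1u64 << 40) - 1);         // bits 960..1000 set
        assert_eq!(first_gap_by_iteration(&by_handle), 1000);
        assert_eq!(first_gap_by_bitmap(&used), 1000);
    }

With a few thousand handles, the first version does a few thousand
comparisons per lookup while holding the lock; the second does a few
dozen word compares.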

One characteristic of this collection is that there is one instance per
process using the driver. Most processes have only a few entries in this
bitmap, so the inline representation will apply in most cases. However,
a few processes have a large number of entries, in the four or maybe
five figure range. Those processes are what caused the contention issue
mentioned above.
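
For context, "inline representation" here means the usual small-size
optimization: a pool that never exceeds one machine word of ids needs no
heap allocation at all. A rough userspace sketch of the idea, not the
kernel's actual bitmap type:

    enum HandleBitmap {
        Inline(u64),     // up to 64 handles, stored directly in the struct
        Heap(Vec<u64>),  // larger pools fall back to an allocated word array
    }

    impl HandleBitmap {
        fn set(&mut self, id: usize) {
            match self {
                HandleBitmap::Inline(word) if id < 64 => *word |= 1 << id,
                HandleBitmap::Inline(_) => unreachable!("grow to Heap before id >= 64"),
                HandleBitmap::Heap(words) => words[id / 64] |= 1 << (id % 64),
            }
        }
    }

    fn main() {
        // A typical process: a handful of handles, no allocation needed.
        let mut small = HandleBitmap::Inline(0);
        small.set(0);
        small.set(3);
        // A heavyweight process with tens of thousands of handles.
        let mut big = HandleBitmap::Heap(vec![0; 50_000 / 64 + 1]);
        big.set(42_123);
    }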

> > To improve the performance, add a bitmap that keeps track of which
> > indices are actually in use. This allows us to quickly find an unused
> > key in the red/black tree.
> > 
> > This logic matches the approach used by C Binder. It was chosen
> > partially because it's the most memory efficient solution.
> 
> That's inaccurate. You are adding a new data structure (bitmap), advocating
> it with an improvement on the search side, and that makes sense.
> 
> But now you're saying it's also a more memory efficient approach, which
> doesn't sound trivial because the most memory efficient solution is to
> bring no new data structures at all.
> 
> I guess you meant that bitmap is the most efficient data structure to
> index used/unused nodes. If so, can you please rephrase the sentence?

Yes, I can rephrase.

Adding more data is of course always less memory efficient. What I meant
is that this is more memory efficient than the competing solution of
using an augmented rbtree that Carlos mentioned here:

https://lore.kernel.org/p/aC1PQ7tmcqMSmbHc@google.com

> > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > +                match refs.by_handle.entry(res.as_u32()) {
> > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > +                    rbtree::Entry::Occupied(_) => {
> > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > +                        res.acquire();
> > +                        continue;
> 
> At this point you've detected mismatch between two linked data
> structures. It means that one of them or both are corrupted. To
> me it looks fatal, and your pr_err() confirms it. How could you
> continue then?

Although we should never hit this codepath in real code, I don't think
we need to kill the kernel. We can treat the r/b tree as the source of
truth and adjust the bitmap when mismatches are detected.

I could add a kernel warning, though. That shouldn't kill an Android
device.
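
Concretely, I'm thinking of something like this for that arm (a sketch
of just the Occupied arm, reusing the names from the hunk above, with
pr_warn! instead of pr_err! so it is logged as a warning rather than
treated as fatal):

    rbtree::Entry::Occupied(_) => {
        // The rbtree is the source of truth: the bitmap wrongly claimed
        // this handle was free, so mark it used again and retry with the
        // next candidate rather than giving up.
        pr_warn!("Detected mismatch between handle_is_present and by_handle");
        res.acquire();
        continue;
    }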

> > +                    }
> > +                }
> > +            }
> > +
> > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > +            drop(refs_lock);
> > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > +            refs_lock = self.node_refs.lock();
> > +            refs = &mut *refs_lock;
> > +            refs.handle_is_present.grow(resizer);
> 
> Is it possible to turn this block into a less wordy statement? Maybe a
> wrapper function for it? Ideally, the grow request should be handled
> transparently in .find_unused_id().

I can extract this block into a separate function, but I think it would
be tricky to move the entire logic inside .find_unused_id() because of
the mutex lock/unlock situation.
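
Roughly what I have in mind (a sketch only: the helper name, the guard
type, and the exact signature are placeholders, the body just reuses the
calls from the hunk above):

    /// Grows the handle bitmap, dropping the lock around the allocation.
    /// Returns a fresh guard because the lock is released and re-acquired,
    /// so the caller must re-borrow `refs` from the returned guard.
    fn grow_handle_bitmap<'a>(
        &'a self,
        refs: Guard<'a, ProcessNodeRefs>,
    ) -> Result<Guard<'a, ProcessNodeRefs>> {
        let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
        drop(refs);
        // Allocate the larger bitmap without holding the lock.
        let resizer = grow_request.realloc(GFP_KERNEL)?;
        let mut refs = self.node_refs.lock();
        refs.handle_is_present.grow(resizer);
        Ok(refs)
    }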

> > @@ -905,6 +924,16 @@ pub(crate) fn update_ref(
> >                  let id = info.node_ref().node.global_id();
> >                  refs.by_handle.remove(&handle);
> >                  refs.by_node.remove(&id);
> > +                refs.handle_is_present.release_id(handle as usize);
> > +
> > +                if let Some(shrink) = refs.handle_is_present.shrink_request() {
> 
> This is questionable. Shrinking is usually the very slow path, and we
> don't shrink unless we're really close or even inside the OOM condition.
> 
> In this case, shrink_request() on average returns false, but it's
> O(N), which makes _every_ release_id() O(N), while it should be O(1).

The current implementation of shrink_request() will refuse to shrink the
pool unless the largest bit is less than 1/4 of the capacity, so it
should not perform the expensive operation very often.

That said, it does call find_last_bit() each time, which I guess is
O(N). But my assumption was that find_last_bit() is cheap enough that it
wouldn't be a problem.
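
In other words, the policy is roughly the following (userspace sketch of
the check, not the kernel helper; the shrink-to-half target and the
minimum size are illustrative):

    /// Returns the capacity to shrink to, or None to keep the current one.
    fn shrink_target(words: &[u64]) -> Option<usize> {
        let capacity = words.len() * 64;
        // find_last_bit() equivalent: highest set bit, scanning from the top.
        let highest = words
            .iter()
            .enumerate()
            .rev()
            .find(|(_, &w)| w != 0)
            .map(|(i, &w)| i * 64 + 63 - w.leading_zeros() as usize);
        match highest {
            // Empty pool: shrink all the way down to some minimum.
            None => Some(64),
            // Highest bit below a quarter of capacity: halving still
            // leaves at least 2x headroom above the highest used id.
            Some(bit) if bit < capacity / 4 => Some(capacity / 2),
            // Common case: refuse to shrink.
            Some(_) => None,
        }
    }

    fn main() {
        assert_eq!(shrink_target(&[0xFF, 0, 0, 0]), Some(128)); // highest id 7, capacity 256
        assert_eq!(shrink_target(&[0, 0, 0, 1 << 10]), None);   // highest id 202
    }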

> Consider a very realistic case: you're destroying every object, and thus
> removing every ID in the associate ID pool, doing it in LIFO order. That
> way you will need to call shrink_request() about O(log(N)) times, making
> the whole complexity ~O(N*log(N)); and you'll have to make log(N)
> realloc()s for nothing. If you release IDs in FIFO order, you don't
> call realloc(), but your shrink_request() total complexity will be O(N^2). 

Even if we end up making log(N) reallocs, the total complexity of the
reallocs is O(N) because the amount of data being realloc'd halves each
time. So the total number of bytes copied by reallocs ends up being:

    1 + 2 + 4 + 8 + ... + 2^log(N) <= 2^(1+log(N)) = 2*N

which is O(N).

Of course, deleting the corresponding entry from the red/black tree is
O(log N), so it's still O(N*log(N)) for the N deletions from the rb
tree.

> Can you compare performance numbers with and without shrinking under a
> typical payload? Is there any mechanism to inspect the ID pools at runtime,
> like expose via procfs?

We expose the contents of the red/black tree via the binder_logs
mechanism, but that doesn't show the *capacity* of the bitmap, only the
index of the largest set bit.

> Can you make your shrinking logic conditional on some reasonable OOM
> heuristics, maybe OOM event driven?
> 
> And even without OOM, you can safely skip shrinking if the number of IDs in
> use is greater than 1/4 of the capacity, or there's any used ID with
> the index greater than the 1/2 capacity.

I guess we could register a shrinker, but I don't think it's worth it.

Thanks for the review!

Alice
