Message-ID: <aR5P0p-7SQ8dGrmz@google.com>
Date: Wed, 19 Nov 2025 23:16:34 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: Yury Norov <yury.norov@...il.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Arve Hjønnevåg" <arve@...roid.com>, Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
Joel Fernandes <joelagnelf@...dia.com>, Christian Brauner <brauner@...nel.org>,
Carlos Llamas <cmllamas@...gle.com>, Suren Baghdasaryan <surenb@...gle.com>, Burak Emir <bqe@...gle.com>,
Miguel Ojeda <ojeda@...nel.org>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
"Björn Roy Baron" <bjorn3_gh@...tonmail.com>, Benno Lossin <lossin@...nel.org>,
Andreas Hindborg <a.hindborg@...nel.org>, Trevor Gross <tmgross@...ch.edu>,
Danilo Krummrich <dakr@...nel.org>, rust-for-linux@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 6/6] rust_binder: use bitmap for allocation of handles
On Thu, Nov 13, 2025 at 12:50:37PM -0500, Yury Norov wrote:
> On Wed, Nov 12, 2025 at 11:35:07PM +0000, Alice Ryhl wrote:
> > On Wed, Nov 12, 2025 at 02:09:19PM -0500, Yury Norov wrote:
> > > On Wed, Nov 12, 2025 at 12:47:24PM +0000, Alice Ryhl wrote:
> > > > To find an unused Binder handle, Rust Binder currently iterates the
> > > > red/black tree from the beginning until it finds a gap in the keys. This
> > > > is extremely slow.
> > >
> > > Can you share performance numbers?
> >
> > I have not benchmarked it in the Rust driver, but it replaces
> > potentially thousands of calls to rb_next() with a single call to
> > find_unused_id(), so I'm feeling good about the performance. And the
>
> Feelings are good, but numbers are better. In the original dbitmap
> patch, Carlos shared the experiment details and rough numbers, and
> it's enough for me. Can you just reproduce his steps?
Unfortunately Carlos doesn't have his benchmark files anymore, but I
made my own benchmark. It's a test where a client repeatedly sends a
transaction containing a node to a server, and the server stashes every
node it receives, so the number of live nodes keeps growing for the
duration of the test (up to 100k nodes).
Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
0 - 10,000 │ 176.88 │ 92.93 │ 99.41
10,000 - 20,000 │ 437.37 │ 87.74 │ 98.55
20,000 - 30,000 │ 677.49 │ 76.24 │ 96.37
30,000 - 40,000 │ 901.76 │ 83.39 │ 96.73
40,000 - 50,000 │ 1126.62 │ 100.44 │ 94.57
50,000 - 60,000 │ 1288.98 │ 94.38 │ 96.64
60,000 - 70,000 │ 1588.74 │ 88.27 │ 96.36
70,000 - 80,000 │ 1812.97 │ 93.97 │ 91.24
80,000 - 90,000 │ 2062.95 │ 92.22 │ 102.01
90,000 - 100,000 │ 2330.03 │ 97.18 │ 100.31
The numbers are the roundtrip latency in microseconds: the time it takes
to send the transaction containing the node and to receive the response.
I would not necessarily trust the precise numbers. Rust seems slightly
better here, but I ran it on an emulator so it may be noisy. Regardless,
I think it's pretty clear that an improvement has happened.
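For reference, the measurement loop is essentially the following
(simplified sketch; send_one is a placeholder closure standing in for the
actual binder IPC call, which sends a transaction carrying a new node and
blocks until the reply arrives):

use std::time::{Duration, Instant};

// Runs 100k transactions in buckets of 10,000 and returns the average
// roundtrip latency per bucket. The server stashes every node it
// receives, so the number of live nodes grows throughout the run.
fn run_benchmark(mut send_one: impl FnMut()) -> Vec<Duration> {
    let mut bucket_averages = Vec::new();
    for _bucket in 0..10 {
        let start = Instant::now();
        for _ in 0..10_000 {
            send_one();
        }
        // Average roundtrip latency for this bucket of 10,000 transactions.
        bucket_averages.push(start.elapsed() / 10_000);
    }
    bucket_averages
}
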
> > equivalent change in the C driver was done because this particular piece
> > of code was causing contention issues by holding the spinlock for a long
> > time.
> >
> > The characteristic of this collection is that there is one per process
> > using the driver. Most processes have only a few entries in this bitmap,
> > so the inline representation will apply in most cases. However, there
> > are a few processes that have a large number of entries in the 4 to
> > maybe 5 figures range. Those processes are what caused the contention
> > issue mentioned above.
> >
> > > > To improve the performance, add a bitmap that keeps track of which
> > > > indices are actually in use. This allows us to quickly find an unused
> > > > key in the red/black tree.
> > > >
> > > > This logic matches the approach used by C Binder. It was chosen
> > > > partially because it's the most memory efficient solution.
> > >
> > > That's inaccurate. You are adding a new data structure (bitmap), advocating
> > > it with an improvement on the search side, and that makes sense.
> > >
> > > But now you're saying it's also a more memory efficient approach, which
> > > doesn't sound trivial because the most memory efficient solution is to
> > > bring no new data structures at all.
> > >
> > > I guess you meant that bitmap is the most efficient data structure to
> > > index used/unused nodes. If so, can you please rephrase the sentence?
> >
> > Yes I can rephrase.
> >
> > Adding more data is of course always less memory efficient. What I meant
> > is that this is more memory efficient than the competing solution of
> > using an augmented rbtree that Carlos mentioned here:
> >
> > https://lore.kernel.org/p/aC1PQ7tmcqMSmbHc@google.com
> >
> > > > + if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > > > + match refs.by_handle.entry(res.as_u32()) {
> > > > + rbtree::Entry::Vacant(entry) => break (res, entry),
> > > > + rbtree::Entry::Occupied(_) => {
> > > > + pr_err!("Detected mismatch between handle_is_present and by_handle");
> > > > + res.acquire();
> > > > + continue;
> > >
> > > At this point you've detected mismatch between two linked data
> > > structures. It means that one of them or both are corrupted. To
> > > me it looks fatal, and your pr_err() confirms it. How could you
> > > continue then?
> >
> > Although we should never hit this codepath in real code, I don't think
> > we need to kill the kernel. We can treat the r/b tree as the source of
> > truth and adjust the bitmap when mismatches are detected.
>
> There's no such thing as a 'source of truth', and the rb-tree is not even
> close to that.
>
> You add bitmap for performance reasons, but with that you also bring
> some redundancy. Now, you cross-check two data structures and see
> discrepancy. At this point you can only say that either one of them
> or both are corrupted.
>
> Relying on the rb-tree over the bitmap is simply wrong. Statistically
> speaking, there is a greater chance of corrupting the rb-tree - just
> because it is scattered and takes more memory.
>
> > I could add a kernel warning, though. That shouldn't kill an Android
> > device.
>
> Assuming you're talking about WARN(), I agree. But it looks like my
> reasoning differs.
>
> You never hit this path during a normal operation, as you said. So if
> you're there, it means that something is already going wrong. If you
> issue WARN(), you let those focused on system integrity leverage
> panic_on_warn and shut the system down to minimize any possible damage.
>
> With that precaution, you're free to do whatever you find practical to
> 'recover', or even do nothing. But please avoid referring to the rb-tree
> as a source of truth - it's a misleading and dangerous misconception.
Ok.
I picked the rb-tree as the structure to trust during recovery because
going the other way isn't possible - the bitmap doesn't store the
auxiliary data that we need.
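To make it concrete, the plan is roughly the following variant of the
hunk quoted above (untested sketch; whether the warning ends up as
pr_warn! or a WARN-style helper is still open):

if let Some(res) = refs.handle_is_present.find_unused_id(start) {
    match refs.by_handle.entry(res.as_u32()) {
        rbtree::Entry::Vacant(entry) => break (res, entry),
        rbtree::Entry::Occupied(_) => {
            // The two structures have diverged, which should never happen.
            // Warn loudly so panic_on_warn users can stop the system, then
            // keep the id marked as used so the bitmap stays consistent
            // with the existing by_handle entry and the next iteration
            // moves on to another id.
            pr_warn!("Detected mismatch between handle_is_present and by_handle");
            res.acquire();
            continue;
        }
    }
}
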
> > > > + }
> > > > + }
> > > > + }
> > > > +
> > > > + let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > > > + drop(refs_lock);
> > > > + let resizer = grow_request.realloc(GFP_KERNEL)?;
> > > > + refs_lock = self.node_refs.lock();
> > > > + refs = &mut *refs_lock;
> > > > + refs.handle_is_present.grow(resizer);
> > >
> > > Is it possible to turn this block into a less wordy statement? Maybe a
> > > wrapper function for it? Ideally, the grow request should be handled
> > > transparently in .find_unused_id().
> >
> > I can extract this block into a separate function, but I think it would
> > be tricky to move the entire logic inside .find_unused_id() because of
> > the mutex lock/unlock situation.
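
For reference, the extracted helper would look roughly like this (sketch
only; NodeRefsGuard is shorthand for whatever guard type
self.node_refs.lock() actually returns, and the imports are the same as
the surrounding code):

fn grow_handle_bitmap<'a>(
    &'a self,
    mut refs_lock: NodeRefsGuard<'a>,
) -> Result<NodeRefsGuard<'a>> {
    // Compute the resize request while still holding the lock.
    let grow_request = refs_lock.handle_is_present.grow_request().ok_or(ENOMEM)?;
    // Drop the lock across the allocation, which may sleep.
    drop(refs_lock);
    let resizer = grow_request.realloc(GFP_KERNEL)?;
    // Re-acquire the lock and install the larger bitmap. The caller must
    // retry find_unused_id(), since another thread may have raced with us
    // while the lock was dropped.
    let mut refs_lock = self.node_refs.lock();
    refs_lock.handle_is_present.grow(resizer);
    Ok(refs_lock)
}
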
> >
> > > > @@ -905,6 +924,16 @@ pub(crate) fn update_ref(
> > > > let id = info.node_ref().node.global_id();
> > > > refs.by_handle.remove(&handle);
> > > > refs.by_node.remove(&id);
> > > > + refs.handle_is_present.release_id(handle as usize);
> > > > +
> > > > + if let Some(shrink) = refs.handle_is_present.shrink_request() {
> > >
> > > This is questionable. Shrinking is usually the very slow path, and we
> > > don't shrink unless we're really close or even inside the OOM condition.
> > >
> > > In this case, shrink_request() on average returns false, but it's
> > > O(N), which makes _every_ release_id() O(N), while it should be O(1).
> >
> > The current implementation of shrink_request() will refuse to shrink the
> > pool unless the largest bit is less than 1/4 of the capacity, so it
> > should not perform the expensive operation very often.
> >
> > That said, it does call find_last_bit() each time, which I guess is
> > O(N). But my assumption was that find_last_bit() is cheap enough that it
> > wouldn't be a problem.
>
> It's O(N), while the expectation for release_id() is to be O(1). But if
> you think it's OK for you - I'm OK as well. Can you explicitly mention
> it in function description?
Sure, I will mention it.
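Roughly along these lines, as a doc comment on shrink_request() (wording
to be polished):

/// Returns a request to shrink the backing allocation, if shrinking is
/// currently worthwhile.
///
/// The pool is only shrunk when the highest bit in use is below a quarter
/// of the capacity, so reallocations stay rare. Note that this scans for
/// the last set bit on every call, so it is O(capacity) rather than O(1);
/// callers that invoke it after every release_id() pay that cost on each
/// release.
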
> > > Consider a very realistic case: you're destroying every object, and thus
> > > removing every ID in the associated ID pool, doing it in LIFO order. That
> > > way you will need to call shrink_request() about O(log(N)) times, making
> > > the whole complexity ~O(N*log(N)); and you'll have to make log(N)
> > > realloc()s for nothing. If you release IDs in FIFO order, you don't
> > > call realloc(), but your shrink_request() total complexity will be O(N^2).
> >
> > Even if we end up making log(N) reallocs, the total complexity of the
> > reallocs is O(N) because the amount of data being reallocd halves each
> > time. So the total number of bytes copied by reallocs ends up being:
> >
> > 1 + 2 + 4 + 8 + ... + 2^log(N) <= 2^(1+log(N)) = 2*N
> >
> > which is O(N).
>
> OK, I trust your math better than mine. :)
>
> > Of course, deleting the corresponding entry from the red/black tree is
> > O(log N), so it's still O(N*log(N)) for the N deletions from the rb
> > tree.
> >
> > > Can you compare performance numbers with and without shrinking under a
> > > typical payload? Is there any mechanism to inspect the ID pools at runtime,
> > > like expose via procfs?
> >
> > We expose the contents of the red/black tree via the binder_logs
> > mechanism, but that doesn't show the *capacity* of the bitmap. Only the
> > index of the largest set bit.
> >
> > > Can you make your shrinking logic conditional on some reasonable OOM
> > > heuristics, maybe OOM event driven?
> > >
> > > And even without OOM, you can safely skip shrinking if the number of IDs in
> > > use is greater than 1/4 of the capacity, or there's any used ID with
> > > the index greater than the 1/2 capacity.
> >
> > I guess we could register a shrinker, but I don't think it's worth it.
>
> OK, if you're fine with O(N) for release_id() - I'm fine as well. Can
> you mention OOM-driven shrinking as a possible alternative for
> follow-up improvements?
Sure.
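Concretely, I'll add a note along these lines next to the shrink call:

// Possible future improvement: rather than checking shrink_request() on
// every release, keep the bitmap at its high-water mark and only shrink
// in response to memory pressure (e.g. from a registered shrinker or an
// OOM notification).
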
Alice