linux-kernel - Re: [PATCH v5 6/6] rust_binder: use bitmap for allocation of handles

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRYaYlX1vQG_GUAy@yury>
Date: Thu, 13 Nov 2025 12:50:37 -0500
From: Yury Norov <yury.norov@...il.com>
To: Alice Ryhl <aliceryhl@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Arve Hjønnevåg <arve@...roid.com>,
	Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
	Joel Fernandes <joelagnelf@...dia.com>,
	Christian Brauner <brauner@...nel.org>,
	Carlos Llamas <cmllamas@...gle.com>,
	Suren Baghdasaryan <surenb@...gle.com>, Burak Emir <bqe@...gle.com>,
	Miguel Ojeda <ojeda@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
	Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <lossin@...nel.org>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Trevor Gross <tmgross@...ch.edu>,
	Danilo Krummrich <dakr@...nel.org>, rust-for-linux@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 6/6] rust_binder: use bitmap for allocation of handles

On Wed, Nov 12, 2025 at 11:35:07PM +0000, Alice Ryhl wrote:
> On Wed, Nov 12, 2025 at 02:09:19PM -0500, Yury Norov wrote:
> > On Wed, Nov 12, 2025 at 12:47:24PM +0000, Alice Ryhl wrote:
> > > To find an unused Binder handle, Rust Binder currently iterates the
> > > red/black tree from the beginning until it finds a gap in the keys. This
> > > is extremely slow.
> > 
> > Can you share performance numbers? 
> 
> I have not benchmarked it in the Rust driver, but it replaces
> potentially thousands of calls to rb_next() with a single call to
> find_unused_id(), so I'm feeling good about the performance. And the

Feelings are good, but numbers are better. In the original dbitmap
patch, Carlos shared the experiment details and rough numbers, and
it's enough for me. Can you just reproduce his steps?

> equivalent change in the C driver was done because this particular piece
> of code was causing contention issues by holding the spinlock for a long
> time.
> 
> The characteristics of this collection is that there is one per process
> using the driver. Most processes have only a few entries in this bitmap,
> so the inline representation will apply in most cases. However, there
> are a few processes that have a large number of entries in the 4 to
> maybe 5 figures range. Those processes are what caused the contention
> issue mentioned above.
> 
> > > To improve the performance, add a bitmap that keeps track of which
> > > indices are actually in use. This allows us to quickly find an unused
> > > key in the red/black tree.
> > > 
> > > This logic matches the approach used by C Binder. It was chosen
> > > partially because it's the most memory efficient solution.
> > 
> > That inaccurate. You are adding a new data structure (bitmap), advocating
> > it with an improvement on search side, and that makes sense.
> > 
> > But now you're saying it's also a more memory efficient approach, which
> > doesn't sound trivial because the most memory efficient solution is to
> > bring no new data structures at all.
> > 
> > I guess you meant that bitmap is the most efficient data structure to
> > index used/unused nodes. If so, can you please rephrase the sentence?
> 
> Yes I can rephrase.
> 
> Adding more data is of course always less memory efficient. What I meant
> is that this is more memory efficient than the competing solution of
> using an augmented rbtree that Carlos mentioned here:
> 
> https://lore.kernel.org/p/aC1PQ7tmcqMSmbHc@google.com
> 
> > > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > > +                match refs.by_handle.entry(res.as_u32()) {
> > > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > > +                    rbtree::Entry::Occupied(_) => {
> > > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > > +                        res.acquire();
> > > +                        continue;
> > 
> > At this point you've detected mismatch between two linked data
> > structures. It means that one of them or both are corrupted. To
> > me it looks fatal, and your pr_err() confirms it. How could you
> > continue then?
> 
> Although we should never hit this codepath in real code, I don't think
> we need to kill the kernel. We can treat the r/b tree as source of truth
> and adjust the bitmap when mismathces are detected.

There's no such thing like a 'source of truth', and rb-tree is not even
close to that.

You add bitmap for performance reasons, but with that you also bring
some redundancy. Now, you cross-check two data structures and see
discrepancy. At this point you can only say that either one of them
or both are corrupted.

Relying on rb-tree over bitmap is simply wrong. Statistically
speaking, there's more chances to corrupt rb-tree - just because it
is scattered and takes more memory.

> I could add a kernel warning, though. That shouldn't kill an Android
> device.

Assuming, you're talking about WARN(), I agree. But it looks like my
reasoning differs.
 
You never hit this path during a normal operation, as you said. So if
you're there, it means that something is already going wrong. If you
issue WARN(), you let those focused on system integrity to leverage
panic_on_warn and shut the system down to minimize any possible damage. 

With that precaution, you're free to do whatever you find practical to
'recover', or even do nothing. But please avoid referring rb-tree as a
source of truth - it's a misleading and dangerous misconception.

> > > +                    }
> > > +                }
> > > +            }
> > > +
> > > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > > +            drop(refs_lock);
> > > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > > +            refs_lock = self.node_refs.lock();
> > > +            refs = &mut *refs_lock;
> > > +            refs.handle_is_present.grow(resizer);
> > 
> > Is it possible to turn this block into a less wordy statement? Maybe a
> > wrapper function for it? Ideally, the grow request should be handled
> > transparently in .find_unused_id().
> 
> I can extract this block into a separate function, but I think it would
> be tricky to move the entire logic inside .find_unused_id() because of
> the mutex lock/unlock situation.
> 
> > > @@ -905,6 +924,16 @@ pub(crate) fn update_ref(
> > >                  let id = info.node_ref().node.global_id();
> > >                  refs.by_handle.remove(&handle);
> > >                  refs.by_node.remove(&id);
> > > +                refs.handle_is_present.release_id(handle as usize);
> > > +
> > > +                if let Some(shrink) = refs.handle_is_present.shrink_request() {
> > 
> > This is questionable. Shrinking is usually the very slow path, and we
> > don't shrink unless we're really close or even inside the OOM condition.
> > 
> > In this case, shrink_request() on average returns false, but it's
> > O(N), which makes _every_ release_id() O(N), while it should be O(1).
> 
> The current implementation of shrink_request() will refuse to shrink the
> pool unless the largest bit is less than 1/4 of the capacity, so it
> should not perform the expensive operation very often.
> 
> That said, it does call find_last_bit() each time, which I guess is
> O(N). But my assumption was that find_last_bit() is cheap enough that it
> wouldn't be a problem.

It's O(N), while the expectation for release_id() is to be O(1). But if
you think it's OK for you - I'm OK as well. Can you explicitly mention
it in function description?

> > Consider a very realistic case: you're destroying every object, and thus
> > removing every ID in the associate ID pool, doing it in LIFO order. That
> > way you will need to call shrink_request() about O(log(N)) times, making
> > the whole complexity ~O(N*log(N)); and you'll have to make log(N)
> > realloc()s for nothing. If you release IDs in FIFO order, you don't
> > call realloc(), but your shrink_request() total complexity will be O(N^2). 
> 
> Even if we end up making log(N) reallocs, the total complexity of the
> reallocs is O(N) because the amount of data being reallocd halves each
> time. So the total number of bytes copied by reallocs ends up being:
> 
>     1 + 2 + 4 + 8 + ... + 2^log(N) <= 2^(1+log(N)) = 2*N
> 
> which is O(N).

OK, I trust your math better than mine. :)

> Of course, deleting the corresponding entry from the red/black tree is
> O(log N), so it's still O(N*log(N)) for the N deletions from the rb
> tree.
> 
> > Can you compare performance numbers with and without shrinking under a
> > typical payload? Is there any mechanism to inspect the ID pools at runtime,
> > like expose via procfs?
> 
> We expose the contents of the red/black tree via the binder_logs
> mechanism, but that doesn't show the *capacity* of the bitmap. Only the
> index of the largest set bit.
> 
> > Can you make your shrinking logic conditional on some reasonable OOM
> > heuristics, maybe OOM event driven?
> > 
> > And even without OOM, you can safely skip shrinking if the number of IDs in
> > use is greater than 1/4 of the capacity, or there's any used ID with
> > the index greater than the 1/2 capacity.
> 
> I guess we could register a shrinker, but I don't think it's worth it.

OK, if you're fine with O(N) for release_id() - I'm fine as well. Can
you mention OOM-driven shrinking as a possible alternative for the
following improvements?

Thanks,
Yury