Message-ID: <aAkq0kQl5WFaW0xM@yury>
Date: Wed, 23 Apr 2025 14:00:50 -0400
From: Yury Norov <yury.norov@...il.com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Alice Ryhl <aliceryhl@...gle.com>, Burak Emir <bqe@...gle.com>,
	Rasmus Villemoes <linux@...musvillemoes.dk>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Miguel Ojeda <ojeda@...nel.org>,
	Alex Gaynor <alex.gaynor@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Trevor Gross <tmgross@...ch.edu>, rust-for-linux@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 0/5] rust: adds Bitmap API, ID pool and bindings

On Wed, Apr 23, 2025 at 10:52:19AM -0700, Boqun Feng wrote:
> On Wed, Apr 23, 2025 at 01:34:22PM -0400, Yury Norov wrote:
> > On Wed, Apr 23, 2025 at 01:11:24PM -0400, Yury Norov wrote:
> > > On Wed, Apr 23, 2025 at 09:30:51AM -0700, Boqun Feng wrote:
> > > > On Wed, Apr 23, 2025 at 06:19:18PM +0200, Alice Ryhl wrote:
> > > > > On Wed, Apr 23, 2025 at 5:43 PM Yury Norov <yury.norov@...il.com> wrote:
> > > > > >
> > > > > > I received it twice - with timestamps 1:36 and 1:43. Assuming they are
> > > > > > identical, and ignoring the former.
> > > > > >
> > > > > > On Wed, Apr 23, 2025 at 01:43:32PM +0000, Burak Emir wrote:
> > > > > > > This series adds a Rust bitmap API for porting the approach from
> > > > > > > commit 15d9da3f818c ("binder: use bitmap for faster descriptor lookup")
> > > > > > > to Rust. The functionality in dbitmap.h makes use of bitmap and bitops.
> > > > > > >
> > > > > > > The Rust bitmap API provides a safe abstraction to the underlying
> > > > > > > bitmap and bitops operations. For now, it only includes the methods
> > > > > > > necessary for dbitmap.h; more can be added later. We perform bounds
> > > > > > > checks for hardening; violations are programmer errors that result
> > > > > > > in panics.
> > > > > > >
> > > > > > > We include set_bit_atomic and clear_bit_atomic operations. One has
> > > > > > > to avoid races with non-atomic operations, which is ensured by the
> > > > > > > Rust type system: either callers have a shared reference &bitmap, in
> > > > > > > which case the mutations are atomic operations, or there is an
> > > > > > > exclusive reference &mut bitmap, in which case there is no
> > > > > > > concurrent access.
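
To make the quoted design concrete, here is a minimal userspace Rust
sketch of such an API (the names, the Vec storage, and the Relaxed
ordering are illustrative assumptions, not the patch's actual code,
which wraps a bitmap_zalloc() allocation): the non-atomic setter
requires &mut self, while the atomic variant is callable through
&self:

	use std::sync::atomic::{AtomicUsize, Ordering};

	const BITS: usize = usize::BITS as usize;

	// Hypothetical stand-in for the kernel Bitmap abstraction.
	pub struct Bitmap {
	    words: Vec<AtomicUsize>,
	}

	impl Bitmap {
	    pub fn new(nbits: usize) -> Self {
	        let nwords = (nbits + BITS - 1) / BITS;
	        Self { words: (0..nwords).map(|_| AtomicUsize::new(0)).collect() }
	    }

	    // Non-atomic: &mut self proves exclusive access, so a plain
	    // read-modify-write cannot race with anything.
	    pub fn set_bit(&mut self, index: usize) {
	        *self.words[index / BITS].get_mut() |= 1usize << (index % BITS);
	    }

	    // Atomic: callable through &self; concurrent callers are fine
	    // because the read-modify-write is a single atomic operation.
	    pub fn set_bit_atomic(&self, index: usize) {
	        self.words[index / BITS]
	            .fetch_or(1usize << (index % BITS), Ordering::Relaxed);
	    }
	}
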
> > > > > >
> > > > > > It's not about shared references only. One can take a mutable
> > > > > > reference and still have a race:
> > > > > >
> > > > > > CPU1                            CPU2
> > > > > >
> > > > > > take mut ref
> > > > > > bitmap.set() // non-atomic
> > > > > > put mut ref
> > > > > >                                 take mut ref
> > > > > >                                 bitmap.test() // read as 0
> > > > > > data propagated to memory
> > > > > >                                 bitmap.test() // read as 1
> > > > > >
> > > > > > To make this scenario impossible, either putting or taking the mut ref
> > > > > > should imply a global cache flush, because the bitmap array is not
> > > > > > internal data of the Bitmap class (only the pointer is).
> > > > > >
> > > > > > I already asked you to point me to the specification that states that
> > > > > > taking a mutable reference implies flushing all the caches to the point
> > > > > > of coherency, but you didn't share it. And I doubt that the compiler
> > > > > > does it, for performance reasons.
> > > > > 
> > > > > The flushing of caches and so on *is* implied. It doesn't happen every
> > > > > time you take a mutable reference, but for you to be able to take a
> > > > > mut ref on CPU2 after releasing it on CPU1, there must be a flush
> > > > > somewhere in between.
> > > > > 
> > > > 
> > > > Yeah, and it's not just "flushing of caches", it's making CPU1's memory
> > > > operations on the object pointed to by the "mut ref" observable to CPU2. If
> > > > CPU1 and CPU2 sync with a lock, then the lock guarantees that, and if
> > > > CPU1 and CPU2 sync with a store-release+load-acquire, the
> > > > RELEASE-ACQUIRE ordering guarantees that as well.
> > > 
> > > Not sure what you mean. Atomic set_bit() and clear_bit() are often
> > > implemented in asm, and there are no acquire-release semantics.
> > 
> > Sorry, hit 'send' prematurely.
> > 
> > > > Yeah, and it's not just "flushing of caches", it's making CPU1's memory
> > > > operations on the object pointed to by the "mut ref" observable to CPU2. If
> > > > CPU1 and CPU2 sync with a lock, then the lock guarantees that, 
> > 
> > The problem here is that the object pointed to by the 'mut ref' is the
> > Rust class Bitmap. The class itself allocates an array, which is used
> > as the actual storage. The Rust class and the C array will likely not
> > share cache lines.
> > 
> > The pointer is returned from a C call to bitmap_zalloc(), so I don't
> > think it's possible for the Rust compiler to realize that the number
> > stored in Bitmap is a pointer to data of a certain size, and that it
> > should be flushed when the "mut ref" is put... That's why I guessed a
> > global flush.
> > 
> 
> You don't do the flush in the C code either, right? You would rely on
> some existing synchronization between threads to make sure CPU2 observes
> the memory effect of CPU1 (if that's what you want).
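
To illustrate that point with a runnable userspace Rust sketch (std's
Mutex stands in for whatever primitive actually passes exclusive
access between threads, and a plain Vec of words stands in for the
bitmap): the lock's release/acquire pairing supplies all the
ordering, and no explicit flush appears anywhere in user code:

	use std::sync::{Arc, Mutex};
	use std::thread;

	fn main() {
	    // Two threads can only take turns holding `&mut` to the words
	    // by going through some synchronization primitive.
	    let bitmap = Arc::new(Mutex::new(vec![0u64; 4]));

	    let b = Arc::clone(&bitmap);
	    let t1 = thread::spawn(move || {       // "CPU1"
	        let mut guard = b.lock().unwrap(); // lock: acquire
	        guard[0] |= 1;                     // plain, non-atomic store
	    });                                    // guard drops: release
	    t1.join().unwrap();

	    // "CPU2": the acquire in lock() (and the join itself) is what
	    // makes CPU1's plain store visible here.
	    assert_eq!(bitmap.lock().unwrap()[0] & 1, 1);
	}
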
> 
> > Yeah, would be great to understand how this all works.
> > 
> > As a side question: in regular C spinlocks, can you point me to the
> > place where the caches get flushed when a lock moves from CPU1 to
> > CPU2? I spent some time looking at the code, but found nothing myself.
> > Or is this implemented in a different way?
> 
> Oh I see. The simple answer would be "the fact that cache flushing is
> done is implied". Now let's take a simple example:
> 
> 	CPU 1			CPU 2
> 	=====			=====
> 	spin_lock();
> 	x = 1;
> 	spin_unlock();
> 
> 				spin_lock();
> 				r1 = x;		// r1 == 1
> 				spin_unlock();
> 
> that is, if CPU 2 gets the lock later than CPU 1, r1 is guaranteed to be
> 1, right? Now let's open the box, with a trivial spinlock implementation:
> 
> 	CPU 1			CPU 2
> 	=====			=====
> 	spin_lock();
> 	x = 1;
> 	spin_unlock():
> 	  smp_store_release(lock, 0);
> 
> 				spin_lock():
> 				  while (cmpxchg_acquire(lock, 0, 1) != 0) { }
> 				  
> 				r1 = x;		// r1 == 1
> 				spin_unlock();
> 
> now, for CPU2 to acquire the lock, the cmpxchg_acquire() has to succeed,
> which means a few things:
> 
> 1. 	CPU2 observes the lock value to be 0, i.e., CPU2 observes
> 	CPU1's store to the lock.
> 
> 2.	Since the smp_store_release() on CPU1 pairs with the
> 	cmpxchg_acquire() on CPU2, it's guaranteed that CPU2 has observed
> 	all memory effects from before the smp_store_release() on CPU1.
> 	And this is the "implied" part. In the real hardware cache
> 	protocol, what the smp_store_release() does is basically
> 	"flush/invalidate the cache and issue the store"; therefore,
> 	since CPU2 observes the store part of the smp_store_release(),
> 	it's implied that the cache flush/invalidate has been observed
> 	by CPU2 already. Of course the actual hardware cache protocol is
> 	more complicated, but this is the gist of it.
> 
> Based on 1 & 2, a programmer normally won't need to reason about where
> the cache flush is actually issued, but rather about the synchronization
> built via the shared variables (in this case, the "lock").
> 
> Hope this could help.
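
The same two-CPU example can be written as a runnable userspace Rust
sketch (a toy spinlock, not the kernel's implementation): unlock is a
store-release and lock is a CAS-acquire, mirroring the trivial
implementation quoted above, so observing the unlock implies
observing every write made under the previous critical section:

	use std::cell::UnsafeCell;
	use std::sync::atomic::{AtomicBool, Ordering};
	use std::sync::Arc;
	use std::thread;

	// Toy spinlock: unlock is a store-release, lock a CAS-acquire.
	struct SpinLock<T> {
	    locked: AtomicBool,
	    data: UnsafeCell<T>,
	}

	// Safety: access to `data` is serialized by `locked`.
	unsafe impl<T: Send> Sync for SpinLock<T> {}

	impl<T> SpinLock<T> {
	    fn new(v: T) -> Self {
	        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(v) }
	    }

	    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
	        // cmpxchg_acquire loop: succeeding means we observed the
	        // previous holder's store-release, and with it all of that
	        // holder's earlier writes (point 2 above).
	        while self.locked
	            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
	            .is_err()
	        {
	            std::hint::spin_loop();
	        }
	        let r = f(unsafe { &mut *self.data.get() });
	        // smp_store_release(lock, 0)
	        self.locked.store(false, Ordering::Release);
	        r
	    }
	}

	fn main() {
	    let lock = Arc::new(SpinLock::new(0));
	    let l = Arc::clone(&lock);
	    thread::spawn(move || l.with(|x| *x = 1)) // CPU 1
	        .join()
	        .unwrap();
	    let r1 = lock.with(|x| *x);               // CPU 2
	    assert_eq!(r1, 1); // guaranteed by release/acquire pairing
	}
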

Yeah, that helped a lot. Thank you!

So, if this Rust mutable reference is implemented similarly to a
regular spinlock, I've no more questions.

Thanks again for the explanation.
