Message-ID: <680928d5.050a0220.2b8494.c4d2@mx.google.com>
Date: Wed, 23 Apr 2025 10:52:19 -0700
From: Boqun Feng <boqun.feng@...il.com>
To: Yury Norov <yury.norov@...il.com>
Cc: Alice Ryhl <aliceryhl@...gle.com>, Burak Emir <bqe@...gle.com>,
	Rasmus Villemoes <linux@...musvillemoes.dk>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Miguel Ojeda <ojeda@...nel.org>,
	Alex Gaynor <alex.gaynor@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Trevor Gross <tmgross@...ch.edu>, rust-for-linux@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 0/5] rust: adds Bitmap API, ID pool and bindings

On Wed, Apr 23, 2025 at 01:34:22PM -0400, Yury Norov wrote:
> On Wed, Apr 23, 2025 at 01:11:24PM -0400, Yury Norov wrote:
> > On Wed, Apr 23, 2025 at 09:30:51AM -0700, Boqun Feng wrote:
> > > On Wed, Apr 23, 2025 at 06:19:18PM +0200, Alice Ryhl wrote:
> > > > On Wed, Apr 23, 2025 at 5:43 PM Yury Norov <yury.norov@...il.com> wrote:
> > > > >
> > > > > I received it twice - with timestamps 1:36 and 1:43. Assuming they are
> > > > > identical, and ignoring the former.
> > > > >
> > > > > On Wed, Apr 23, 2025 at 01:43:32PM +0000, Burak Emir wrote:
> > > > > > This series adds a Rust bitmap API for porting the approach from
> > > > > > commit 15d9da3f818c ("binder: use bitmap for faster descriptor lookup")
> > > > > > to Rust. The functionality in dbitmap.h makes use of bitmap and bitops.
> > > > > >
> > > > > > The Rust bitmap API provides a safe abstraction to the underlying
> > > > > > bitmap and bitops operations. For now, it only includes the methods
> > > > > > necessary for dbitmap.h; more can be added later. We perform bounds
> > > > > > checks for hardening; violations are programmer errors that result
> > > > > > in panics.
> > > > > >
> > > > > > We include set_bit_atomic and clear_bit_atomic operations. Races
> > > > > > with non-atomic operations must be avoided, which is ensured by the
> > > > > > Rust type system: either callers hold a shared reference &bitmap,
> > > > > > in which case all mutations are atomic operations, or there is an
> > > > > > exclusive reference &mut bitmap, in which case there is no
> > > > > > concurrent access.
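> > > > > >
> > > > > > A rough sketch of the idea, modeled on a single word of storage
> > > > > > (illustrative only; the actual series wraps storage allocated via
> > > > > > bitmap_zalloc(), and the exact signatures may differ):
> > > > > >
> > > > > > 	use core::sync::atomic::{AtomicUsize, Ordering};
> > > > > >
> > > > > > 	pub struct Bitmap(AtomicUsize); // toy: one word only
> > > > > >
> > > > > > 	impl Bitmap {
> > > > > > 		/// Non-atomic set: `&mut self` guarantees no
> > > > > > 		/// concurrent access to the storage.
> > > > > > 		pub fn set_bit(&mut self, index: usize) {
> > > > > > 			assert!(index < usize::BITS as usize);
> > > > > > 			*self.0.get_mut() |= 1 << index;
> > > > > > 		}
> > > > > >
> > > > > > 		/// Atomic set: `&self` allows concurrent callers;
> > > > > > 		/// the read-modify-write itself is atomic.
> > > > > > 		pub fn set_bit_atomic(&self, index: usize) {
> > > > > > 			assert!(index < usize::BITS as usize);
> > > > > > 			self.0.fetch_or(1 << index, Ordering::Relaxed);
> > > > > > 		}
> > > > > > 	}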
> > > > >
> > > > > It's not about shared references only. One can take a mutable
> > > > > reference and still end up with a race:
> > > > >
> > > > > CPU1                            CPU2
> > > > >
> > > > > take mut ref
> > > > > bitmap.set() // non-atomic
> > > > > put mut ref
> > > > >                                 take mut ref
> > > > >                                 bitmap.test() // read as 0
> > > > > data propagated to memory
> > > > >                                 bitmap.test() // read as 1
> > > > >
> > > > > To make this scenario impossible, either putting or taking the mut
> > > > > ref should imply a global cache flush, because the bitmap array is
> > > > > not internal data of the Bitmap class (only the pointer is).
> > > > >
> > > > > I already asked you to point me to the specification that states
> > > > > that taking a mutable reference implies flushing all the caches to
> > > > > the point of coherency, but you didn't share it. And I doubt the
> > > > > compiler does it, for performance reasons.
> > > > 
> > > > The flushing of caches and so on *is* implied. It doesn't happen every
> > > > time you take a mutable reference, but for you to be able to take a
> > > > mut ref on CPU2 after releasing it on CPU1, there must be a flush
> > > > somewhere in between.
> > > > 
> > > 
> > > Yeah, and it's not just "flushing of caches", it's making CPU1's memory
> > > operations on the object pointed to by "mut ref" observable to CPU2. If
> > > CPU1 and CPU2 sync with a lock, then the lock guarantees that, and if
> > > CPU1 and CPU2 sync with a store-release+load-acquire, the
> > > RELEASE-ACQUIRE ordering guarantees that as well.
> > 
> > Not sure what you mean. Atomic set_bit() and clear_bit() are often
> > implemented in asm, and there are no acquire-release semantics.
> 
> Sorry, hit 'send' prematurely.
> 
> > > Yeah, and it's not just "flushing of caches", it's making CPU1's memory
> > > operations on the object pointed to by "mut ref" observable to CPU2. If
> > > CPU1 and CPU2 sync with a lock, then the lock guarantees that, 
> 
> The problem here is that the object pointed to by the 'mut ref' is the
> Rust class Bitmap. The class itself allocates an array, which is used
> as the actual storage. The Rust class and the C array will likely not
> share cache lines.
> 
> The pointer is returned from a C call to bitmap_zalloc(), so I don't
> think it's possible for the Rust compiler to realize that the number
> stored in Bitmap is a pointer to data of a certain size, and that it
> should be flushed when the "mut ref" is put... That's why I guessed a
> global flush.
> 

You don't do the flush in the C code either, right? You would rely on
some existing synchronization between threads to make sure CPU2 observes
the memory effect of CPU1 (if that's what you want).
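To illustrate with plain userspace Rust (a sketch, not kernel code): the
"existing synchronization" could be a lock plus a thread join, and the
visibility comes from that synchronization, not from any explicit flush
in the code:

	use std::sync::Mutex;
	use std::thread;

	fn main() {
		// The Mutex is the "existing synchronization": unlocking
		// on one thread and locking on another orders the plain
		// (non-atomic) stores to the protected data. No explicit
		// cache flush appears anywhere.
		let bitmap = Mutex::new(0usize);

		thread::scope(|s| {
			s.spawn(|| {
				*bitmap.lock().unwrap() |= 1 << 3;
			});
		});

		// The spawned thread is joined when the scope ends, so
		// this read is guaranteed to observe the store above.
		assert_eq!(*bitmap.lock().unwrap() & (1 << 3), 1 << 3);
	}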

> Yeah, would be great to understand how this all works.
> 
> As a side question: in regular C spinlocks, can you point me to the
> place where the caches get flushed when a lock moves from CPU1 to
> CPU2? I spent some time looking at the code, but found nothing myself.
> Or is this implemented in a different way?

Oh I see. The simple answer would be "the fact that cache flushing is
done is implied". Now let's take a simple example:

	CPU 1			CPU 2
	=====			=====
	spin_lock();
	x = 1;
	spin_unlock();

				spin_lock();
				r1 = x;		// r1 == 1
				spin_unlock();

that is, if CPU 2 gets the lock later than CPU 1, r1 is guaranteed to be
1, right? Now let's open the box with a trivial spinlock implementation:

	CPU 1			CPU 2
	=====			=====
	spin_lock();
	x = 1;
	spin_unlock():
	  smp_store_release(lock, 0);

				spin_lock():
				  while (cmpxchg_acquire(lock, 0, 1) != 0) { }
				  
				r1 = x;		// r1 == 1
				spin_unlock();

Now, for CPU2 to acquire the lock, the cmpxchg_acquire() has to succeed,
and that means a few things:

1. 	CPU2 observes the lock value to be 0, i.e., CPU2 observes
	CPU1's store to the lock.

2.	Because of the smp_store_release() on CPU1 and the
	cmpxchg_acquire() on CPU2, it's guaranteed that CPU2 has
	observed all the memory effects that happened before the
	smp_store_release() on CPU1. And this is the "implied" part:
	in a real hardware cache protocol, what the smp_store_release()
	does is basically "flush/invalidate the cache and issue the
	store", so once CPU2 observes the store part of the
	smp_store_release(), it's implied that the cache
	flush/invalidate has already been observed by CPU2. Of course,
	actual hardware cache protocols are more complicated, but this
	is the gist of it.

Based on 1 & 2, a programmer normally doesn't need to reason about where
the cache flush is actually issued, but rather about the synchronization
built via the shared variables (in this case, the "lock").
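
If it helps, here is the same release/acquire pattern written with
portable userspace Rust atomics, as a standalone sketch (a model of the
idea, not the kernel implementation):

	use std::sync::atomic::{AtomicU32, Ordering};
	use std::thread;

	static LOCK: AtomicU32 = AtomicU32::new(0);
	static X: AtomicU32 = AtomicU32::new(0); // Relaxed = "plain" data

	fn spin_lock() {
		// cmpxchg_acquire analogue: once this succeeds, every
		// memory effect published by the matching release store
		// (the unlock) is visible to this thread.
		while LOCK
			.compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
			.is_err()
		{}
	}

	fn spin_unlock() {
		// smp_store_release() analogue: publishes everything done
		// in the critical section together with the store of 0.
		LOCK.store(0, Ordering::Release);
	}

	fn main() {
		let t1 = thread::spawn(|| {
			spin_lock();
			X.store(1, Ordering::Relaxed);
			spin_unlock();
		});
		let t2 = thread::spawn(|| {
			spin_lock();
			// 0 if this lock was taken first; 1 if taken after
			// t1's unlock. Never a stale value once t1's release
			// store of 0 has been observed.
			let r1 = X.load(Ordering::Relaxed);
			spin_unlock();
			r1
		});
		t1.join().unwrap();
		println!("r1 = {}", t2.join().unwrap());
	}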

Hope this helps.

Regards,
Boqun


> 
> Thanks,
> Yury
