Date:   Tue, 16 Aug 2022 09:43:29 -0700
From:   Boqun Feng <boqun.feng@...il.com>
To:     Hector Martin <marcan@...can.st>
Cc:     Will Deacon <will@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Tejun Heo <tj@...nel.org>, peterz@...radead.org,
        jirislaby@...nel.org, maz@...nel.org, mark.rutland@....com,
        catalin.marinas@....com, oneukum@...e.com,
        roman.penyaev@...fitbricks.com, asahi@...ts.linux.dev,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH] workqueue: Fix memory ordering race in queue_work*()

On Wed, Aug 17, 2022 at 01:22:09AM +0900, Hector Martin wrote:
> On 16/08/2022 23.55, Boqun Feng wrote:
> > On Tue, Aug 16, 2022 at 02:41:57PM +0100, Will Deacon wrote:
> >> It's worth noting that with the spinlock-based implementation (i.e.
> >> prior to e986a0d6cb36) then we would have the same problem on
> >> architectures that implement spinlocks with acquire/release semantics;
> >> accesses from outside of the critical section can drift in and reorder
> >> with each other there, so the conversion looked legitimate to me in
> >> isolation and I vaguely remember going through callers looking for
> >> potential issues. Alas, I obviously missed this case.
> >>
> > 
> > > I just want to mention that although spinlock-based atomic bitops
> > > don't provide a full barrier in test_and_set_bit(), they don't
> > > have the problem spotted by Hector, because test_and_set_bit() and
> > > clear_bit() sync with each other via locks:
> > 
> > 	test_and_set_bit():
> > 	  lock(..);
> > 	  old = *p; // mask is already set by other test_and_set_bit()
> > 	  *p = old | mask;
> > 	  unlock(...);
> > 				clear_bit():
> > 				  lock(..);
> > 				  *p ~= mask;
> > 				  unlock(..);
> > 
> > > so "having a full barrier before test_and_set_bit()" may not be the
> > > exact thing we need here: as long as a test_and_set_bit() can sync with
> > > a clear_bit() unconditionally, the world is safe. For example, we
> > > can make test_and_set_bit() RELEASE, and clear_bit() ACQUIRE on ARM64:
> > 
> > 	test_and_set_bit():
> > 	  atomic_long_fetch_or_release(..); // pair with clear_bit()
> > 	  				    // guarantee everything is
> > 					    // observed.
> > 	  			clear_bit():
> > 				  atomic_long_fetch_andnot_acquire(..);
> > 	  
> > , maybe that's somewhat cheaper than a full barrier implementation.
> > 
> > Thoughts? Just to find the exact ordering requirement for bitops.
> 
> It's worth pointing out that the workqueue code does *not* pair
> test_and_set_bit() with clear_bit(). It does an atomic_long_set()
> instead (and then there are explicit barriers around it, which are
> expected to pair with the implicit barrier in test_and_set_bit()). If we
> define test_and_set_bit() to only sync with clear_bit() and not
> necessarily be a true barrier, that breaks the usage of the workqueue code.
> 

Ah, I missed that, but that means the old spinlock-based atomics are
totally broken unless spinlocks imply full barriers on these archs.
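
For reference, the lock-based scheme quoted above can be sketched in
userspace C11 roughly as follows. This is only an illustration under
assumptions: the sketch_* and bit_lock_* names are made up, a single
atomic_flag stands in for the arch spinlock, and the lock has only
acquire/release semantics. The two helpers order their accesses against
each other through the lock, but that says nothing about a writer that
updates the word without taking the lock.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an arch spinlock that is only ACQUIRE on
 * lock and RELEASE on unlock (no full barrier). */
static atomic_flag bit_lock = ATOMIC_FLAG_INIT;

static void bit_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&bit_lock,
						 memory_order_acquire))
		; /* spin until the previous holder releases */
}

static void bit_lock_release(void)
{
	atomic_flag_clear_explicit(&bit_lock, memory_order_release);
}

#define SKETCH_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Sets bit nr in *p under the lock and returns the bit's old value. */
static bool sketch_test_and_set_bit(unsigned int nr, unsigned long *p)
{
	unsigned long mask, old;

	assert(nr < SKETCH_BITS);
	mask = 1UL << nr;

	bit_lock_acquire();
	old = *p;
	*p = old | mask;
	bit_lock_release();

	return (old & mask) != 0;
}

/* Clears bit nr in *p under the same lock. */
static void sketch_clear_bit(unsigned int nr, unsigned long *p)
{
	assert(nr < SKETCH_BITS);

	bit_lock_acquire();
	*p &= ~(1UL << nr);
	bit_lock_release();
}
```

Both helpers serialize on bit_lock, so a test_and_set_bit() always sees
either the state before or after a concurrent clear_bit(). A store to the
word that bypasses bit_lock gets no such guarantee from this sketch.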

But still, if we define test_and_set_bit() as a RELEASE atomic instead of
a full barrier + atomic, it should work for workqueue, right? Do we
actually need extra ordering here?

	WRITE_ONCE(*x, 1); // A
	test_and_set_bit(..); // a full barrier will order A & B
	WRITE_ONCE(*y, 1); // B

That's something I want to figure out.
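
To make the two options concrete, here is a minimal sketch in userspace
C11 atomics. It is an assumption-laden illustration, not kernel code: the
names tas_release(), tas_full() and PENDING_MASK are hypothetical, and the
C11 memory model is not the same as the kernel's, so the seq_cst fences
only approximate a fully ordered kernel RMW. In the RELEASE-only variant,
A is ordered before the RMW, but nothing in the C11 model orders B after
it. In the fenced variant, A and B are both ordered against the RMW.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PENDING_MASK 1UL /* hypothetical flag bit */

/* RELEASE-only RMW: store A cannot move after the fetch_or, but store B
 * is not ordered against it and may become visible earlier. */
static bool tas_release(atomic_ulong *flags, atomic_int *x, atomic_int *y)
{
	unsigned long old;

	atomic_store_explicit(x, 1, memory_order_relaxed);	/* A */
	old = atomic_fetch_or_explicit(flags, PENDING_MASK,
				       memory_order_release);
	atomic_store_explicit(y, 1, memory_order_relaxed);	/* B */

	return (old & PENDING_MASK) != 0;
}

/* Approximation of a fully ordered RMW: a relaxed fetch_or with a
 * seq_cst fence on each side, so A and B are both ordered against it. */
static bool tas_full(atomic_ulong *flags, atomic_int *x, atomic_int *y)
{
	unsigned long old;

	atomic_store_explicit(x, 1, memory_order_relaxed);	/* A */
	atomic_thread_fence(memory_order_seq_cst);
	old = atomic_fetch_or_explicit(flags, PENDING_MASK,
				       memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(y, 1, memory_order_relaxed);	/* B */

	return (old & PENDING_MASK) != 0;
}
```

Both functions return the old state of the bit, so a single-threaded
caller sees identical results from either one. Any difference shows up
only in what a concurrent observer of x and y is allowed to see.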

Regards,
Boqun
> - Hector
