Message-ID: <74559da4-5cd4-7cc4-0303-ab5f6a8b92ae@marcan.st>
Date: Wed, 17 Aug 2022 01:22:09 +0900
From: Hector Martin <marcan@...can.st>
To: Boqun Feng <boqun.feng@...il.com>, Will Deacon <will@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Herbert Xu <herbert@...dor.apana.org.au>,
Tejun Heo <tj@...nel.org>, peterz@...radead.org,
jirislaby@...nel.org, maz@...nel.org, mark.rutland@....com,
catalin.marinas@....com, oneukum@...e.com,
roman.penyaev@...fitbricks.com, asahi@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] workqueue: Fix memory ordering race in queue_work*()
On 16/08/2022 23.55, Boqun Feng wrote:
> On Tue, Aug 16, 2022 at 02:41:57PM +0100, Will Deacon wrote:
>> It's worth noting that with the spinlock-based implementation (i.e.
>> prior to e986a0d6cb36) then we would have the same problem on
>> architectures that implement spinlocks with acquire/release semantics;
>> accesses from outside of the critical section can drift in and reorder
>> with each other there, so the conversion looked legitimate to me in
>> isolation and I vaguely remember going through callers looking for
>> potential issues. Alas, I obviously missed this case.
>>
>
> I just want to mention that although spinlock-based atomic bitops
> don't provide a full barrier in test_and_set_bit(), they don't
> have the problem spotted by Hector, because test_and_set_bit() and
> clear_bit() sync with each other via locks:
>
> test_and_set_bit():
>     lock(..);
>     old = *p; // mask is already set by other test_and_set_bit()
>     *p = old | mask;
>     unlock(...);
> clear_bit():
>     lock(..);
>     *p &= ~mask;
>     unlock(..);
>
> so "having a full barrier before test_and_set_bit()" may not be the
> exact thing we need here. As long as a test_and_set_bit() can sync with
> a clear_bit() unconditionally, the world is safe. For example, we
> can make test_and_set_bit() RELEASE, and clear_bit() ACQUIRE on ARM64:
>
> test_and_set_bit():
>     atomic_long_fetch_or_release(..); // pair with clear_bit()
>                                       // guarantee everything is
>                                       // observed.
> clear_bit():
>     atomic_long_fetch_andnot_acquire(..);
>
> , maybe that's somewhat cheaper than a full barrier implementation.
>
> Thoughts? Just to find the exact ordering requirement for bitops.

It's worth pointing out that the workqueue code does *not* pair
test_and_set_bit() with clear_bit(). It does an atomic_long_set()
instead, with explicit barriers around it that are expected to pair
with the implicit barrier in test_and_set_bit(). If we define
test_and_set_bit() to only sync with clear_bit() and not necessarily
be a true barrier, that breaks the workqueue code's usage.
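
To make the pattern concrete, here is a rough userspace sketch using C11
atomics rather than the kernel API. Every name in it (struct work_sketch,
queue_work_sketch(), run_work_sketch(), PENDING_BIT) is made up for
illustration, and the seq_cst operations are only stand-ins for the
kernel's fully ordered test_and_set_bit() and the explicit barriers
around atomic_long_set(). It is not the actual workqueue code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PENDING_BIT 1UL /* hypothetical stand-in for the PENDING flag */

struct work_sketch {
	atomic_ulong data;   /* flag word, bit 0 = PENDING */
	atomic_int payload;  /* state the producer wants the worker to see */
};

static void work_init_sketch(struct work_sketch *w)
{
	atomic_init(&w->data, 0UL);
	atomic_init(&w->payload, 0);
}

/*
 * Producer: publish the payload, then try to claim PENDING.
 * The seq_cst fetch_or plays the role of a fully ordered
 * test_and_set_bit(). Returns true if this call set PENDING.
 */
static bool queue_work_sketch(struct work_sketch *w, int value)
{
	atomic_store_explicit(&w->payload, value, memory_order_relaxed);
	unsigned long old = atomic_fetch_or_explicit(&w->data, PENDING_BIT,
						     memory_order_seq_cst);
	return !(old & PENDING_BIT);
}

/*
 * Worker: clear PENDING with a plain atomic store (the analogue of
 * atomic_long_set(), NOT a clear_bit()-style RMW), wrapped in
 * explicit fences, then read the payload.
 */
static int run_work_sketch(struct work_sketch *w)
{
	/* Only pending work is expected to reach this point. */
	assert(atomic_load(&w->data) & PENDING_BIT);

	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&w->data, 0UL, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&w->payload, memory_order_relaxed);
}
```

A queue_work_sketch() call that finds PENDING already set relies on the
worker, which has not cleared the flag yet, picking up the new payload.
If the fetch_or were only guaranteed to sync with a matching clear_bit(),
the plain store on the worker side would not take part in that pairing.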
- Hector