[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cd51b422-89f3-1856-5d3b-d6e5b0029085@marcan.st>
Date: Tue, 16 Aug 2022 14:36:31 +0900
From: Hector Martin <marcan@...can.st>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Herbert Xu <herbert@...dor.apana.org.au>,
Will Deacon <will@...nel.org>
Cc: Tejun Heo <tj@...nel.org>, peterz@...radead.org,
jirislaby@...nel.org, maz@...nel.org, mark.rutland@....com,
boqun.feng@...il.com, catalin.marinas@....com, oneukum@...e.com,
roman.penyaev@...fitbricks.com, asahi@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] workqueue: Fix memory ordering race in queue_work*()
On 16/08/2022 14.27, Linus Torvalds wrote:
> On Mon, Aug 15, 2022 at 9:15 PM Herbert Xu <herbert@...dor.apana.org.au> wrote:
>>
>> Please revert this as test_and_set_bit was always supposed to be
>> a full memory barrier. This is an arch bug.
>
<snip>
> From looking at it, the asm-generic implementation is a bit
> questionable, though. In particular, it does
>
> if (READ_ONCE(*p) & mask)
> return 1;
>
> so it's *entirely* unordered for the "bit was already set" case.
These ops are documented in Documentation/atomic_bitops.txt as being
unordered in the failure ("bit was already set" case), and that matches
the generic implementation (which arm64 uses).
On the other hand, Twitter just pointed out that contradicting
documentation exists (I believe this was the source of the third party
doc I found that claimed it's always a barrier):
include/asm-generic/bitops/instrumented-atomic.h
So either one doc and the implementation are wrong, or the other doc is
wrong.
> --- a/include/asm-generic/bitops/atomic.h
> +++ b/include/asm-generic/bitops/atomic.h
> @@ -39,9 +39,6 @@ arch_test_and_set_bit(
> unsigned long mask = BIT_MASK(nr);
>
> p += BIT_WORD(nr);
> - if (READ_ONCE(*p) & mask)
> - return 1;
> -
> old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
> return !!(old & mask);
> }
> @@ -53,9 +50,6 @@ arch_test_and_clear_bit
> unsigned long mask = BIT_MASK(nr);
>
> p += BIT_WORD(nr);
> - if (!(READ_ONCE(*p) & mask))
> - return 0;
> -
> old = arch_atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
> return !!(old & mask);
> }
>
> but the above is not just whitespace-damaged, it's entirely untested
> and based purely on me looking at that code.
That does work, I in fact tried that fix first to prove that the
early-exit was the problem.
I don't have a particularly strong opinion on which way is right here, I
just saw the implementation matched the docs (that I found) and Will
commented on the other thread implying this was all deliberate.
- Hector
Powered by blists - more mailing lists