lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjfLT7nL8pV8RWATpjgm0zDtUwT8UMtroqnGcXRjN8tgw@mail.gmail.com>
Date:   Mon, 15 Aug 2022 22:52:49 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Hector Martin <marcan@...can.st>
Cc:     Herbert Xu <herbert@...dor.apana.org.au>,
        Will Deacon <will@...nel.org>, Tejun Heo <tj@...nel.org>,
        peterz@...radead.org, jirislaby@...nel.org, maz@...nel.org,
        mark.rutland@....com, boqun.feng@...il.com,
        catalin.marinas@....com, oneukum@...e.com,
        roman.penyaev@...fitbricks.com, asahi@...ts.linux.dev,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH] workqueue: Fix memory ordering race in queue_work*()

On Mon, Aug 15, 2022 at 10:36 PM Hector Martin <marcan@...can.st> wrote:
>
> These ops are documented in Documentation/atomic_bitops.txt as being
> unordered in the failure ("bit was already set" case), and that matches
> the generic implementation (which arm64 uses).

Yeah, that documentation is pure garbage. We need to fix it.

I think that "unordered on failure" was added at the same time that
the generic implementation was rewritten.

IOW, the documentation was changed to match that broken
implementation, but it's clearly completely broken.

I think I understand *why* it's broken - it looks like a "harmless"
optimization. After all, if the bitop doesn't do anything, there's
nothing to order it with.

It makes a certain amount of sense - as long as you don't think about
it too hard.

The reason it is completely and utterly broken is that it's not
actually just "the bitop doesn't do anything". Even when it doesn't
change the bit value, just the ordering of the read of the old bit
value can be meaningful, exactly for that case of "I added more work
to the queue, I need to set the bit to tell the consumers, and if I'm
the first person to set the bit I may need to wake the consumer up".


> On the other hand, Twitter just pointed out that contradicting
> documentation exists (I believe this was the source of the third party
> doc I found that claimed it's always a barrier):

It's not just that other documentation exists - it's literally that
the unordered semantics don't even make sense, and don't match reality
and history.

And nobody thought about it or caught it at the time.

The Xen people seem to have noticed at some point, and tried to
introduce a "sync_set_set()"

> So either one doc and the implementation are wrong, or the other doc is
> wrong.

That doc and the generic implementation is clearly wrong.

              Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ