Message-ID: <f3f84fc1-50be-0a9b-0538-6ea26dd93b16@marcan.st>
Date:   Tue, 16 Aug 2022 04:15:00 +0900
From:   Hector Martin <marcan@...can.st>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     Will Deacon <will@...nel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Greg KH <gregkh@...uxfoundation.org>, jirislaby@...nel.org,
        Marc Zyngier <maz@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Asahi Linux <asahi@...ts.linux.dev>,
        Oliver Neukum <oneukum@...e.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: Debugging a TTY race condition on M1 (memory ordering dragons)

On 16/08/2022 03.58, Boqun Feng wrote:
> I agree this is handy, but an unconditional full barrier may be costly
> for some users, and probably unnecessary if the users periodically queue
> the work. In that case, some successful enqueue will eventually make all
> prior memory accesses observable to the work function. Also, if workqueue
> users do their own locking in the work function, then the barrier is
> likewise unnecessary.
> 
> The documentation of course needs some help to clear things up. But I'm
> not sure "strengthening" the ordering guarantee of queue_work() is a
> good idea. Maybe a dedicated API, like:
> 
> // Queue more work on @work. This has the same semantics as
> // queue_work() if @work is not pending. If @work is pending, it
> // additionally ensures the work function observes all memory
> // accesses made before this call.
> void queue_more_work(struct workqueue_struct *wq, struct work_struct *work)
> {
> 	smp_mb();
> 	queue_work(wq, work);
> }
> 
> Regards,
> Boqun
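
For concreteness, this is the kind of pattern that gets bitten (a
minimal sketch; my_dev, producer() and the field layout are made up
for illustration): if the work is already pending, nothing in this
path guarantees the running work function observes buf/len.

#include <linux/string.h>
#include <linux/workqueue.h>

struct my_dev {
	struct work_struct work;
	char buf[64];		/* filled by the producer */
	size_t len;
};

static void producer(struct my_dev *dev, const char *src, size_t n)
{
	memcpy(dev->buf, src, n);	/* plain stores */
	dev->len = n;
	/*
	 * If dev->work is already PENDING, the test_and_set_bit()
	 * inside the queueing path fails, and a failed
	 * test_and_set_bit() need not act as a full barrier on arm64,
	 * so the work function running on another CPU can miss the
	 * stores above.
	 */
	schedule_work(&dev->work);
}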

FWIW, I didn't actually use a full barrier in my patch. I just replaced
the test_and_set_bit() with the underlying atomic op, sans early exit path.
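
Roughly, the shape of that change (a sketch, not the literal patch;
claim_pending_old/new are names I made up here, though
WORK_STRUCT_PENDING and work_data_bits() are the real workqueue
internals):

#include <linux/workqueue.h>

/* Today (simplified): test_and_set_bit() takes an early exit when the
 * bit is already set, so a failed attempt can skip the fully ordered
 * atomic op entirely. */
static bool claim_pending_old(struct work_struct *work)
{
	return !test_and_set_bit(WORK_STRUCT_PENDING_BIT,
				 work_data_bits(work));
}

/* The patch, roughly: perform the underlying atomic RMW
 * unconditionally, so the full barrier it implies is taken even when
 * the work was already pending. */
static bool claim_pending_new(struct work_struct *work)
{
	return !(atomic_long_fetch_or(WORK_STRUCT_PENDING, &work->data) &
		 WORK_STRUCT_PENDING);
}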

Personally though, I think it makes more sense to have the default
function provide the guarantee, and if someone *really* needs the
performance gain from eliding the implicit barrier, they could use an
alternate API for that (after showing useful gains). This stuff is too
subtle to expect every caller to wrap their head around memory ordering,
and having queue_work() always provide ordering against prior stores
*feels* intuitive.
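
Something like this, shape-wise (queue_work_relaxed() is hypothetical,
just to illustrate where the opt-out would live):

/* Default: always orders prior stores against the work function,
 * even when the work was already pending. */
bool queue_work(struct workqueue_struct *wq, struct work_struct *work);

/* Hypothetical opt-in fast path: no ordering guarantee when the work
 * is already pending; only for callers who have measured a win and
 * provide their own ordering (e.g. locking in the work function). */
bool queue_work_relaxed(struct workqueue_struct *wq,
			struct work_struct *work);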

But let's see what the workqueue folks say :)

- Hector
