linux-kernel - Re: wait_on_page_bit_common(TASK_KILLABLE, EXCLUSIVE) can miss wakeup?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wjJKicPgmvf7atx=h6Li7ez9nFrpJdgTSRm5aUEnuVH6w@mail.gmail.com>
Date:   Mon, 29 Jun 2020 09:36:32 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Davidlohr Bueso <dave@...olabs.net>, Jan Kara <jack@...e.cz>,
        Lukas Czerner <lczerner@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: wait_on_page_bit_common(TASK_KILLABLE, EXCLUSIVE) can miss wakeup?

On Mon, Jun 29, 2020 at 6:16 AM Nicholas Piggin <npiggin@...il.com> wrote:
>
> No, ignore this part (which you explained well it was just a thinko,
> and your patch of course would not have worked if this was the case):
> the exclusive wake up doesn't get lost if schedule() was called because
> state goes back to running regardless of what woke it.

Right.

The normal pattern for a wait-loop is

 - add yourself to the wait-queue and set yourself "sleeping" with a
memory barrier.

 - test for the condition, exit if ok

 - go to sleep

and that pattern doesn't have the race.

The other common pattern is to check for the "do I need to sleep at
all" at the *top* of the function, long before you bother with any
wait-queues at all. This code does that odd "let's check in the middle
if we need to sleep" instead, which then caused the bug.

So we had an odd setup because of three different wait conditions that
had different rules for what they could/should do before sleeping, and
then not sleeping reliably at all.

We could also fix it by just splitting out the three cases into their
own wait routines that match the normal pattern. The bug really
happened because that wait-loop is doing things such an odd way due to
all the different cases..

I actually think it would be a lot more readable if it was three
different cases instead of trying to be one "common" routine.

The *common* parts is the special PG_locked logic at the top, and the
thrashing/delayacct code at the bottom.

And *that* could be a true common helper, but the wait loop (which
isn't even a loop for the DROP case) are fundamentally different and
probably should be separate functions.

So I think my "one-liner" fixes the problem, but I'd certainly be open
to somebody cleaning this up properly.

Anybody?

             Linus