Message-Id: <1593505946.t0nxq8q8kj.astroid@bobo.none>
Date: Tue, 30 Jun 2020 19:08:10 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andi Kleen <ak@...ux.intel.com>,
Davidlohr Bueso <dave@...olabs.net>, Jan Kara <jack@...e.cz>,
Lukas Czerner <lczerner@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: wait_on_page_bit_common(TASK_KILLABLE, EXCLUSIVE) can miss
wakeup?
Excerpts from Oleg Nesterov's message of June 30, 2020 4:17 pm:
> On 06/30, Nicholas Piggin wrote:
>> Excerpts from Oleg Nesterov's message of June 30, 2020 12:02 am:
>> > On 06/29, Nicholas Piggin wrote:
>> >>
>> >> prepare_to_wait_event() has a pretty good pattern (and comment), I would
>> >> favour using that (test the signal when inserting on the waitqueue).
>> >>
>> >> @@ -1133,6 +1133,15 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
>> >> for (;;) {
>> >> spin_lock_irq(&q->lock);
>> >>
>> >> + if (signal_pending_state(state, current)) {
>> >> + /* Must not lose an exclusive wake up, see
>> >> + * prepare_to_wait_event comment */
>> >> + list_del_init(&wait->entry);
>> >> + spin_unlock_irq(&q->lock);
>> >> + ret = -EINTR;
>> >
>> > Basically this is what my patch in the 1st email does. But note that we can't
>> > just set "ret = -EINTR" here, we will need to clear "ret" if test_and_set_bit()
>> > below succeeds. That is why I used another "int intr" variable.
>>
>> You snipped off one more important line of context. No such games are
>> required AFAIKS.
>
> for (;;) {
> spin_lock_irq(&q->lock);
>
> + if (signal_pending_state(state, current)) {
> + /* Must not lose an exclusive wake up, see
> + * prepare_to_wait_event comment */
> + list_del_init(&wait->entry);
> + spin_unlock_irq(&q->lock);
> + ret = -EINTR;
> + break;
> + }
>
>
> so wait_on_page_bit_common() just returns -EINTR if signal_pending_state() == T.
> And this is wrong if "current" was already woken up by unlock_page().
>
> That is why ___wait_event() checks the condition even if prepare_to_wait_event()
> returns -EINTR. The comment in prepare_to_wait_event() tries to explain this.

Hmm, yeah, because we can loop around here with the task in a sleeping
state, which comes back to Linus' fix. Thanks.
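
To make Oleg's point concrete, here is a minimal user-space sketch of the
two orderings. It is only a model under stated assumptions, not kernel code:
struct model_page, struct model_waiter, step_return_early(),
step_check_condition() and model_run() are hypothetical names invented for
illustration, and the scenario is reduced to one waiter that has already
been woken by unlock_page() while a signal is pending.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_page {
	bool locked;		/* stands in for the page lock bit */
};

struct model_waiter {
	bool signal_pending;	/* stands in for signal_pending_state() */
};

/* Returns -EINTR as soon as a signal is seen, without looking at the page. */
int step_return_early(struct model_page *page, const struct model_waiter *w)
{
	if (w->signal_pending)
		return -EINTR;		/* an already delivered wakeup is dropped */
	if (!page->locked) {
		page->locked = true;	/* models taking the lock bit */
		return 0;
	}
	return -EAGAIN;			/* would go back to sleep */
}

/* Tests the condition first and only then honours the pending signal. */
int step_check_condition(struct model_page *page, const struct model_waiter *w)
{
	if (!page->locked) {
		page->locked = true;	/* the exclusive wakeup is consumed here */
		return 0;
	}
	if (w->signal_pending)
		return -EINTR;
	return -EAGAIN;			/* would go back to sleep */
}

/*
 * Modelled scenario: the lock bit has been cleared and this exclusive
 * waiter was woken, and a signal is pending at the same time.
 */
int model_run(int (*step)(struct model_page *, const struct model_waiter *),
	      bool *locked_after)
{
	struct model_page page = { .locked = false };
	struct model_waiter w = { .signal_pending = true };
	int ret;

	assert(step != NULL);
	ret = step(&page, &w);
	*locked_after = page.locked;
	return ret;
}
```

In this model, model_run(step_return_early, ...) returns -EINTR and leaves
the page unlocked, so nobody takes the lock that the exclusive wakeup was
meant to hand over. model_run(step_check_condition, ...) takes the lock and
returns 0, which is the ordering ___wait_event() follows.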

It looks like I broke this with 62906027091f1, then Linus mostly fixed
it in a8b169afbf06a. My patch is what actually introduced this ugly
bit test, but do we even need it at all? If we do, then it's
under-commented, and I can't see how it wouldn't be racy anyway. Can we
just get rid of it entirely?

Thanks,
Nick