Message-ID: <CAPM31R+ohAB3w+wTWj08LfM9ePP8tfyW-Vie5Uef-RwCu-b4sw@mail.gmail.com>
Date: Fri, 11 Dec 2015 03:30:33 -0800
From: Paul Turner <pjt@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: NeilBrown <nfbrown@...ell.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...nel.org>,
Peter Anvin <hpa@...or.com>, vladimir.murzin@....com,
linux-tip-commits@...r.kernel.org, jstancek@...hat.com,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers
On Thu, Dec 10, 2015 at 5:09 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, Dec 10, 2015 at 08:30:01AM +1100, NeilBrown wrote:
>> On Wed, Dec 09 2015, Peter Zijlstra wrote:
>>
>> > On Wed, Dec 09, 2015 at 12:06:33PM +1100, NeilBrown wrote:
>> >> On Tue, Dec 08 2015, Peter Zijlstra wrote:
>> >>
>> >> >>
>> >> >
>> >> > *sigh*, so that patch was broken.. the below might fix it, but please
>> >> > someone look at it, I seem to have a less than stellar track record
>> >> > here...
>> >>
>> >> This new change seems to be more intrusive than should be needed.
>> >> Can't we just do:
>> >>
>> >>
>> >> __sched int bit_wait(struct wait_bit_key *word)
>> >> {
>> >> + long state = current->state;
>> >
>> > No, current->state can already be changed by this time.
>>
>> Does that matter?
>> It can only have changed to TASK_RUNNING - right?
>> In that case signal_pending_state() will return 0 and the bit_wait() acts
>> as though the thread was woken up normally (which it was) rather than by
>> a signal (which maybe it was too, but maybe that happened just a tiny
>> bit later).
>>
>> As long as signal delivery doesn't change ->state, we should be safe.
>> We should even be safe testing ->state *after* the call the schedule().
>
> Blergh, all I've managed so far is to confuse myself further. Even
> something like the original (+- the EINTR) should work when we consider
> the looping, even when mixed with an occasional spurious wakeup.
>
>
> int bit_wait()
> {
> 	if (signal_pending_state(current->state, current))
> 		return -EINTR;
> 	schedule();
> }
>
>
> This can go wrong against raising a signal thusly:
>
> 	prepare_to_wait()
>  1:	if (signal_pending_state(current->state, current))
> 		// false, nothing pending
> 	schedule();
> 					set_tsk_thread_flag(t, TIF_SIGPENDING);
>
> 	<spurious wakeup>
>
> 	prepare_to_wait()
> 					wake_up_state(t, ...);
>  2:	if (signal_pending_state(current->state, current))
> 		// false, TASK_RUNNING
>
> 	schedule(); // doesn't block because pending
Note that a quick inspection does not turn up _any_ TASK_INTERRUPTIBLE
callers. When this previously occurred, it was likely only with a
fatal signal, which would have hidden these sins.
>
> 	prepare_to_wait()
>  3:	if (signal_pending_state(current->state, current))
> 		// true, pending
>
Hugh asked me about this after seeing a crash; here's another exciting
way in which the current code breaks -- and this one is actually quite
serious:
Consider __lock_page():

	void __lock_page(struct page *page)
	{
		DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
		__wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
				   TASK_UNINTERRUPTIBLE);
	}
With the current state of the world, we have:

	__sched int bit_wait_io(struct wait_bit_key *word)
	{
	-	if (signal_pending_state(current->state, current))
	-		return 1;
		io_schedule();
	+	if (signal_pending(current))
	+		return -EINTR;
		return 0;
	}
This is called from __wait_on_bit_lock(). Previously,
signal_pending_state() was checked while we were still in
TASK_UNINTERRUPTIBLE (set via prepare_to_wait_exclusive()), so it
would return nonzero only for a fatal signal. Now we simply check for
the presence of any signal -- after we have already returned to the
running state, e.g. after io_schedule(), when somebody has kicked the
wait-queue.
However, this now means that __wait_on_bit_lock() can return -EINTR up
to __lock_page(), which does not validate the return code and blindly
returns with the page still unlocked. This looks to be a pre-existing
bug, but it was at least masked by the fact that it previously
required a fatal signal (and the page we return unlocked was likely
going to be freed by the dying process anyway).
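To make the leak concrete, here is a userspace toy model of the shape
described above -- every name here is a stand-in for the kernel
function of the same flavor, not real kernel code, and the bit-wait
loop is reduced to a boolean flag:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model: signal_arrived stands in for TIF_SIGPENDING being set
 * while we wait; page_locked stands in for PG_locked held by someone
 * else. */
static bool signal_arrived;
static bool page_locked;

/* Models the patched bit_wait_io(): after the (elided) io_schedule(),
 * any pending signal -- fatal or not -- now yields -EINTR. */
static int bit_wait_io_model(void)
{
	if (signal_arrived)
		return -EINTR;
	return 0;
}

/* Models __wait_on_bit_lock(): loop until the bit is free, but bail
 * out with the action's error code -- so -EINTR escapes upward. */
static int wait_on_bit_lock_model(void)
{
	while (page_locked) {
		int ret = bit_wait_io_model();
		if (ret)
			return ret;	/* lock NOT acquired */
	}
	page_locked = true;		/* we now hold the "lock" */
	return 0;
}

/* Models __lock_page(): ignores the return value, exactly the bug
 * described above -- the caller proceeds as if it holds the lock. */
static void lock_page_model(void)
{
	(void)wait_on_bit_lock_model();
}
```

Running the model with a contended "page" and a pending "signal" shows
wait_on_bit_lock_model() returning -EINTR while the lock was never
taken, and lock_page_model() silently swallowing that.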
Peter's proposed follow-up above looks strictly more correct: we need
to evaluate the potential existence of a signal *after* we return from
schedule(), but against the state with which we previously _entered_
schedule().
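The state-capture pattern can be sketched as a userspace toy model --
again, all names, the two-value state encoding, and the simplified
signal_pending_state() (its fatal-signal half is omitted) are
stand-ins, not kernel code:

```c
#include <assert.h>
#include <errno.h>

#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2

static int pending;		/* stands in for TIF_SIGPENDING */
static int current_state;	/* stands in for current->state */

/* Simplified signal_pending_state(): only interruptible waits see a
 * signal here; the fatal-signal case is omitted from this toy. */
static int signal_pending_state_model(int state, int sig)
{
	if (!(state & TASK_INTERRUPTIBLE))
		return 0;
	return sig;
}

/* The race at step 2 above: checking against current->state after a
 * wakeup always sees TASK_RUNNING, so the signal is never reported. */
static int bit_wait_broken(void)
{
	current_state = TASK_RUNNING;	/* wakeup already happened */
	if (signal_pending_state_model(current_state, pending))
		return -EINTR;
	return 0;
}

/* The proposed shape: capture the state we are *entering* schedule()
 * with, sleep, then evaluate the signal against that captured state
 * rather than against current->state. */
static int bit_wait_fixed(void)
{
	int entry_state = current_state;	/* before schedule() */
	current_state = TASK_RUNNING;		/* wakeup side effect */
	if (signal_pending_state_model(entry_state, pending))
		return -EINTR;
	return 0;
}
```

With an interruptible wait and a pending signal, the broken variant
reports nothing (it only ever sees TASK_RUNNING) while the fixed
variant returns -EINTR; an uninterruptible wait correctly ignores the
signal in both.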
Reviewed-by: Paul Turner <pjt@...gle.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/