Message-ID: <20130809130457.GA27493@redhat.com>
Date: Fri, 9 Aug 2013 15:04:57 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Long Gao <gaolong@...inos.com.cn>,
Al Viro <viro@...iv.linux.org.uk>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Patch for lost wakeups
On 08/08, Linus Torvalds wrote:
>
> On Thu, Aug 8, 2013 at 12:17 PM, Oleg Nesterov <oleg@...hat.com> wrote:
> >
> >> and as far as I can tell we have proper barriers for those (the
> >> scheduler gets the rq lock
> >
> > Yes, but... ttwu() takes another lock, ->pi_lock to test ->state.
>
> The lock is different, but for task_state, the main thing we need to
> worry about is memory ordering, not locks.
Yes sure. However, afaics in this particular case the locking does
matter.
Because:
> The case of signals is special, in that the "wakeup criteria" is
> inside the scheduler itself, but conceptually the rule is the same.
yes, and because the waiter lacks mb().
IOW. The code like
	__set_current_state(STATE);
	if (!CONDITION)
		schedule();
is obviously racy, it doesn't have mb().
But the code like
	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();
was always considered correct, it relies on the try_to_wake_up()/schedule()
interaction. But after try_to_wake_up() was changed to use task->pi_lock
this becomes racy in theory. Afaics.
This __set_current_state(TASK_INTERRUPTIBLE) can leak into the critical
section protected by rq->lock, it can be reordered with the CONDITION
check, and in this case CONDITION == signal_pending().
No?
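To make the reordering concern concrete, here is a userspace sketch in C11 atomics, not kernel code. All demo_* names are hypothetical stand-ins: demo_state for ->state, demo_sigpending for the SIGPENDING check, demo_lock() for taking rq->lock. It assumes the lock is acquire-only, so a plain store issued before it may become visible after a load done inside the critical section unless a full barrier sits in between.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { DEMO_RUNNING = 0, DEMO_INTERRUPTIBLE = 1 };

static atomic_int  demo_state;                       /* stand-in for ->state */
static atomic_bool demo_sigpending;                  /* stand-in for SIGPENDING */
static atomic_flag demo_rq_lock = ATOMIC_FLAG_INIT;  /* stand-in for rq->lock */

static void demo_lock(void)
{
	/* Acquire only: later accesses cannot move up, earlier ones can move down. */
	while (atomic_flag_test_and_set_explicit(&demo_rq_lock, memory_order_acquire))
		;
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_rq_lock, memory_order_release);
}

/* Returns true if the caller would be descheduled. */
static bool demo_schedule(bool full_barrier)
{
	bool sleep;

	/* Plays the role of __set_current_state(TASK_INTERRUPTIBLE). */
	atomic_store_explicit(&demo_state, DEMO_INTERRUPTIBLE, memory_order_relaxed);

	/*
	 * Without this fence the memory model allows the store above to be
	 * ordered after the load of demo_sigpending below, i.e. to leak
	 * into the critical section.
	 */
	if (full_barrier)
		atomic_thread_fence(memory_order_seq_cst);

	demo_lock();
	sleep = !atomic_load_explicit(&demo_sigpending, memory_order_relaxed);
	if (!sleep)
		atomic_store_explicit(&demo_state, DEMO_RUNNING, memory_order_relaxed);
	demo_unlock();

	return sleep;
}
```

On hardware where the lock operation happens to be a full barrier the fence changes nothing in practice; the sketch only shows what the ordering rules permit.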
> > we don't
> > have mb() on the other side and schedule() can miss SIGPENDING?
>
> But we do have the mb, at least on x86. The "set_tsk_thread_flag()" is
> a memory barrier there.
Sorry for confusion, I meant "other side", see above.
> But that's why I suggested adding a
> smp_mb__after_clear_bit() to after setting the bit,
Agreed. Or, once again, we can change try_to_wake_up() to do mb()
rather than wmb().
And compared to the theoretical race above this looks more likely
to me (although still unlikely).
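For the waker side, a similar userspace sketch (C11 atomics, hypothetical wk_* names, not the real try_to_wake_up()). The point it illustrates: the waker stores the pending flag and then loads the state, and only a full barrier orders a store against a later load. A write-only barrier would order the store against later stores but not against that load.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { WK_RUNNING = 0, WK_INTERRUPTIBLE = 1 };

static atomic_int  wk_state;       /* stand-in for ->state */
static atomic_bool wk_sigpending;  /* stand-in for SIGPENDING */

/* Returns true if the sleeping task was woken. */
static bool wk_signal_wake_up(void)
{
	/* Publish the wakeup condition first. */
	atomic_store_explicit(&wk_sigpending, true, memory_order_relaxed);

	/* Full barrier: orders the store above against the load below. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&wk_state, memory_order_relaxed) != WK_INTERRUPTIBLE)
		return false;	/* target is not sleeping, nothing to wake */

	atomic_store_explicit(&wk_state, WK_RUNNING, memory_order_relaxed);
	return true;
}
```

With a full barrier on both sides, either the sleeper sees the pending flag or the waker sees the sleeping state; without one of them both checks can miss.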
But probably we should start with another debugging patch, I'll send
it in a minute.
Oleg.