lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 8 Aug 2013 12:51:37 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Long Gao <gaolong@...inos.com.cn>,
	Al Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Patch for lost wakeups

On Thu, Aug 8, 2013 at 12:17 PM, Oleg Nesterov <oleg@...hat.com> wrote:
>
>>  - somebody setting TASK_SLEEPING -> __schedule() testing the
>> signal_pending_state()
>>
>> and as far as I can tell we have proper barriers for those (the
>> scheduler gets the rq lock
>
> Yes, but... ttwu() takse another lock, ->pi_lock to test ->state.

The lock is different, but for task_state, the main thing we need to
worry abotu is memory ordering, not locks. Because "task->state" is
not protected by any particular lock, it has some rather special rules
(namely that the thread is supposed to modify its own state, *except*
for waking things up, when we rely on memory ordering).

So the real issue is to make sure that before finally going to sleep,
any other CPU that races with waking things up needs to have the right
ordering. And that means that

 - the sleeper needs to have the proper barrier between setting the
task state to the sleeping state and testing the waking condition

   Normally this is the "smp_mb()" in set_task_state(), and then the
sleeper is supposed to check its wakeup criteria *after* doing
set_task_state().

   The case of signals is special, in that the "wakeup criteria" is
inside the scheduler itself, but conceptually the rule is the same.

 - the waker needs to have a memory barrier between setting the wakeup
condition and actually doing the wakeup.

   This is where the "try_to_wake_up()" smp_mb+spinlock *should* be
equivalent to a full memory barrier.

> This looks racy, even if wmb() actually acts as mb(), we don't
> have mb() on the other side and schedule() can miss SIGPENDING?

But we do have the mb, at least on x86. The "set_tsk_thread_flag()" is
a memory barrier there. But that's why I suggested adding a
smp_mb__after_clear_bit() to after setting the bit, in case that's the
problem on LoongSon. Because at least MIPS uses ll/sc for atomic, and
needs a sync instruction if the memory model is weak afterwards.

That said, even on MIPS I do not actually believe it should be
necessary, exactly *because* __schedule() already has a spinlock
before checking the signal state. And the spinlock already contains
sufficient serialization (much *more* than the required sync
instruction) that in practice we're already covered.

So there might be a theoretical race on some architecture, but not
afaik mips. I don't know the details about loongson, though.

                   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ