Message-ID: <20130809130457.GA27493@redhat.com>
Date:	Fri, 9 Aug 2013 15:04:57 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Long Gao <gaolong@...inos.com.cn>,
	Al Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Patch for lost wakeups

On 08/08, Linus Torvalds wrote:
>
> On Thu, Aug 8, 2013 at 12:17 PM, Oleg Nesterov <oleg@...hat.com> wrote:
> >
> >> and as far as I can tell we have proper barriers for those (the
> >> scheduler gets the rq lock
> >
> > Yes, but... ttwu() takes another lock, ->pi_lock to test ->state.
>
> The lock is different, but for task_state, the main thing we need to
> worry about is memory ordering, not locks.

Yes sure. However, afaics in this particular case the locking does
matter.

Because:

>    The case of signals is special, in that the "wakeup criteria" is
> inside the scheduler itself, but conceptually the rule is the same.

yes, and because the waiter lacks mb().

IOW. The code like

	__set_current_state(STATE);
	if (!CONDITION)
		schedule();

is obviously racy, it doesn't have mb().
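
For illustration only, here is a userspace C11 sketch of the waiter side with
the missing barrier put back. Nothing in it is kernel code: task_state,
condition and waiter_would_sleep() are hypothetical stand-ins for
current->state, CONDITION and the "set state, test, schedule()" sequence, and
the seq_cst fence is assumed to play the role of mb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not kernel definitions. */
enum { RUNNING = 0, SLEEPING = 1 };

static atomic_int  task_state = RUNNING;  /* stands in for current->state */
static atomic_bool condition  = false;    /* stands in for CONDITION      */

/*
 * Waiter side. Returns true if the caller would go on to call schedule().
 *
 * The full fence between the store to task_state and the load of condition
 * is what the plain __set_current_state() variant lacks: without it the
 * load may be satisfied before the store becomes visible to the waker.
 */
static bool waiter_would_sleep(void)
{
	atomic_store_explicit(&task_state, SLEEPING, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models mb() */

	if (atomic_load_explicit(&condition, memory_order_relaxed)) {
		/* Condition already true: stay runnable, do not sleep. */
		atomic_store_explicit(&task_state, RUNNING, memory_order_relaxed);
		return false;
	}
	return true;
}
```

In the kernel the barrier-carrying variant is set_current_state(); the sketch
only mirrors its ordering, not its implementation.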

But the code like

	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();

was always considered correct, it relies on the try_to_wake_up()/schedule()
interaction. But after try_to_wake_up() was changed to use task->pi_lock
this becomes racy in theory. Afaics.

This __set_current_state(TASK_INTERRUPTIBLE) can leak into the critical
section protected by rq->lock, it can be reordered with the CONDITION
check, and in this case CONDITION == signal_pending().

No?

> > we don't
> > have mb() on the other side and schedule() can miss SIGPENDING?
>
> But we do have the mb, at least on x86. The "set_tsk_thread_flag()" is
> a memory barrier there.

Sorry for the confusion, I meant the "other side", see above.

> But that's why I suggested adding a
> smp_mb__after_clear_bit() to after setting the bit,

Agreed. Or, once again, we can change try_to_wake_up() to do mb()
rather than wmb().

And compared to the theoretical race above this looks more likely
to me (although still unlikely).
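
A rough userspace C11 sketch of the waker side of that suggestion, under the
same caveats as any model: sigpending, task_state and waker_wake() are
hypothetical names, not the real try_to_wake_up() code, and the seq_cst fence
is assumed to stand in for mb(). The point is that a store-only barrier
(wmb) does not order the earlier store against the later load of the state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not kernel definitions. */
enum { RUNNING = 0, SLEEPING = 1 };

static atomic_int  task_state = SLEEPING;  /* stands in for p->state    */
static atomic_bool sigpending = false;     /* stands in for SIGPENDING  */

/*
 * Waker side. Returns true if it observed a sleeper and made it runnable.
 *
 * The store to sigpending must be ordered before the load of task_state.
 * A store-store barrier would not guarantee that, so a full fence is used.
 */
static bool waker_wake(void)
{
	atomic_store_explicit(&sigpending, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models mb(), not wmb() */

	if (atomic_load_explicit(&task_state, memory_order_relaxed) == SLEEPING) {
		atomic_store_explicit(&task_state, RUNNING, memory_order_relaxed);
		return true;
	}
	return false;	/* target was not sleeping, nothing to wake */
}
```

With a full barrier on both sides, either the waiter sees the pending flag or
the waker sees the sleeping state, which is the property the lost-wakeup
argument depends on.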

But probably we should start with another debugging patch, I'll send
it in a minute.

Oleg.
