linux-kernel - Re: TASK_DEAD && ttwu() again (Was: ensure arch_ptrace/ptrace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20130124185026.GA18814@redhat.com>
Date:	Thu, 24 Jan 2013 19:50:26 +0100
From:	Oleg Nesterov <oleg@...hat.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Yasunori Goto <y-goto@...fujitsu.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Dan Carpenter <dan.carpenter@...cle.com>,
	Michael Davidson <md@...gle.com>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Julien Tinnes <jln@...gle.com>,
	Aaron Durbin <adurbin@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Roland McGrath <roland@...k.frob.com>,
	Alexander Gordeev <agordeev@...hat.com>
Subject: Re: TASK_DEAD && ttwu() again (Was: ensure
	arch_ptrace/ptrace_request can never race with SIGKILL)

Hi Tejun,

On 01/23, Tejun Heo wrote:
> Hello, Oleg.
>
> On Wed, Jan 23, 2013 at 08:19:46PM +0100, Oleg Nesterov wrote:
> >
> > And this means that a task doing
> >
> > 	current->state = STATE_1;
> > 	// no schedule() in between
> > 	current->state = STATE_2;
> > 	schedule();
> >
> > can be actually woken up by try_to_wake_up(STATE_1) even if it already
> > sleeps in STATE_2.
>
> Hmmm... nasty.
>
> ...
> > and we have the same problem again. So _I think_ that we we need another
> > mb() after unlock_wait() ?
>
> Seems so, or, maybe we should add barrier semantics to unlock_wait()?
> As it currently stands, it kinda invites misusages.

Perhaps, I am not sure. I agree, unlock_wait() without a barrier at
least on one side probably never makes sense.

> > And, afaics, in theory we can't simply move the current mb() down, after
> > unlock_wait().  (again, only in theory, if nothing else we should have
> > the implicit barrrers after we played with ->state in the past).
> >
> > Or perhaps we should modify ttwu_do_wakeup() to not blindly set RUNNING,
> > say, cmpxchg(old_state, RUNNING). But this is not simple/nice.
>
> I personally think this is the right thing to do short of requiring
> locking on current->state changes.  The situation is a bit muddy
> because we're generally requiring sleepers to loop while still having
> cases where things don't work that way.  It's a little scary that we
> require looping to protect against stray wakeups, which can be very
> rare, without any way to verify/test.
>
> The waker would be acquiring the cacheline exclusively one way or the
> other, so I don't think doing cmpxchg would add much overhead.  We
> would definitely want to do comparisons tho.

Still cmpxchg() is not free, Peter argued that ttwu() is the very hot
path and I understand his concern. And in fact this change doesn't look
very simple...

And this can not eliminate the spurious wakeups in general. Plus in
this particular case (I mean the race described by the comment above
spin_unlock_wait in do_exit) we could equally blame rw_semaphore, not
try_to_wake_up().

So I leave this to scheduler people.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/