Message-ID: <1292610297.2266.334.camel@twins>
Date:	Fri, 17 Dec 2010 19:24:57 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Chris Mason <chris.mason@...cle.com>,
	Frank Rowand <frank.rowand@...sony.com>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
	Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 5/5] sched: Reduce ttwu rq->lock contention

On Fri, 2010-12-17 at 18:50 +0100, Oleg Nesterov wrote:
> On 12/17, Oleg Nesterov wrote:
> >
> > On 12/16, Peter Zijlstra wrote:
> > >
> > > +	if (p->se.on_rq && ttwu_force(p, state, wake_flags))
> > > +		return 1;
> >
> > 	----- WINDOW -----
> >
> > > +	for (;;) {
> > > +		unsigned int task_state = p->state;
> > > +
> > > +		if (!(task_state & state))
> > > +			goto out;
> > > +
> > > +		load = task_contributes_to_load(p);
> > > +
> > > +		if (cmpxchg(&p->state, task_state, TASK_WAKING) == task_state)
> > > +			break;
> >
> > Suppose that we have a task T sleeping in TASK_INTERRUPTIBLE state,
> > and this cpu does try_to_wake_up(TASK_INTERRUPTIBLE). on_rq == false.
> > try_to_wake_up() starts the "for (;;)" loop.
> >
> > However, in the WINDOW above, it is possible that somebody else wakes
> > it up, and then this task changes its state to TASK_INTERRUPTIBLE again.
> >
> > Then we set ->state = TASK_WAKING, but this (still running) T restores
> > TASK_RUNNING after us.
> 
> Even simpler. This can race with, say, __migrate_task() which does
> deactivate_task + activate_task and temporarily clears on_rq. Although
> this is simple to fix, I think.

Yes, another hole..

> Also. Afaics, without rq->lock, we can't trust "while (p->oncpu)", at
> least we need rmb() after that.

I think Linus once argued that loops like that should be fine without an
rmb(): at worst they'll spin a few more times before observing the 1->0
transition (we don't care about the 0->1 transition in this case, because
that's ruled out by the ->state test).

> Interestingly, I can't really understand the current meaning of smp_wmb()
> in finish_lock_switch(). Do you know what exactly it buys?

I _think_ it's meant to ensure that the full context switch has happened
and that we've stored all changes to the rq structure (destroying all
references to prev); in particular, that we've finished writing the new
value of current.

> In any case,
> task_running() (or its callers) do not have the corresponding rmb().
> Say, currently try_to_wake_up()->task_waking() can miss all changes
> starting from prepare_lock_switch(). Hopefully this is OK, but I am
> confused ;)

So I thought I saw how we are OK there, but then I got myself confused
too :-)

My argument was roughly this: there must be some serialization between
the task going to sleep and another task waking it (the sleeper setting
TASK_UNINTERRUPTIBLE and enqueuing itself on a waitqueue, and the waker
finding it on that waitqueue), and this should be sufficient to make
->state visible to the waker.

If the waker observes a !TASK_RUNNING ->state, then by definition it
must see all the changes previous to it (including the ->oncpu 0->1
transition).

But like I said, I got my brain in a twist too.