linux-kernel - Re: allow preemption in check_task

Message-ID: <20140210171712.GA17517@opentech.at>
Date:	Mon, 10 Feb 2014 18:17:12 +0100
From:	Nicholas Mc Guire <der.herr@...r.at>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	linux-rt-users@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Carsten Emde <C.Emde@...dl.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andreas Platschek <platschek@....tuwien.ac.at>
Subject: Re: allow preemption in check_task_state

On Mon, 10 Feb 2014, Steven Rostedt wrote:

> Subject is missing patch number.
> 
> 
> On Mon, 10 Feb 2014 16:38:56 +0100
> Nicholas Mc Guire <der.herr@...r.at> wrote:
> 
> > 
> > A lockfree approach to check_task_state
> > 
> > This treats the state as an indicator variable and uses it to probe
> > saved_state lock free. There is actually no consistency demand on
> > state/saved_state but rather a consistency demand on the transitions
> > of the two variables, but those transitions, based on path inspection,
> > are not independent.
> > 
> > It's probably not faster than the lock/unlock case if uncontended - at least
> > it does not show up in benchmark results, but it would never be hit by a
> > full pi-boost cycle as there is no contention.
> > 
> > This also was tested against the test-case from Sebastian as well as
> > running a few scripted gdb breakpoint debugging/single-stepping loops
> > to trigger this.
> 
> To trigger what?

Sorry, I should have included that in the patch header:
the test case that Sebastian Andrzej Siewior had - available at:
  http://breakpoint.cc/ptrace-test.c
The test case triggers the missed state update.

> 
> > 
> > Tested-by: Andreas Platschek <platschek@....tuwien.ac.at>
> > Tested-by: Carsten Emde <C.Emde@...dl.org>
> > Signed-off-by: Nicholas Mc Guire <der.herr@...r.at>
> > ---
> >  kernel/sched/core.c |   10 ++++++++--
> >  1 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bf93f63..5690ba3 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1074,11 +1074,17 @@ static int migration_cpu_stop(void *data);
> >  static bool check_task_state(struct task_struct *p, long match_state)
> >  {
> >  	bool match = false;
> > +	long state, saved_state;
> > +
> > +	/* catch restored state */
> > +	do {
> > +		state = p->state;
> > +		saved_state = p->saved_state;
> > +		rmb();  /* make sure we actually catch updates */
> 
> The problem I have with this is that there's no matching wmb(). Also,
> shouldn't that be a smp_rmb(), I don't think we can race with devices
> here.

Sebastian also mentioned that - I simply was not sure about this - I am
still not deep enough into this, I guess.

> 
> > +	} while (state != p->state);
> >  
> > -	raw_spin_lock_irq(&p->pi_lock);
> >  	if (p->state == match_state || p->saved_state == match_state)
> >  		match = true;
> > -	raw_spin_unlock_irq(&p->pi_lock);
> >  
> >  	return match;
> >  }
> 
> 
> In rtmutex.c we have:
> 
> 	pi_lock(&self->pi_lock);
> 	__set_current_state(self->saved_state);
> 	self->saved_state = TASK_RUNNING;
> 	pi_unlock(&self->pi_lock);
> 
> As there is no wmb() here, it is quite possible that another CPU
> will see saved_state as TASK_RUNNING, and current state as
> TASK_RUNNING, and miss the update completely.
> 
> I would not want to add a wmb() unless there is a real bug with the
> check state, as the above is in a very fast path and the check state is
> in a slower path.
>
Maybe I'm missing/misunderstanding something here, but
pi_unlock -> arch_spin_unlock is a full mb(),
so once any task has updated the state, the loop should catch
this update? If the loop exits before the update takes effect (pi_unlock),
would that be incorrect?
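
To make the ordering question concrete, here is a minimal user-space sketch of the writer/reader pairing under discussion. It is not kernel code: struct task_model, restore_saved_state(), check_task_state_model(), model_demo() and the value used for TASK_TRACED are hypothetical stand-ins, and the C11 release/acquire fences are assumed here to play the roles of wmb() and smp_rmb(). The explicit writer-side fence is an assumption of the sketch; the rtmutex.c snippet quoted above has no such barrier, and the sketch does not settle whether pi_unlock provides that ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's task state values. */
enum { TASK_RUNNING = 0, TASK_TRACED = 8 };

/* Hypothetical user-space model of the two task_struct fields. */
struct task_model {
	_Atomic long state;
	_Atomic long saved_state;
};

/*
 * Writer, modelled on the rtmutex.c snippet: restore state, then reset
 * saved_state.  The release fence (assumed stand-in for wmb()) keeps
 * the store to state ordered before the store to saved_state.
 */
static void restore_saved_state(struct task_model *t)
{
	long saved = atomic_load_explicit(&t->saved_state, memory_order_relaxed);

	atomic_store_explicit(&t->state, saved, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&t->saved_state, TASK_RUNNING, memory_order_relaxed);
}

/*
 * Reader, modelled on the retry loop in the patch: snapshot both fields,
 * then re-read state after an acquire fence (assumed stand-in for
 * smp_rmb()).  If the snapshot saw the new saved_state, the re-read must
 * see the new state, so a stale snapshot is retried.
 */
static bool check_task_state_model(struct task_model *t, long match_state)
{
	long state, saved_state;

	do {
		state = atomic_load_explicit(&t->state, memory_order_relaxed);
		saved_state = atomic_load_explicit(&t->saved_state, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (state != atomic_load_explicit(&t->state, memory_order_relaxed));

	return state == match_state || saved_state == match_state;
}

/* Single-threaded walk through the transition; checks logic only. */
static void model_demo(void)
{
	struct task_model t;

	atomic_init(&t.state, TASK_RUNNING);
	atomic_init(&t.saved_state, TASK_TRACED);
	assert(check_task_state_model(&t, TASK_TRACED)); /* via saved_state */
	restore_saved_state(&t);
	assert(check_task_state_model(&t, TASK_TRACED)); /* via state */
}
```

The demo only exercises the before/after states on one thread; it cannot show a race. The point is the pairing: the reader's fence only helps if the writer orders its two stores as well, which is what the missing wmb() comment is about.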

thx!
hofrat
 