linux-kernel - Re: [patchlet] Re: Scheduler bug related to rq->skip_clock

Message-Id: <1291748109.7475.20.camel@marge.simson.net>
Date:	Tue, 07 Dec 2010 19:55:09 +0100
From:	Mike Galbraith <efault@....de>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Yong Zhang <yong.zhang0@...il.com>,
	"Bjoern B. Brandenburg" <bbb.lst@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Andrea Bastoni <bastoni@...g.uniroma2.it>,
	"James H. Anderson" <anderson@...unc.edu>,
	linux-kernel@...r.kernel.org
Subject: Re: [patchlet] Re: Scheduler bug related to rq->skip_clock_update?

On Tue, 2010-12-07 at 17:41 +0100, Peter Zijlstra wrote:
> On Mon, 2010-12-06 at 09:32 +0100, Mike Galbraith wrote:
> 
> >  kernel/fork.c  |    1 +
> >  kernel/sched.c |    6 +++---
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > Index: linux-2.6.37.git/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.37.git.orig/kernel/sched.c
> > +++ linux-2.6.37.git/kernel/sched.c
> > @@ -660,6 +660,7 @@ inline void update_rq_clock(struct rq *r
> >  
> >  		sched_irq_time_avg_update(rq, irq_time);
> >  	}
> > +	rq->skip_clock_update = 0;
> >  }
> >  
> >  /*
> 
> Shouldn't we do that at the end of schedule()? The purpose of
> ->skip_clock_update is to avoid multiple update_rq_clock() calls, in
> order to:
>   - avoid overhead
>   - ensure scheduling is accounted at a single point
> 
> [ for that latter purpose it might also make sense to put that point
> somewhere around context_switch() but due to the fact that we need a
> clock update early that's a bit impractical. ]
> 
> Hmm?

Yeah, could do that instead.  There's no gain from any update_rq_clock()
call that may happen in the interval between.  Think I'll measure though,
this bug was a surprise :)
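
A rough userspace sketch of the idea, for illustration only: the toy_*
names and the explicit "now" argument are made up, and the struct keeps
only the fields under discussion. It assumes update_rq_clock() is a
no-op while ->skip_clock_update is set, and shows the flag being cleared
at a single point at the end of schedule() as suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct rq; not the kernel's definition. */
struct toy_rq {
	uint64_t clock;			/* last sampled time */
	int skip_clock_update;		/* set when a reschedule is pending */
	unsigned int clock_updates;	/* number of real clock samples */
};

/* Sample the clock unless a pending reschedule makes it redundant. */
static void toy_update_rq_clock(struct toy_rq *rq, uint64_t now)
{
	if (rq->skip_clock_update)
		return;
	rq->clock = now;
	rq->clock_updates++;
}

/*
 * Queue event: the clock has just been updated, so if curr is still
 * queued and about to be rescheduled, the update in schedule() would
 * be a useless back to back one.
 */
static void toy_check_preempt_curr(struct toy_rq *rq, uint64_t now,
				   bool curr_on_rq, bool need_resched)
{
	toy_update_rq_clock(rq, now);
	if (curr_on_rq && need_resched)
		rq->skip_clock_update = 1;
}

static void toy_schedule(struct toy_rq *rq, uint64_t now)
{
	toy_update_rq_clock(rq, now);	/* early update, skipped if flagged */
	/* ... put_prev_task(), pick_next_task(), context switch ... */
	rq->skip_clock_update = 0;	/* single clearing point */
}
```

With a wakeup at t=100 followed by schedule() at t=101, the toy clock is
sampled once and the flag is clear afterwards, so the next schedule()
samples normally again.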

> > @@ -2138,7 +2139,7 @@ static void check_preempt_curr(struct rq
> >  	 * A queue event has occurred, and we're going to schedule.  In
> >  	 * this case, we can save a useless back to back clock update.
> >  	 */
> > -	if (test_tsk_need_resched(rq->curr))
> > +	if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
> >  		rq->skip_clock_update = 1;
> >  }
> 
> OK, I initially tried to replace the test with a return value of
> ->check_preempt_curr() and such, but that turns into a lot of code and
> won't necessarily be any better.

(Yeah, I considered doing the same)

> > @@ -3854,7 +3855,6 @@ static void put_prev_task(struct rq *rq,
> >  {
> >  	if (prev->se.on_rq)
> >  		update_rq_clock(rq);
> > -	rq->skip_clock_update = 0;
> >  	prev->sched_class->put_prev_task(rq, prev);
> >  }
> 
> See the first note.
> 
> > @@ -3912,7 +3912,6 @@ need_resched_nonpreemptible:
> >  		hrtick_clear(rq);
> >  
> >  	raw_spin_lock_irq(&rq->lock);
> > -	clear_tsk_need_resched(prev);
> >  
> >  	switch_count = &prev->nivcsw;
> >  	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
> > @@ -3942,6 +3941,7 @@ need_resched_nonpreemptible:
> >  	if (unlikely(!rq->nr_running))
> >  		idle_balance(cpu, rq);
> >  
> > +	clear_tsk_need_resched(prev);
> >  	put_prev_task(rq, prev);
> >  	next = pick_next_task(rq);
> 
> Good find, this needs to be done after the idle balancing because that
> can release the rq->lock and allow for TIF_NEED_RESCHED to be set again.
> 
> Maybe complement this with a WARN_ON_ONCE(test_tsk_need_resched(next))
> somewhere after pick_next_task() so as to ensure that !current has
> !TIF_NEED_RESCHED.
> 
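
The ordering issue can be modelled in a few lines of userspace C. This
is only a sketch: the toy_* names are hypothetical, and the callback
stands in for idle_balance() dropping rq->lock so that another CPU can
set TIF_NEED_RESCHED on prev. The return value is the condition a check
like the suggested WARN_ON_ONCE() would eventually trip over once prev
is picked again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy task: only the need_resched bit matters here. */
struct toy_task {
	bool need_resched;
};

/* Hypothetical stand-in for idle_balance() releasing rq->lock. */
typedef void (*toy_balance_fn)(struct toy_task *prev);

/* What a remote CPU may do while the lock is dropped. */
static void toy_remote_resched(struct toy_task *prev)
{
	prev->need_resched = true;
}

/* Old placement: clear right after taking rq->lock. */
static bool toy_schedule_clear_early(struct toy_task *prev,
				     toy_balance_fn balance)
{
	prev->need_resched = false;
	if (balance)
		balance(prev);		/* flag may be set again here */
	return prev->need_resched;	/* true: prev leaves with a stale flag */
}

/* Patched placement: clear after idle balancing. */
static bool toy_schedule_clear_late(struct toy_task *prev,
				    toy_balance_fn balance)
{
	if (balance)
		balance(prev);
	prev->need_resched = false;
	return prev->need_resched;	/* always false */
}
```

Passing toy_remote_resched as the callback, the early variant returns
true and the late variant returns false.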
> > Index: linux-2.6.37.git/kernel/fork.c
> > ===================================================================
> > --- linux-2.6.37.git.orig/kernel/fork.c
> > +++ linux-2.6.37.git/kernel/fork.c
> > @@ -275,6 +275,7 @@ static struct task_struct *dup_task_stru
> >  
> >  	setup_thread_stack(tsk, orig);
> >  	clear_user_return_notifier(tsk);
> > +	clear_tsk_need_resched(tsk);
> >  	stackend = end_of_stack(tsk);
> >  	*stackend = STACK_END_MAGIC;	/* for overflow detection */
> >  
> 
> OK.. have we checked whether there are more TIF flags that could do
> with a reset?

mmm, no.
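
For reference, a toy model of why the fork.c hunk matters. It assumes
the child's thread_info starts out as a plain copy of the parent's, so a
pending need_resched bit would be inherited unless it is cleared. The
toy_* names and the bit positions are illustrative, not the kernel's.

```c
#include <stdbool.h>

#define TOY_TIF_NEED_RESCHED	(1UL << 1)	/* illustrative bit only */
#define TOY_TIF_OTHER		(1UL << 4)	/* some unrelated flag */

/* Toy stand-in for struct thread_info. */
struct toy_thread_info {
	unsigned long flags;
};

/* Copy the parent's flags, then drop the pending-resched bit. */
static struct toy_thread_info
toy_dup_thread_info(const struct toy_thread_info *orig)
{
	struct toy_thread_info child = *orig;	/* wholesale copy */

	child.flags &= ~TOY_TIF_NEED_RESCHED;	/* the added reset */
	return child;
}

static bool toy_need_resched(const struct toy_thread_info *ti)
{
	return (ti->flags & TOY_TIF_NEED_RESCHED) != 0;
}
```

Only the one bit is reset in this sketch; every other flag is still
copied as-is, which is what the question above is getting at.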

	-Mike
