linux-kernel - Re: [PATCH RFC tip/core/rcu] Avoid resched_cpu() when rescheduling the current CPU

Message-Id: <20180730145933.GX24813@linux.vnet.ibm.com>
Date:   Mon, 30 Jul 2018 07:59:33 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC tip/core/rcu] Avoid resched_cpu() when rescheduling
 the current CPU

On Mon, Jul 30, 2018 at 11:25:13AM +0200, Peter Zijlstra wrote:
> On Fri, Jul 27, 2018 at 08:49:31AM -0700, Paul E. McKenney wrote:
> > Hello, Peter,
> > 
> > It occurred to me that it is wasteful to let resched_cpu() acquire
> > ->pi_lock when doing something like resched_cpu(smp_processor_id()),
> 
> rq->lock

Good catch, will fix.  And thank you for looking this over!

> > and that it would be better to instead use set_tsk_need_resched(current)
> > and set_preempt_need_resched().
> > 
> > But is doing so really worthwhile?  For that matter, are there some
> > constraints on the use of those two functions that I am failing to
> > allow for in the patch below?
> 
> 
> >     The resched_cpu() interface is quite handy, but it does acquire the
> >     specified CPU's runqueue lock, which does not come for free.  This
> >     commit therefore substitutes the following when directing resched_cpu()
> >     at the current CPU:
> >     
> >             set_tsk_need_resched(current);
> >             set_preempt_need_resched();
> 
> That is only a valid substitute for resched_cpu(smp_processor_id()).

Understood.

> But also note that this can cause more context switches than
> resched_curr(), because it does not check whether TIF_NEED_RESCHED is
> already set.
> 
> Something that might be more in line with
> resched_curr(smp_processor_id()) would be:
> 
> 	preempt_disable();
> 	if (!test_tsk_need_resched(current)) {
> 		set_tsk_need_resched(current);
> 		set_preempt_need_resched();
> 	}
> 	preempt_enable();
> 
> Where the preempt_enable() could of course instantly trigger the
> reschedule if it was the outermost one.

Ah.  So should I use resched_curr() from rcu_check_callbacks(), which
is invoked from the scheduling-clock interrupt?  Right now I have calls
to set_tsk_need_resched() and set_preempt_need_resched().
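
For illustration only, the difference between setting the flag unconditionally and checking it first can be pictured with a small userspace model.  This is not kernel code: struct model_task and both model_resched_*() functions are made-up names for this sketch, the need_resched field merely stands in for TIF_NEED_RESCHED, and nothing here models the scheduler itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a task; not the kernel's task_struct. */
struct model_task {
	bool need_resched;        /* stands in for TIF_NEED_RESCHED */
	unsigned int flag_writes; /* how many times the flag was written */
};

/* Unconditional variant: writes the flag every time it is called. */
static void model_resched_unconditional(struct model_task *t)
{
	assert(t != NULL);
	t->need_resched = true;
	t->flag_writes++;
}

/*
 * Checked variant, following the test-before-set shape suggested above:
 * write the flag only if it is not already set.  Returns true if this
 * call was the one that set it.
 */
static bool model_resched_checked(struct model_task *t)
{
	assert(t != NULL);
	if (t->need_resched)
		return false;
	t->need_resched = true;
	t->flag_writes++;
	return true;
}
```

Calling model_resched_unconditional() twice on a fresh task records two writes, whereas calling model_resched_checked() twice records one write and the second call returns false.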

> > @@ -2674,10 +2675,12 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused
> 
> > -		resched_cpu(rdp->cpu); /* Provoke future context switch. */
> 
> > +		set_tsk_need_resched(current);
> > +		set_preempt_need_resched();
> 
> That's not obviously correct. rdp->cpu had better be smp_processor_id().

At the beginning of the function, we have:

	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);

And this is in a softirq handler, so we are OK.

> > @@ -672,7 +672,8 @@ static void sync_rcu_exp_handler(void *unused)
> >  			rcu_report_exp_rdp(rdp);
> >  		} else {
> >  			rdp->deferred_qs = true;
> > -			resched_cpu(rdp->cpu);
> > +			set_tsk_need_resched(t);
> > +			set_preempt_need_resched();
> 
> That only works if @t == current.

At the beginning of the function, we have:

	struct task_struct *t = current;

So we should be OK.

> >  		}
> >  		return;
> >  	}
> 
> > -	else
> > -		resched_cpu(rdp->cpu);
> > +	} else {
> > +		set_tsk_need_resched(t);
> > +		set_preempt_need_resched();
> 
> Similar...

Same function, so we should be good here as well.

> >  }
> 
> > @@ -791,8 +791,10 @@ static void rcu_flavor_check_callbacks(int user)
> >  	if (t->rcu_read_lock_nesting > 0 ||
> >  	    (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
> >  		/* No QS, force context switch if deferred. */
> > -		if (rcu_preempt_need_deferred_qs(t))
> > -			resched_cpu(smp_processor_id());
> > +		if (rcu_preempt_need_deferred_qs(t)) {
> > +			set_tsk_need_resched(t);
> > +			set_preempt_need_resched();
> > +		}
> 
> And another dodgy one..

And the beginning of this function also has:

	struct task_struct *t = current;

So good there as well.

Should I be instead using resched_curr() on some or all of these?

kernel/rcu/tiny.c rcu_check_callbacks():

	Interrupts disabled (scheduling clock interrupt), so no
	point in preempt_disable().  It would make sense to check
	test_tsk_need_resched().  This is handling the case where someone
	disabled something over rcu_read_unlock(), but got preempted
	within (or had an overly long) RCU read-side critical section.
	This used to result in deadlock, but now just messes up real-time
	response.

kernel/rcu/tree.c print_cpu_stall():

	Interrupts disabled, so no point in preempt_disable().
	It might make sense to check test_tsk_need_resched(), but
	on the other hand at this point this CPU has gone for
	tens of seconds without a quiescent state.  Wouldn't hurt
	to check, though.

kernel/rcu/tree.c rcu_check_callbacks():

	Interrupts disabled (scheduling clock interrupt), so no
	point in preempt_disable().  It would make sense to check
	test_tsk_need_resched().  This is handling the case where someone
	disabled something over rcu_read_unlock(), but got preempted
	within (or had an overly long) RCU read-side critical section.
	This used to result in deadlock, but now just messes up real-time
	response.

kernel/rcu/tree.c rcu_process_callbacks():

	Softirqs disabled (softirq handler), so no point
	in preempt_disable().  It might make sense to check
	test_tsk_need_resched().  This is handling the case where someone
	disabled something over rcu_read_unlock(), but got preempted
	within (or had an overly long) RCU read-side critical section.
	This used to result in deadlock, but now just messes up real-time
	response.

kernel/rcu/tree_exp.h sync_rcu_exp_handler():
kernel/rcu/tree_exp.h sync_sched_exp_handler():

	Interrupts disabled (IPI handler), so no point in
	preempt_disable().  It might make sense to check
	test_tsk_need_resched().  This is the expedited
	grace-period case.  (The first is PREEMPT, the second
	!PREEMPT.)

kernel/rcu/tree_plugin.h rcu_flavor_check_callbacks():

	Interrupts disabled (scheduling clock interrupt), so no
	point in preempt_disable().  It would make sense to check
	test_tsk_need_resched().  This is handling the case where someone
	disabled something over rcu_read_unlock(), but got preempted
	within (or had an overly long) RCU read-side critical section.
	This used to result in deadlock, but now just messes up real-time
	response.

So it looks safe for me to invoke resched_curr() in all cases.  I don't
believe that the extra nested preempt_disable() will be a performance
problem.  Anything that I am missing here?
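
As a rough picture of what a nested preempt_disable() amounts to, here is a toy userspace model of the nesting count.  It is a sketch under loud assumptions: struct model_cpu and the model_preempt_*() functions are invented for this example, the model ignores interrupt state entirely, and its "reschedule" is just a counter that also clears the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the model; not a kernel structure. */
struct model_cpu {
	int preempt_count;           /* nesting depth of disable calls */
	bool need_resched;           /* stands in for the need-resched flag */
	unsigned int schedule_calls; /* how often the model "rescheduled" */
};

/* Each disable only bumps the nesting depth. */
static void model_preempt_disable(struct model_cpu *c)
{
	c->preempt_count++;
}

/*
 * Each enable drops the depth.  Only the outermost enable (depth back
 * to zero) acts on a pending reschedule request; inner ones do not.
 */
static void model_preempt_enable(struct model_cpu *c)
{
	assert(c->preempt_count > 0);
	if (--c->preempt_count == 0 && c->need_resched) {
		c->need_resched = false; /* the model's "schedule" consumes the request */
		c->schedule_calls++;
	}
}
```

In this model an extra nested disable/enable pair costs one increment and one decrement, and a request raised inside the nested section is acted on only when the outermost enable runs.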

							Thanx, Paul