linux-kernel - Re: [PATCH RFC tip/core/rcu 1/9] rcu: Add call_rcu

Message-ID: <20140729163312.GR11241@linux.vnet.ibm.com>
Date:	Tue, 29 Jul 2014 09:33:12 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, mingo@...nel.org,
	laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
	dhowells@...hat.com, edumazet@...gle.com, dvhart@...ux.intel.com,
	fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com
Subject: Re: [PATCH RFC tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

On Tue, Jul 29, 2014 at 06:07:54PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 08:57:47AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 29, 2014 at 09:50:55AM +0200, Peter Zijlstra wrote:
> > > On Mon, Jul 28, 2014 at 03:56:12PM -0700, Paul E. McKenney wrote:
> > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > index bc1638b33449..a0d2f3a03566 100644
> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -2762,6 +2762,7 @@ need_resched:
> > > >  		} else {
> > > >  			deactivate_task(rq, prev, DEQUEUE_SLEEP);
> > > >  			prev->on_rq = 0;
> > > > +			rcu_note_voluntary_context_switch(prev);
> > > >  
> > > >  			/*
> > > >  			 * If a worker went to sleep, notify and ask workqueue
> > > > @@ -2828,6 +2829,7 @@ asmlinkage __visible void __sched schedule(void)
> > > >  	struct task_struct *tsk = current;
> > > >  
> > > >  	sched_submit_work(tsk);
> > > > +	rcu_note_voluntary_context_switch(tsk);
> > > >  	__schedule();
> > > >  }
> > > 
> > > Yeah, not entirely happy with that, you add two calls into one of the
> > > hottest paths of the kernel.
> > 
> > I did look into leveraging counters, but cannot remember why I decided
> > that this was a bad idea.  I guess it is time to recheck...
> > 
> > The ->nvcsw field in the task_struct structure looks promising:
> > 
> > o	Looks like it does in fact get incremented in __schedule() via
> > 	the switch_count pointer.
> > 
> > o	Looks like it is unconditionally compiled in.
> > 
> > o	There are no memory barriers, but a synchronize_sched()
> > 	should take care of that, given that this counter is
> > 	incremented with interrupts disabled.
> 
> Well, there's obviously the actual context switch, which should imply an
> actual MB such that tasks are self ordered even when execution continues
> on another cpu etc..

True enough, except that it appears that the context switch happens
after the ->nvcsw increment, which means that it doesn't help RCU-tasks
guarantee that if it has seen the increment, then all prior processing
has completed.  There might be enough stuff prior to the increment, but I
don't see anything that I feel comfortable relying on.  Am I missing
some ordering?
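
To make the guarantee in question concrete, here is a minimal user-space
sketch in C11, not kernel code.  It assumes a hypothetical struct
ordered_task whose counter is bumped with release semantics and read with
acquire semantics, which is exactly the pairing that a bare ->nvcsw
increment does not provide by itself.  All names below are made up for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct ordered_task {
	int prior_work;      /* plain data written before the "switch" */
	atomic_ulong nvcsw;  /* stand-in for the voluntary-switch counter */
};

/* Task side: finish prior processing, then publish the increment. */
static void do_work_then_switch(struct ordered_task *t, int value)
{
	assert(t != NULL);
	t->prior_work = value;
	/* Release: the store above is ordered before the increment. */
	atomic_fetch_add_explicit(&t->nvcsw, 1, memory_order_release);
}

/*
 * Observer side: if the counter differs from the snapshot, the acquire
 * load guarantees that the task's earlier store is visible as well.
 */
static bool saw_switch(struct ordered_task *t, unsigned long snap,
		       int *work_out)
{
	if (atomic_load_explicit(&t->nvcsw, memory_order_acquire) == snap)
		return false;
	*work_out = t->prior_work;
	return true;
}
```

Without the release/acquire pairing (or some stronger ordering that stands
in for it), an observer could see the new counter value but stale
prior_work, which is the case I am worried about above.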

> > So I should be able to snapshot the task_struct structure's ->nvcsw
> > field and avoid the added code in the fastpaths.
> > 
> > Seem plausible, or am I confused about the role of ->nvcsw?
> 
> Nope, that's the 'I scheduled to go to sleep' counter.

I am assuming that the "Nope" goes with "am I confused" rather than
"Seem plausible" -- if not, please let me know.  ;-)
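
For concreteness, the snapshot idea might look roughly like the following
user-space sketch.  It is only an illustration under stated assumptions:
struct fake_task, nvcsw_snap, and holdout are hypothetical names standing
in for whatever the real patch would use, and the memory-ordering question
discussed earlier is deliberately ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the counter matters here. */
struct fake_task {
	unsigned long nvcsw;       /* voluntary context-switch count */
	unsigned long nvcsw_snap;  /* value recorded at grace-period start */
	bool holdout;              /* still waiting for a voluntary switch? */
};

/* Record each task's counter at the start of the grace period. */
static void snapshot_tasks(struct fake_task *tasks, size_t n)
{
	assert(tasks != NULL);
	for (size_t i = 0; i < n; i++) {
		tasks[i].nvcsw_snap = tasks[i].nvcsw;
		tasks[i].holdout = true;
	}
}

/* One polling pass; returns the number of tasks still holding out. */
static size_t poll_holdouts(struct fake_task *tasks, size_t n)
{
	size_t remaining = 0;

	for (size_t i = 0; i < n; i++) {
		/* A changed counter means a voluntary switch has happened. */
		if (tasks[i].holdout && tasks[i].nvcsw != tasks[i].nvcsw_snap)
			tasks[i].holdout = false;
		if (tasks[i].holdout)
			remaining++;
	}
	return remaining;
}
```

The grace period would end once poll_holdouts() returns zero.  All of the
cost lands on the updater, and nothing is added to the scheduler fastpaths.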

> There is of course the 'polling' issue I raised in a further email...

Yep, and other flavors of RCU go to lengths to avoid scanning the
task_struct lists.  Steven said that updates will be rare and that it
is OK for them to have high latency and overhead.  Thus far, I am taking
him at his word.  ;-)

I considered interrupting the task_struct polling loop periodically,
and would add that if needed.  That said, this requires nailing down the
task_struct at which the vacation is taken.  Here "nailing down" does not
simply mean "prevent from being freed", but rather "prevent from being
removed from the lists traversed by do_each_thread/while_each_thread."

Of course, if there is some easy way of doing this, please let me know!

> > > And I'm still not entirely sure why, your 0/x babbled something about
> > > trampolines, but I'm not sure I understand how those lead to this.
> > 
> > Steven Rostedt sent an email recently giving more detail.  And of course
> > now I am having trouble finding it.  Maybe he will take pity on us and
> > send along a pointer to it.  ;-)
> 
> Yah, would make good Changelog material that ;-)

;-) ;-) ;-)

							Thanx, Paul
