linux-kernel - Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140810081254.GS9918@twins.programming.kicks-ass.net>
Date:	Sun, 10 Aug 2014 10:12:54 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org, mingo@...nel.org,
	laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
	dhowells@...hat.com, edumazet@...gle.com, dvhart@...ux.intel.com,
	fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com
Subject: Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

On Sat, Aug 09, 2014 at 06:26:12PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 09, 2014 at 08:19:20PM +0200, Peter Zijlstra wrote:
> > On Sat, Aug 09, 2014 at 09:01:37AM -0700, Paul E. McKenney wrote:
> > > > That's so wrong its not funny. If you need some abortion to deal with
> > > > NOHZ_FULL then put it under CONFIG_NOHZ_FULL, don't burden the entire
> > > > world with it.
> > > 
> > > Peter, the polling approach actually -reduces- the common-case
> > > per-context-switch burden, as in when RCU-tasks isn't doing anything.
> > > See your own code above.
> > 
> > I'm not seeing it, CONFIG_PREEMPT already touches a per task cacheline
> > for each context switch. And for !PREEMPT this thing should pretty much
> > reduce to rcu_sched.
> 
> Except when you do the wakeup operation, in which case you have something
> that is either complex, slow and non-scalable, or both.  I am surprised
> that you want anything like that on that path.

Its a nr_cpus problem at that point, which is massively better than a
nr_tasks problem, and I'm sure we've solved this counter thing many
times, we can do it again.

But for clarity of purpose the single atomic+waitqueue is far easier.

> > Would not the thing I proposed be a valid rcu_preempt implementation?
> > one where its rcu read side primitives run from (voluntary) schedule()
> > to (voluntary) schedule() call and therefore entirely cover smaller
> > sections.
> 
> In theory, sure.  In practice, blocking on tasks that are preempted
> outside of an RCU read-side critical section would not be a good thing
> for normal RCU, which has frequent update operations.  Among other things.

Sure, just looking for parallels and verifying understanding here. By
the very nature of not having read side primitives to limit coverage its
a pessimistic thing.

> > > > As for idle tasks, I'm not sure about those, I think that we should say
> > > > NO to anything that would require waking idle CPUs, push the pain to
> > > > ftrace/kprobes, we should _not_ be waking idle cpus.
> > > 
> > > So the current patch set wakes an idle task once per RCU-tasks grace
> > > period, but only when that idle task did not otherwise get awakened.
> > > This is not a real problem.
> > 
> > And on the other hand we're trying to reduce random wakeups, so this
> > sure is a problem. If we don't start, we don't have to fix later.
> 
> I doubt that a wakeup at the end of certain ftrace operations is going
> to be a real problem.

But its not ftrace, its rcu_task, and if we put it out there, we'll grow
more and more customers, and soon we'll always have users and never let
CPUs sleep.

That's how these things go, so we should really try and push back on
these things, and that's the thing that worries me most in this
discussion, you seem very happy to provide what's asked for without due
consideration of the negatives.

> > > So I don't believe that the current wakeup rate is a problem, and it
> > > can be reduced if it proves to be a problem.
> > 
> > How about we simply assume 'idle' code, as defined by the rcu idle hooks
> > are safe? Why do we want to bend over backwards to cover this?
> 
> Steven covered this earlier in this thread.  One addition might be "For
> the same reason that event tracing provides the _rcuidle suffix."

I really don't think its worth the cost.

Content of type "application/pgp-signature" skipped