Message-ID: <20140811032312.GR5821@linux.vnet.ibm.com>
Date:	Sun, 10 Aug 2014 20:23:12 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, mingo@...nel.org,
	laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
	dhowells@...hat.com, edumazet@...gle.com, dvhart@...ux.intel.com,
	fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com
Subject: Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

On Sun, Aug 10, 2014 at 10:12:54AM +0200, Peter Zijlstra wrote:
> On Sat, Aug 09, 2014 at 06:26:12PM -0700, Paul E. McKenney wrote:
> > On Sat, Aug 09, 2014 at 08:19:20PM +0200, Peter Zijlstra wrote:
> > > On Sat, Aug 09, 2014 at 09:01:37AM -0700, Paul E. McKenney wrote:
> > > > > That's so wrong it's not funny. If you need some abortion to deal with
> > > > > NOHZ_FULL then put it under CONFIG_NOHZ_FULL, don't burden the entire
> > > > > world with it.
> > > > 
> > > > Peter, the polling approach actually -reduces- the common-case
> > > > per-context-switch burden, as in when RCU-tasks isn't doing anything.
> > > > See your own code above.
> > > 
> > > I'm not seeing it; CONFIG_PREEMPT already touches a per-task cacheline
> > > for each context switch. And for !PREEMPT this thing should pretty much
> > > reduce to rcu_sched.
> > 
> > Except when you do the wakeup operation, in which case you have something
> > that is either complex, or slow and non-scalable, or both.  I am surprised
> > that you want anything like that on that path.
> 
> It's a nr_cpus problem at that point, which is massively better than a
> nr_tasks problem, and I'm sure we've solved this counter thing many
> times; we can do it again.
> 
> But for clarity of purpose the single atomic+waitqueue is far easier.

Either way, it is very likely an NR_CPUS problem after the first scan
of the task list.  Sure, you could have a million preempted tasks on an
eight-CPU system, but if that is the case, an occasional poll loop is
the very least of your worries.
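
For concreteness, here is a rough sketch of the scan-then-poll scheme
(task_needs_qs(), add_holdout(), and check_holdouts() are made-up names
for illustration, not from the actual patch):

	static void rcu_tasks_wait_gp_sketch(void)
	{
		struct task_struct *g, *t;
		LIST_HEAD(holdouts);

		/* Pass 1: one full scan of the task list. */
		rcu_read_lock();
		for_each_process_thread(g, t)
			if (task_needs_qs(t))
				add_holdout(t, &holdouts);
		rcu_read_unlock();

		/*
		 * Pass 2+: poll only the holdouts, so steady-state work
		 * is bounded by the still-preempted tasks, not nr_tasks.
		 */
		while (!list_empty(&holdouts)) {
			schedule_timeout_interruptible(HZ / 10);
			check_holdouts(&holdouts); /* drop quiesced tasks */
		}
	}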

And if we were trying to produce a textbook example, I might agree with
your "clarity of purpose" point.  As it is, sorry, but no.

> > > Would not the thing I proposed be a valid rcu_preempt implementation?
> > > One where its RCU read-side primitives run from one (voluntary)
> > > schedule() call to the next, and therefore entirely cover any smaller
> > > sections.
> > 
> > In theory, sure.  In practice, blocking on tasks that are preempted
> > outside of an RCU read-side critical section would not be a good thing
> > for normal RCU, which has frequent update operations.  Among other things.
> 
> Sure, just looking for parallels and verifying understanding here. By
> the very nature of not having read-side primitives to limit coverage,
> it's a pessimistic thing.

Fair enough!
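
For the record, the implicit read-side section under discussion has no
begin/end markers at all; anything between one voluntary context switch
and the next is covered (conceptual picture only, not real code):

	schedule();	/* voluntary: quiescent state */
	/*
	 * Everything here, including a task preempted mid-trampoline,
	 * is protected: callbacks queued via call_rcu_tasks() cannot
	 * be invoked until this task voluntarily switches again.
	 */
	schedule();	/* voluntary: quiescent state */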

> > > > > As for idle tasks, I'm not sure about those, I think that we should say
> > > > > NO to anything that would require waking idle CPUs, push the pain to
> > > > > ftrace/kprobes, we should _not_ be waking idle cpus.
> > > > 
> > > > So the current patch set wakes an idle task once per RCU-tasks grace
> > > > period, but only when that idle task did not otherwise get awakened.
> > > > This is not a real problem.
> > > 
> > > And on the other hand we're trying to reduce random wakeups, so this
> > > sure is a problem. If we don't start, we don't have to fix it later.
> > 
> > I doubt that a wakeup at the end of certain ftrace operations is going
> > to be a real problem.
> 
> But it's not ftrace, it's rcu_task, and if we put it out there, we'll grow
> more and more customers, and soon we'll always have users and never let
> CPUs sleep.
> 
> That's how these things go, so we should really try and push back on
> these things, and that's the thing that worries me most in this
> discussion: you seem very happy to provide what's asked for without due
> consideration of the negatives.

Good point.

So I could make all the entry points static, and call into ftrace and
friends passing them the addresses of the entry points.  I would also
need to pass them to rcutorture.  Then no one uses this stuff without
express permission.
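
Roughly like the following, where ftrace_set_rcu_tasks_ops() and
rcutorture_set_rcu_tasks_ops() are invented names for whatever
registration interface we would settle on:

	static void call_rcu_tasks(struct rcu_head *rhp,
				   void (*func)(struct rcu_head *rhp))
	{
		/* ... enqueue callback for the RCU-tasks GP kthread ... */
	}

	void __init rcu_tasks_bootstrap(void)
	{
		/* Hand the static entry point only to sanctioned users. */
		ftrace_set_rcu_tasks_ops(call_rcu_tasks);
		rcutorture_set_rcu_tasks_ops(call_rcu_tasks);
	}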

> > > > So I don't believe that the current wakeup rate is a problem, and it
> > > > can be reduced if it proves to be a problem.
> > > 
> > > How about we simply assume 'idle' code, as defined by the RCU idle
> > > hooks, is safe? Why do we want to bend over backwards to cover this?
> > 
> > Steven covered this earlier in this thread.  One addition might be "For
> > the same reason that event tracing provides the _rcuidle suffix."
> 
> I really don't think it's worth the cost.

That part has been coming in loud and clear for quite some time now.  ;-)

							Thanx, Paul

