linux-kernel - Re: [PATCH 11/32] nohz/cpuset: Don't turn off the tick if rcu needs it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120328133835.GF17189@somewhere.redhat.com>
Date:	Wed, 28 Mar 2012 15:38:37 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Gilad Ben-Yossef <gilad@...yossef.com>
Cc:	Christoph Lameter <cl@...ux.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linaro-sched-sig@...ts.linaro.org,
	Alessio Igor Bogani <abogani@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Avi Kivity <avi@...hat.com>,
	Chris Metcalf <cmetcalf@...era.com>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Geoff Levand <geoff@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Max Krasnyansky <maxk@...lcomm.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Stephen Hemminger <shemminger@...tta.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Sven-Thorsten Dietrich <thebigcorporation@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Zen Lin <zen@...nhuawei.org>
Subject: Re: [PATCH 11/32] nohz/cpuset: Don't turn off the tick if rcu needs
 it

On Wed, Mar 28, 2012 at 02:57:44PM +0200, Gilad Ben-Yossef wrote:
> On Wed, Mar 28, 2012 at 2:39 PM, Frederic Weisbecker <fweisbec@...il.com> wrote:
> > On Tue, Mar 27, 2012 at 05:21:34PM +0200, Gilad Ben-Yossef wrote:
> >> On Thu, Mar 22, 2012 at 6:18 PM, Christoph Lameter <cl@...ux.com> wrote:
> >> > On Thu, 22 Mar 2012, Gilad Ben-Yossef wrote:
> >> >
> >> >> > Is there any way for userspace to know that the tick is not off yet due to
> >> >> > this? It would make sense for us to have busy loop in user space that
> >> >> > waits until the OS has completed all processing if that avoids future
> >> >> > latencies for the application.
> >> >> >
> >> >>
> >> >> I previously suggested having the user register to receive a signal
> >> >> when the tick
> >> >> is turned off. Since the tick is always turned off the user task is
> >> >> the current task
> >> >> by design, *I think* you can simply mark the signal pending when you
> >> >> turn the tick off.
> >> >
> >> > Ok that sounds good. You would define a new signal for this?
> >> >
> >>
> >> My gut instinct is to let the process register with a specific signal
> >> (properly the RT range)
> >> it wants to receive when the tick goes off and/or on.
> >
> > Note the signal itself could trigger an event that could restart the tick.
> > Calling call_rcu() is sufficient for that. We can probably optimize that
> > one day by assigning another CPU to handle the callbacks of a tickless
> > CPU but for now...
> >
> 
> 
> 
> >>
> >> > So we would startup the application. App will do all prep work (memory
> >> > allocation, device setup etc etc) and then wait for the signal to be
> >> > received. After that it would enter the low latency processing phase.
> >> >
> >> > Could we also get a signal if something disrupts the peace and switches
> >> > the timer interrupt on again?
> >> >
> >>
> >> I think you'll have to since once you have the tick turned off there
> >> is no guarantee that
> >> it wont get turned on by a timer scheduling an task or an IPI.
> >
> > The problem with this scheme is that if the task is running with the
> > guarantee that nothing is going to disturb it (it assumes so when it
> > is notified that the timer is stopped), can it seriously recover from
> > the fact the timer has been restarted once it gets notified about it?
> 
> Recovery in this context involves a programmer/system architect looking
> into what made the tick start and making sure that wont happen the next
> time around.
> 
> I know it's not quite what you had in mind, but it works :-)

So this is about fixing bugs. Tracing may fit better for that.

> 
> >
> > I have a hard time to imagine that. It's like an RT task running a
> > critical part that suddenly receives a notification from the kernel that
> > says "what's up dude? hey by the way you're not real time anymore" :)
> > How are we recovering from that?
> 
> The point is that it is the difference between a QA report that says:
> 
> "Performance dropped below acceptable level for 10 ms some when
> during the test run"
> 
> and
> 
> "We got an indication that the kernel resumed the tick on us, so the test
> was stopped and here is the stack trace for all the tasks running,
> plus the logs".

That's about post run analysis, that's sounds to be a job for tracing.

> 
> 
> > May be instead of focusing on these notifications, we should try hard to
> > shut down the tick before we reach userspace: delegate RCU work
> > to another CPU, avoid needless IPIs, avoid needless timer list timers, etc...
> > Fix those things one by one such that we can configure things to the point we
> > get closer to a guarantee of CPU isolation.
> >
> > Does that sound reasonable?
> 
> It does to me :-)
> 
> Gilad
> 
> 
> -- 
> Gilad Ben-Yossef
> Chief Coffee Drinker
> gilad@...yossef.com
> Israel Cell: +972-52-8260388
> US Cell: +1-973-8260388
> http://benyossef.com
> 
> "If you take a class in large-scale robotics, can you end up in a
> situation where the homework eats your dog?"
>  -- Jean-Baptiste Queru
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/