[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120328133835.GF17189@somewhere.redhat.com>
Date: Wed, 28 Mar 2012 15:38:37 +0200
From: Frederic Weisbecker <fweisbec@...il.com>
To: Gilad Ben-Yossef <gilad@...yossef.com>
Cc: Christoph Lameter <cl@...ux.com>,
LKML <linux-kernel@...r.kernel.org>,
linaro-sched-sig@...ts.linaro.org,
Alessio Igor Bogani <abogani@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Avi Kivity <avi@...hat.com>,
Chris Metcalf <cmetcalf@...era.com>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Geoff Levand <geoff@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Max Krasnyansky <maxk@...lcomm.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Stephen Hemminger <shemminger@...tta.com>,
Steven Rostedt <rostedt@...dmis.org>,
Sven-Thorsten Dietrich <thebigcorporation@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Zen Lin <zen@...nhuawei.org>
Subject: Re: [PATCH 11/32] nohz/cpuset: Don't turn off the tick if rcu needs
it
On Wed, Mar 28, 2012 at 02:57:44PM +0200, Gilad Ben-Yossef wrote:
> On Wed, Mar 28, 2012 at 2:39 PM, Frederic Weisbecker <fweisbec@...il.com> wrote:
> > On Tue, Mar 27, 2012 at 05:21:34PM +0200, Gilad Ben-Yossef wrote:
> >> On Thu, Mar 22, 2012 at 6:18 PM, Christoph Lameter <cl@...ux.com> wrote:
> >> > On Thu, 22 Mar 2012, Gilad Ben-Yossef wrote:
> >> >
> >> >> > Is there any way for userspace to know that the tick is not off yet due to
> >> >> > this? It would make sense for us to have busy loop in user space that
> >> >> > waits until the OS has completed all processing if that avoids future
> >> >> > latencies for the application.
> >> >> >
> >> >>
> >> >> I previously suggested having the user register to receive a signal
> >> >> when the tick
> >> >> is turned off. Since the tick is always turned off the user task is
> >> >> the current task
> >> >> by design, *I think* you can simply mark the signal pending when you
> >> >> turn the tick off.
> >> >
> >> > Ok that sounds good. You would define a new signal for this?
> >> >
> >>
> >> My gut instinct is to let the process register with a specific signal
> >> (properly the RT range)
> >> it wants to receive when the tick goes off and/or on.
> >
> > Note the signal itself could trigger an event that could restart the tick.
> > Calling call_rcu() is sufficient for that. We can probably optimize that
> > one day by assigning another CPU to handle the callbacks of a tickless
> > CPU but for now...
> >
>
>
>
> >>
> >> > So we would startup the application. App will do all prep work (memory
> >> > allocation, device setup etc etc) and then wait for the signal to be
> >> > received. After that it would enter the low latency processing phase.
> >> >
> >> > Could we also get a signal if something disrupts the peace and switches
> >> > the timer interrupt on again?
> >> >
> >>
> >> I think you'll have to since once you have the tick turned off there
> >> is no guarantee that
> >> it wont get turned on by a timer scheduling an task or an IPI.
> >
> > The problem with this scheme is that if the task is running with the
> > guarantee that nothing is going to disturb it (it assumes so when it
> > is notified that the timer is stopped), can it seriously recover from
> > the fact the timer has been restarted once it gets notified about it?
>
> Recovery in this context involves a programmer/system architect looking
> into what made the tick start and making sure that wont happen the next
> time around.
>
> I know it's not quite what you had in mind, but it works :-)
So this is about fixing bugs. Tracing may fit better for that.
>
> >
> > I have a hard time to imagine that. It's like an RT task running a
> > critical part that suddenly receives a notification from the kernel that
> > says "what's up dude? hey by the way you're not real time anymore" :)
> > How are we recovering from that?
>
> The point is that it is the difference between a QA report that says:
>
> "Performance dropped below acceptable level for 10 ms some when
> during the test run"
>
> and
>
> "We got an indication that the kernel resumed the tick on us, so the test
> was stopped and here is the stack trace for all the tasks running,
> plus the logs".
That's about post run analysis, that's sounds to be a job for tracing.
>
>
> > May be instead of focusing on these notifications, we should try hard to
> > shut down the tick before we reach userspace: delegate RCU work
> > to another CPU, avoid needless IPIs, avoid needless timer list timers, etc...
> > Fix those things one by one such that we can configure things to the point we
> > get closer to a guarantee of CPU isolation.
> >
> > Does that sound reasonable?
>
> It does to me :-)
>
> Gilad
>
>
> --
> Gilad Ben-Yossef
> Chief Coffee Drinker
> gilad@...yossef.com
> Israel Cell: +972-52-8260388
> US Cell: +1-973-8260388
> http://benyossef.com
>
> "If you take a class in large-scale robotics, can you end up in a
> situation where the homework eats your dog?"
> -- Jean-Baptiste Queru
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists