Message-ID: <20140227083735.GA5129@austad.us>
Date: Thu, 27 Feb 2014 09:37:35 +0100
From: Henrik Austad <henrik@...tad.us>
To: Frederic Weisbecker <fweisbec@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Henrik Austad <haustad@...co.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
John Stultz <john.stultz@...aro.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH 0/6 v2] Expose do_timer CPU as RW to userspace
On Wed, Feb 26, 2014 at 02:02:42PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 26, 2014 at 09:16:03AM +0100, Henrik Austad wrote:
> > On Tue, Feb 25, 2014 at 03:19:09PM +0100, Frederic Weisbecker wrote:
> > > On Tue, Feb 25, 2014 at 01:33:55PM +0100, Henrik Austad wrote:
> > > > From: Henrik Austad <haustad@...co.com>
> > > >
> > > > Hi!
> > > >
> > > > This is a rework of the previous patch based on the feedback gathered
> > > > from the last round. I've split it up a bit, mostly to make it easier to
> > > > single out the parts that require more attention (#4 comes to mind).
> > > >
> > > > Being able to read (and possibly force a specific CPU to handle all
> > > > do_timer() updates) can be very handy when debugging a system and tuning
> > > > for performance. It is not always easy to route interrupts to a specific
> > > > core (or away from one, for that matter).
> > >
> > > It's a bit vague as a reason for the patchset. Do we really need it?
> >
> > One case is to move the timekeeping away from cores I know have
> > interrupt-issues (in an embedded setup, it is not always easy to move
> > interrupts away).
> >
> > Another is to remove jitter from cores doing either real-time work or heavy
> > workerthreads. The timekeeping update is pretty fast, but I do not see any
> > reason for letting timekeeping interfere with my workers if it does not
> > have to.
>
> Ok. I'll get back to that below.
>
> > > Concerning the read-only part, if I want to know which CPU is handling the
> > > timekeeping, I'd rather use tracing than a sysfs file. I can correlate
> > > timekeeping update traces with other events. Especially as the timekeeping duty
> > > can change hands and move to any CPU all the time. We really don't want to
> > > poll on a sysfs file to get that information. It's not adapted and doesn't
> > > carry any timestamp. It may be useful only if the timekeeping CPU is static.
> >
> > I agree that not having a timestamp will make it useless wrt tracing,
> > but that was never the intention. By having a sysfs/sysctl value you can
> > quickly determine if the timekeeping is bound to a single core or if it is
> > handled everywhere.
> >
> > Tracing will give you the most accurate result, but that's not always what
> > you want, as tracing also adds overhead (both in the kernel and in the
> > head of the user); using the sysfs/sysctl interface for grabbing the CPU
> > does not.
> >
> > You can also use it to verify that the forced-cpu you just set did in fact
> > have the desired effect.
> >
> > Another approach I was contemplating was to let current_cpu return the
> > current mask of CPUs where the timer is running; once you set it via
> > forced_cpu, it will narrow down to that particular core. Would that be more
> > useful for the RO approach outside TICK_PERIODIC?
>
> Ok so this is about checking which CPU the timekeeping is bound to.
> But what do you display in the normal case (ie: when timekeeping is globally affine)?
>
> -1 could be an option but hmm...

I don't really like -1; that indicates that it is disabled and could
confuse people, letting them think that timekeeping is disabled on all
cores.

> Wouldn't it be saner to use a cpumask of the timer affinity instead? This
> is the traditional way we affine something in /proc or /sys

Yes, that's what I'm starting to think as well; that would make a lot more
sense when the timer is bounced around.

Something like a 'current_cpu_mask' which would return a hex-mask
of the cores where the timekeeping update _could_ run.

For periodic, that would be a single core (normally boot), and when forced,
it would return a cpu-mask with only one cpu set. Then the result would be
a lot more informative for NO_HZ_(IDLE|FULL) as well.

Worth a shot? (completely disjoint from the write-discussion below)
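
To make the mask idea concrete, here is a rough userspace sketch, not code
from the patch. It assumes a 64-bit mask where bit N means CPU N may run the
timekeeping update. The helper names (timekeeping_mask_all,
timekeeping_mask_forced, format_timekeeping_mask) are made up for
illustration; a real implementation would presumably use the kernel's
cpumask helpers instead.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of a 'current_cpu_mask' value.
 * Bit N set means CPU N may currently run the timekeeping update.
 */

/* Nothing forced: any of the first nr_cpus CPUs may take the duty. */
uint64_t timekeeping_mask_all(unsigned int nr_cpus)
{
	if (nr_cpus >= 64)
		return UINT64_MAX;
	return ((uint64_t)1 << nr_cpus) - 1;
}

/* Forced (or periodic): exactly one CPU, so exactly one bit is set. */
uint64_t timekeeping_mask_forced(unsigned int cpu)
{
	return (uint64_t)1 << cpu;
}

/* Render the mask as a hex string, the way affinity masks are usually shown. */
int format_timekeeping_mask(uint64_t mask, char *buf, size_t len)
{
	return snprintf(buf, len, "%llx", (unsigned long long)mask);
}
```

With 8 CPUs and nothing forced, the formatted mask is "ff"; after forcing
CPU 2 it narrows to "4".
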
> > > Now looking at the write part. What kind of usecase do you have in mind?
> >
> > Forcing the timer to run on single core only, and a core of my choosing at
> > that.
> >
> > - Get timekeeping away from cores with bad interrupts (no, I cannot move
> > them).
> > - Avoid running timekeeping updates on worker-cores.
>
> Ok but what you're moving away is not the tick but the timekeeping duty, which
> is only a part of the tick. A significant part but still just a part.

That is certainly true, but that part happens to be of global influence, so
if I have a core where a driver disables interrupts a lot (or drops into a
hypervisor, or any other silly thing it really shouldn't be doing), then I
would like to be able to move the timekeeping updates away from that core.

The same goes for cores running rt-tasks (>1), I really do not want -any-
interference at all, and if I can remove the extra jitter from the
timekeeping, I'm pretty happy to do so.

> Does this all make sense outside the NO_HZ_FULL case?

In my view, it makes sense in the periodic case as well, since all
timekeeping updates then happen on the boot-cpu (unless it is hotunplugged,
that is).

> >
> > > It's also important to consider that, in the case of NO_HZ_IDLE, if you force
> > > the timekeeping duty to a specific CPU, it won't be able to enter in dynticks
> > > idle mode as long as any other CPU is running.
> >
> > Yes, it will in effect be a TICK_PERIODIC core where I can configure which
> > core the timekeeping update will happen.
>
> Ok, I missed that part. So when the timekeeping is affine to a specific CPU,
> this CPU is prevented from entering dynticks idle mode?

That's what I aimed at, and I *think* I managed that. I added a
forced_timer_can_stop_tick() and let can_stop_full_tick() and
can_stop_idle_tick() call that. I think that is sufficient; at least I did
not see the timer duty being transferred to another core afterwards.
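
The check described above can be modelled in userspace roughly as follows.
This is a sketch under assumptions, not the code from the patch: the struct,
the FORCED_CPU_NONE value and the model_* functions are hypothetical, and the
actual signature of forced_timer_can_stop_tick() in the series may differ.

```c
#include <stdbool.h>

#define FORCED_CPU_NONE (-1)	/* hypothetical: no CPU has been forced */

/* Hypothetical model state: which CPU, if any, is forced to keep time. */
struct tick_model {
	int forced_cpu;
};

/*
 * Model of the forced-timer check: the tick may be stopped on @cpu
 * unless @cpu is the one forced to handle the timekeeping updates.
 */
bool model_forced_timer_can_stop_tick(const struct tick_model *t, int cpu)
{
	if (t->forced_cpu == FORCED_CPU_NONE)
		return true;
	return t->forced_cpu != cpu;
}

/*
 * Model of an idle-path caller: the forced-timer check acts as an early
 * veto, after which the other (here collapsed into one flag) reasons
 * for keeping the tick are considered.
 */
bool model_can_stop_idle_tick(const struct tick_model *t, int cpu,
			      bool other_reason_to_keep_tick)
{
	if (!model_forced_timer_can_stop_tick(t, cpu))
		return false;
	return !other_reason_to_keep_tick;
}
```

In this model the forced CPU never stops its tick, so it behaves like a
TICK_PERIODIC core, while every other CPU is unaffected by the new check.
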
> > > Because those CPUs can make use of jiffies or gettimeofday() and must
> > > have uptodate values. This involves quite some complication like using the
> > > full system idle detection (CONFIG_NO_HZ_FULL_SYSIDLE) to avoid races
> > > between timekeeper entering dynticks idle mode and other CPUs waking up
> > > from idle. But the worst here is the powersaving issues resulting from the
> > > timekeeper which can't sleep.
> >
> > Personally, when I force the timer to be bound to a specific CPU, I'm
> > pretty happy with the fact that it won't be allowed to turn ticks off. At
> > that stage, powersave is the least of my concerns, throughput and/or jitter
> > is.
> >
> > I know that what I'm doing is in effect turning the kernel into a
> > somewhat more configurable TICK_PERIODIC kernel (in the sense that I can
> > set the timer to run on something other than the boot-cpu).
>
> I see.
>
> >
> > > These issues are being dealt with in NO_HZ_FULL because we want the
> > > timekeeping duty to be affine to the CPUs that are not full dynticks. But
> > > in the case of NO_HZ_IDLE, I fear it's not going to be desirable.
> >
> > Hum? I didn't get that one, what do you mean?
>
> So in NO_HZ_FULL we do something that is very close to what you're doing: the timekeeping
> is affine to the boot CPU and it stays periodic whatever happens.
>
> But we start to worry about powersaving. When the whole system is idle, there is
> no point in preventing CPU 0 from sleeping. So we are dealing with that by using a
> full system idle detection that lets CPU 0 go to sleep when there is strictly nothing
> to do. Then when a nohz full CPU wakes up from idle, CPU 0 is woken up as well to get back
> to its timekeeping duty.

Hmm, I had the impression that when a CPU with timekeeping-duty was sent to
sleep, it would set tick_do_timer_cpu to TICK_DO_TIMER_NONE, and whenever
another core would run do_timer() it would see if tick_do_timer_cpu was set
to TICK_DO_TIMER_NONE and if so, grab it and run with it.

I really don't see how this wakes up CPU0 (but then again, there's probably
several layers of logic here that I'm missing :)
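
The handoff described above can be written out as a small userspace model.
The names model_* and MODEL_TIMER_NONE are invented here and merely stand in
for tick_do_timer_cpu and TICK_DO_TIMER_NONE. The sketch deliberately leaves
out the NO_HZ_FULL sysidle path that wakes up CPU 0, which is the part in
question.

```c
#include <stdbool.h>

#define MODEL_TIMER_NONE (-1)	/* stands in for TICK_DO_TIMER_NONE */

/* Hypothetical model of the global timekeeping-duty state. */
struct timekeeper_model {
	int do_timer_cpu;	/* stands in for tick_do_timer_cpu */
	unsigned long updates;	/* number of modelled do_timer() runs */
};

/* A CPU going to sleep drops the duty if it currently holds it. */
void model_stop_tick(struct timekeeper_model *m, int cpu)
{
	if (m->do_timer_cpu == cpu)
		m->do_timer_cpu = MODEL_TIMER_NONE;
}

/*
 * Tick handler on @cpu: grab the duty if nobody holds it, then do the
 * update only if this CPU is the holder. Returns true if @cpu updated.
 */
bool model_tick(struct timekeeper_model *m, int cpu)
{
	if (m->do_timer_cpu == MODEL_TIMER_NONE)
		m->do_timer_cpu = cpu;

	if (m->do_timer_cpu != cpu)
		return false;

	m->updates++;
	return true;
}
```

In this model nobody is woken up: the duty simply lands on whichever CPU
ticks next after the previous holder went to sleep.
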
--
Henrik Austad