Message-ID: <20150803180108.GD26022@lerouge>
Date: Mon, 3 Aug 2015 20:01:09 +0200
From: Frederic Weisbecker <fweisbec@...il.com>
To: Chris Metcalf <cmetcalf@...hip.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>,
Ingo Molnar <mingo@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH 08/10] posix-cpu-timers: Migrate to use new tick dependency mask model

On Mon, Aug 03, 2015 at 11:59:07AM -0400, Chris Metcalf wrote:
> On 07/31/2015 10:49 AM, Frederic Weisbecker wrote:
> >Instead of doing a per signal dependency, I'm going to use a per task
> >one. Which means that if a per-process timer is enqueued, every thread
> >of that process will have the tick dependency. But if the timer is
> >enqueued to a single thread, only the thread is concerned.
> >
> >We'll see if offloading becomes really needed. It's not quite free because
> >the housekeepers will have to poll on all nohz CPUs at a Hz frequency.
>
> Seems reasonable for now!
>
> Why would we need the Hz frequency polling, though? I would
> think it should be possible to just arrange it such that the timer
> for posix cpu timers would just always be placed either on the core
> that requested it, or if that core is nohz_full, on a housekeeping
> core. Then it would eventually fire from the housekeeping core,
> and the logic could be such that (for a process-wide timer) it
> would preferentially interrupt threads from that process that
> were running on the housekeeping cores. No polling.

But you still need to periodically poll for timer expiration from a housekeeper.
It's not only about firing the timer, it's about checking the cputime the task
has actually consumed against the target cputime.

Since there is no tick on a nohz full CPU to account the time spent by
the task, you must do that elsewhere. And if you don't poll at a sufficient
frequency, the accounted time is less precise (a quick round-trip to kernel space
can be missed if the polling frequency is too low). Or you can combine it
with VIRT_CPU_ACCOUNTING_GEN, which we are using currently and which records the
time spent in user and kernel space using hooks. Still, you must check periodically
that the timer hasn't expired, at a frequency high enough not to overshoot the
expiration time. That's easy for a timer attached to a single task, but what
about a timer attached to a process? Its threads can consume cputime in parallel,
so you must poll at least every expiration/nr_threads, and you must handle thread
creation as well.
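
To make the expiration/nr_threads bound concrete, here is a minimal userspace
sketch, not kernel code. The name max_poll_delay_ns() and its parameters are
hypothetical, and it assumes the worst case where every thread of the process
runs flat out on its own CPU, so process cputime advances nr_threads times
faster than wall time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): longest wall-clock delay, in ns,
 * a housekeeper could wait before rechecking a process-wide CPU timer
 * without risking an overshoot of its expiration.
 *
 * expires_ns:  target process cputime at which the timer must fire
 * consumed_ns: process cputime accounted so far
 * nr_threads:  current number of threads in the process
 */
static uint64_t max_poll_delay_ns(uint64_t expires_ns, uint64_t consumed_ns,
                                  unsigned int nr_threads)
{
	assert(nr_threads > 0);

	/* Already at or past the target: the timer must fire now. */
	if (consumed_ns >= expires_ns)
		return 0;

	/* Worst case: all threads burn CPU in parallel. */
	return (expires_ns - consumed_ns) / nr_threads;
}
```

The result is only valid for the nr_threads passed in, so under these
assumptions it would have to be recomputed whenever a thread is created,
which is the thread creation handling mentioned above.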

Offloading posix timers sounds like a big headache if we don't poll at Hz frequency.
That said, Rik has posted patches that offload cputime accounting. I'm not yet sure
that patchset is a good idea, but offloading posix timers can be done on top of it.

Another thing: now I recall why I turned posix timers into a global tick dependency.
With a per task/process dependency we still need the context switch hook, because
if we enqueue a timer to a sleeping task, the tick must be restarted when the task
wakes up. And that requires a check on context switch.
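
Roughly, the check in question looks like the sketch below. This is a
simplified userspace model, not the actual kernel hook: struct task_sketch,
struct cpu_sketch and switch_in_check() are hypothetical names invented for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-task and per-CPU state. */
struct task_sketch {
	bool has_cpu_timer;	/* a posix cpu timer is armed on this task */
};

struct cpu_sketch {
	bool tick_running;	/* false while the CPU runs tickless */
};

/*
 * Called when 'next' is switched in on 'cpu'. If the incoming task carries
 * a tick dependency and the tick is stopped, restart it.
 * Returns true when the tick had to be restarted.
 */
static bool switch_in_check(struct cpu_sketch *cpu,
			    const struct task_sketch *next)
{
	assert(cpu && next);

	if (next->has_cpu_timer && !cpu->tick_running) {
		cpu->tick_running = true;
		return true;
	}
	return false;
}
```

The point is that the per-task flag can be set while the task sleeps, so
nothing restarts the tick at enqueue time and the test has to run on every
switch to such a task.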