[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45F74515.7010808@vmware.com>
Date: Tue, 13 Mar 2007 17:43:01 -0700
From: Dan Hecht <dhecht@...are.com>
To: Jeremy Fitzhardinge <jeremy@...p.org>
Cc: dwalker@...sta.com, cpufreq@...ts.linux.org.uk,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Con Kolivas <kernel@...ivas.org>,
Chris Wright <chrisw@...s-sol.org>,
Virtualization Mailing List <virtualization@...ts.osdl.org>,
john stultz <johnstul@...ibm.com>, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Stolen and degraded time and schedulers
On 03/13/2007 02:59 PM, Jeremy Fitzhardinge wrote:
> Daniel Walker wrote:
>> The frequency tracking you mention is done to some extent inside the
>> timekeeping adjustment functions, but I'm not sure it's totally accurate
>> for non-timekeeping, and it also tracks things like interrupt latency.
>> Tracking frequency changes where it's important to get it right
>> shouldn't be done I think ..
>>
>> If you want accurate time accounting, don't use the TSC .
>>
>
> I'm not sure I follow you here. Clocksources have the means to adjust
> the rate of time progression, mostly to warp the time for things like
> ntp. The stability or otherwise of the tsc is irrelevant.
>
> If you had a clocksource which was explicitly using the rate at which a
> CPU does work as a timebase, then using the same warping mechanism would
> allow you to model CPU speed changes.
>
>> The sched_clock interface is basically a stripped down clocksource..
>> I've implemented sched_clock as a clocksource in the past ..
>>
>
> Yes, that works. But a clocksource is strictly about measuring the
> progression of real time, and so doesn't generally measure how much work
> a CPU has done.
>
>>> We currently have a sched_clock interface in paravirt_ops to deal with
>>> the hypervisor aspect. It only occurred to me this morning that cpufreq
>>> presents exactly the same problem to the rest of the kernel, and so
>>> there's room for a more general solution.
>>>
>> Are there other architecture which have this per-cpu clock frequency
>> changing issue? I worked with several other architectures beyond just
>> x86 and haven't seen this issue ..
>
> Well, lots of cpus have dynamic frequencies. Any scheduler which
> maintains history will suffer the same problem, even on UP. If
> processes A and B are supposed to have the same priority and they both
> execute for 1ms of real time, did they make the same amount of
> progress? Not if the cpu changed speed in between.
>
> And any system which commonly runs virtualized (s390, power, etc) will
> need to deal with the notion of stolen time.
>
With your previous definition of work time, would it be that:
monotonic_time == work_time + stolen_time ??
i.e. would you be defining stolen_time to include the time lost to
processes due to the cpu running at a lower frequency? How does this
play into the other potential users, besides sched_clock(), of stolen
time? We should make sure that the abstraction introduced here makes
sense in those places too.
For example, the stuff that happens in update_process_times(). I think
we'd want to account the stolen time to cpustat->steal. Also we'd
probably want account for stolen time with regards to
task_running_tick(). (Though, in the latter case, maybe we first have
to move the scheduler away from assuming HZ rate decrementing of
p->time_slice to get this right. i.e. remove the tick based assumption
from the scheduler, and then maybe stolen time falls in more naturally
when accounting time slices).
I guess taking your cpufreq as an example of work_time progressing
slower than monotonic_time (and assuming that the remaining time is what
you would call stolen), then e.g. top would report 50% of your cpu
stolen when you cpu is running at 1/2 max rate. And p->time_slice would
decrement at 1/2 the rate it normally did when running at 1/2 speed. Is
this the right thing to do? If so, then I agree it makes sense to model
hypervisor stolen time in terms of your "work time". But, if not, then
maybe the amount of work you can get done during a period of time that
is not stolen and the stolen time itself are really two different
notions, and shouldn't be confused. I can see arguments both ways.
Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists