Message-Id: <1187629438.8541.40.camel@localhost>
Date: Mon, 20 Aug 2007 19:03:58 +0200
From: Martin Schwidefsky <schwidefsky@...ibm.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Christian Borntraeger <borntraeger@...ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Jan Glauber <jang@...ux.vnet.ibm.com>,
heiko.carstens@...ibm.com, Paul Mackerras <paulus@...ba.org>
Subject: Re: [accounting regression since rc1] scheduler updates
On Mon, 2007-08-20 at 17:45 +0200, Ingo Molnar wrote:
> * Christian Borntraeger <borntraeger@...ibm.com> wrote:
>
> > 1. Jan could finish his sched_clock implementation for s390 and we
> > would get close to the precise numbers. This would also let CFS make
> > better decisions. [...]
>
> i think this is the best option and it should give us the same /proc
> accuracy on s390 as before, plus improved scheduler precision. (and
> improved tracing accuracy, etc. etc.) Note that for architectures that
> already have sched_clock() at least as precise as the stime/utime stats
> there's no problem - and that seems to include all architectures except
> s390.
So far we have used the TOD clock for sched_clock. This clock measures
real time with an accuracy of 1usec or better. The [us]time accounting
with CONFIG_VIRT_CPU_ACCOUNTING=y is done using the CPU timer. This
timer measures virtual time with an accuracy of 1usec or better. Without
CONFIG_VIRT_CPU_ACCOUNTING the [us]time accounting is done with HZ
ticks. This means that sched_clock() is at least as precise as [us]time
on s390 as well, except that we distinguish between real time and
virtual time if the improved accounting is used.
> could you send that precise sched_clock() patch? It should be an order
> of magnitude simpler than the high-precision stime/utime tracking you
> already do, and it's needed for quality scheduling anyway.
Sure, if you can explain what it should do. This is still unclear to me:
for a non-idle CPU the virtual cpu time should be used, but for an idle
CPU the real time should be used? That seems rather ill-defined to me.
On s390 we have three times to consider: real time, virtual cpu time and
steal time. For a given period we have real = virtual + steal. And if a
cpu is idle we have real = steal, virtual = 0. My best interpretation of
what you want is that sched_clock should progress with virtual cpu time
if the current process is not idle and with the real time if it is. No?
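
A minimal user-space sketch of this interpretation, for illustration only. The struct and function names (interval, real_us, sched_clock_advance) are hypothetical and are not kernel code; the sketch only models the identity real = virtual + steal and the assumed rule that the clock advances by virtual time on a busy cpu and by real time on an idle one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-interval time deltas in microseconds (not a kernel type). */
struct interval {
	uint64_t virt_us;	/* virtual cpu time consumed in the interval */
	uint64_t steal_us;	/* time the hypervisor took away in the interval */
	int idle;		/* non-zero if the cpu was idle */
};

/* For a given period: real = virtual + steal. */
static uint64_t real_us(const struct interval *iv)
{
	return iv->virt_us + iv->steal_us;
}

/*
 * Assumed interpretation from the paragraph above: advance by virtual
 * cpu time when the cpu is not idle, and by real time when it is.
 */
static uint64_t sched_clock_advance(const struct interval *iv)
{
	/* An idle cpu has virtual = 0, so real = steal. */
	assert(!iv->idle || iv->virt_us == 0);

	return iv->idle ? real_us(iv) : iv->virt_us;
}
```

With a busy interval of 700 usec virtual plus 300 usec steal, followed by an idle interval of 1000 usec, such a clock would advance by 1700 usec while 2000 usec of real time pass. This shows the mixing of two time bases that makes the definition look ill-defined.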
> > [...] Downside: its not as precise as before as we do some math on the
> > numbers and it will burn cycles to compute numbers we already have
> > (utime=sum*utime/stime).
>
> i can see no real downside to it: if all of stime, utime and
> sum_exec_clock are precise, then the numbers we present via /proc are
> precise too:
>
> sum_exec * utime / stime;
>
> there should be no loss of precision on s390 because the
> multiplication/division rounding is not accumulating - we keep the
> precise sum_exec, utime and stime values untouched.
But then sched_clock() has to return the virtual cpu time only,
otherwise it will be hard to make sum_exec exact, wouldn't it?
And why should we jump through all these hoops to come up with values
that are only as good as the values we already have?
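
A small sketch of the scaling under discussion, for illustration only. It assumes the intended computation is a proportional split of sum_exec by utime / (utime + stime). That is one reading of the shorthand quoted above, not something the thread spells out. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical raw per-task counters, all in the same time unit. */
struct task_times {
	uint64_t sum_exec;	/* total runtime as measured via sched_clock() */
	uint64_t utime;		/* raw user time */
	uint64_t stime;		/* raw system time */
};

/*
 * Assumed scaling: split sum_exec in the ratio utime : stime.
 * The raw counters are never modified, so the rounding of one call
 * does not carry over into the next one.
 * Note: sum_exec * utime can overflow 64 bits for large values; a real
 * implementation would need a wider intermediate or a different order.
 */
static uint64_t reported_utime(const struct task_times *t)
{
	uint64_t total = t->utime + t->stime;

	if (total == 0)
		return 0;
	return t->sum_exec * t->utime / total;
}

/* The remainder of sum_exec is reported as system time. */
static uint64_t reported_stime(const struct task_times *t)
{
	return t->sum_exec - reported_utime(t);
}
```

When sum_exec equals utime + stime, which is the case if all three are exact and measured on the same time base, the functions return utime and stime unchanged. The computation then reproduces the values that were already available.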
> on x86 we dont really want to slow down every irq and syscall event with
> precise stime/utime stats for 'top' to display. On s390 the
> multiplication and division is indeed superfluous but it keeps the code
> generic for arches where utime/stime is less precise and irq-sampled -
> while the sum is always precise. It also animates architectures that
> have an imprecise sched_clock() implementation to improve its accuracy.
> Accessing the /proc files alone is many orders of magnitude more
> expensive than this simple multiplication and division.
Yes, I can understand why you don't want to have the exact cpu
accounting scheme on x86, since it will slow down every context switch
quite a bit (that includes user <-> kernel, softirq <-> hardirq <->
process context, ...). On s390 the cost is acceptable: for an empty
system call it is about 40 additional cycles for the precise accounting.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.