Message-ID: <20070821122125.GA7910@elte.hu>
Date: Tue, 21 Aug 2007 14:21:25 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Christian Borntraeger <borntraeger@...ibm.com>
Cc: Martin Schwidefsky <schwidefsky@...ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Jan Glauber <jang@...ux.vnet.ibm.com>,
heiko.carstens@...ibm.com, Paul Mackerras <paulus@...ba.org>
Subject: Re: [accounting regression since rc1] scheduler updates
* Christian Borntraeger <borntraeger@...ibm.com> wrote:
> > but i dont mind your patch either - it's really the architecture's
> > choice how visible it wants to make external load to the task stats
> > of its virtual machines. I think it is more logical to say that 100%
> > CPU time displayed in 'top' means that the task got all the CPU time
> > it asked for from the virtual machine. (and if you are curious about
> > how much time was stolen from the virtual box altogether you look at
> > the stolen-time stats in isolation.)
>
> Well, as I said we started with the same approach (virtual cpu) but we
> learned that these numbers have no meaning at all because the
> hypervisor does have different scheduling timeslices and having 100%
> inside the guest can still result in almost nothing if the system is
> really loaded.
hm, i think i must have used the wrong terminology, so let me describe
what i mean, so that we can argue this more efficiently ;-)
What i call "real time sched_clock()" is a sched_clock() that returns
the GTOD (the real time) of the hypervisor. I.e. sched_clock() advances
by 1 billion nanoseconds every wall-clock second, in each guest.
A "virtual time sched_clock()" is a sched_clock() that returns only the
amount of time the virtual CPU was executed by the hypervisor. I.e. on a
3-times overloaded hypervisor with 3 guests it will advance by about 333
million nanoseconds per wall-clock second, in each guest. (it is 'virtual'
because the clock slows down as load goes up. In CFS-speak the virtual
clock is the "fair-clock".)
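
To make the two definitions concrete, here is a minimal sketch of how far
each clock would advance. It is a toy model, not kernel code: the function
names are hypothetical, and it assumes the hypervisor splits one physical
CPU evenly among equally busy guests.

```python
NSEC_PER_SEC = 1_000_000_000

def real_clock_advance_ns(wall_seconds):
    """'Real time' sched_clock(): tracks the hypervisor's wall clock."""
    return wall_seconds * NSEC_PER_SEC

def virtual_clock_advance_ns(wall_seconds, runnable_guests):
    """'Virtual time' sched_clock(): counts only the time this guest's
    virtual CPU actually ran. Assumes an even split among guests."""
    return wall_seconds * NSEC_PER_SEC // runnable_guests

# One wall-clock second on a hypervisor shared by 3 busy guests:
print(real_clock_advance_ns(1))        # 1000000000
print(virtual_clock_advance_ns(1, 3))  # 333333333
```

With a single guest the two clocks agree; they diverge only as the
hypervisor gets more loaded.
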
to me the right scheme for sched_clock() is the virtual variant: to
return the load-scaled nanoseconds. That way CFS will be able to
schedule fairly even if time has been "stolen" from a task [by virtue of
the hypervisor scheduling away the guest context without giving any
notice about this to the guest kernel] - because sched_clock() measures
the virtual time that got allocated to that guest by the hypervisor.
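
The fairness argument can be illustrated with a small sketch. This is a
simplified model under stated assumptions, not the CFS implementation: the
`charged_runtime` helper and the interval format are made up for
illustration, and the guest is assumed to charge the current task with
whatever delta sched_clock() reports.

```python
def charged_runtime(intervals, virtual):
    """Return the runtime (ns) the guest scheduler charges to each task.

    intervals: list of (task, executed_ns, stolen_ns), where stolen_ns is
    time the hypervisor ran something else while `task` was current.
    virtual=True models a sched_clock() that stops while the guest is
    descheduled; virtual=False models one that follows wall-clock time.
    """
    charged = {}
    for task, executed_ns, stolen_ns in intervals:
        delta = executed_ns if virtual else executed_ns + stolen_ns
        charged[task] = charged.get(task, 0) + delta
    return charged

# Both tasks execute for 10 ms, but 20 ms is stolen while A is current.
intervals = [("A", 10_000_000, 20_000_000), ("B", 10_000_000, 0)]

real = charged_runtime(intervals, virtual=False)
virt = charged_runtime(intervals, virtual=True)
print(real)  # {'A': 30000000, 'B': 10000000}
print(virt)  # {'A': 10000000, 'B': 10000000}
```

In this model the wall-clock variant charges A for time it never got to
execute, while the virtual variant charges both tasks equally.
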
[ here i'm assuming precise host and precise guest statistics (which is
naturally the case if both are Linux), and in that context the virtual
numbers very much make sense, and whether 'top' displays 100% for a
sole CPU-bound task should be mostly a matter of tooling. ]
Ingo