linux-kernel - Re: [accounting regression since rc1] scheduler updates

Message-ID: <20070821122125.GA7910@elte.hu>
Date:	Tue, 21 Aug 2007 14:21:25 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Christian Borntraeger <borntraeger@...ibm.com>
Cc:	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Jan Glauber <jang@...ux.vnet.ibm.com>,
	heiko.carstens@...ibm.com, Paul Mackerras <paulus@...ba.org>
Subject: Re: [accounting regression since rc1]  scheduler updates

* Christian Borntraeger <borntraeger@...ibm.com> wrote:

> > but i dont mind your patch either - it's really the architecture's 
> > choice how visible it wants to make external load to the task stats 
> > of its virtual machines. I think it is more logical to say that 100% 
> > CPU time displayed in 'top' means that the task got all the CPU time 
> > it asked for from the virtual machine. (and if you are curious about 
> > how much time was stolen from the virtual box altogether you look at 
> > the stolen-time stats in isolation.)
> 
> Well, as I said we started with the same approach (virtual cpu) but we 
> learned that these numbers have no meaning at all because the 
> hypervisor does have different scheduling timeslices and having 100% 
> inside the guest can still result in almost nothing if the system is 
> really loaded.

hm, i think i must have used the wrong terminology, so let me describe 
what i mean, so that we can argue this more efficiently ;-)

What i call "real time sched_clock()" is a sched_clock() that returns 
the GTOD (the real time) of the hypervisor. I.e. sched_clock() advances 
by 1 billion units every wall-clock second, in each guest.

A "virtual time sched_clock()" is a sched_clock() that returns only the 
amount of time the virtual CPU was executed by the hypervisor. I.e. on a 
3 times overloaded hypervisor with 3 guests it will advance by roughly 
333 million nanoseconds per wall-clock second, in each guest. (it is 
'virtual' 
because the clock slows down as load goes up. In CFS-speak the virtual 
clock is the "fair-clock".)
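
[ editorial illustration, not part of the original mail: a minimal sketch 
  of the two clock definitions above. The function names and the 
  assumption that the hypervisor shares one physical CPU equally among 
  the runnable guests are hypothetical; this is a toy model, not kernel 
  code. ]

```python
NSEC_PER_SEC = 1_000_000_000


def real_sched_clock_delta(wall_ns):
    """'Real time' variant: advances with wall-clock time, regardless of load."""
    return wall_ns


def virtual_sched_clock_delta(wall_ns, runnable_guests):
    """'Virtual time' variant: advances only while this guest's vCPU executes.

    Assumption (hypothetical): one physical CPU, shared equally among
    all runnable guests.
    """
    if runnable_guests < 1:
        raise ValueError("need at least one runnable guest")
    return wall_ns // runnable_guests


# One wall-clock second on a 3 times overloaded hypervisor with 3 guests.
print(real_sched_clock_delta(NSEC_PER_SEC))        # 1000000000
print(virtual_sched_clock_delta(NSEC_PER_SEC, 3))  # 333333333
```
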

to me the right scheme for sched_clock() is the virtual variant: to 
return the load-scaled nanoseconds. That way CFS will be able to 
schedule fairly even if time has been "stolen" from a task [by virtue of 
the hypervisor scheduling away the guest context without giving any 
notice about this to the guest kernel] - because sched_clock() measures 
the virtual time that got allocated to that guest by the hypervisor.
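
[ editorial illustration, not part of the original mail: a toy model of 
  why the virtual variant keeps per-task accounting fair when time is 
  stolen. The slice format (task, wall_ns, stolen_ns) and the function 
  charge_runtime() are hypothetical and only stand in for whatever the 
  guest scheduler derives from sched_clock() deltas. ]

```python
def charge_runtime(slices, clock):
    """Sum the runtime charged to each task over a list of slices.

    slices: iterable of (task, wall_ns, stolen_ns), where stolen_ns is the
            part of the slice during which the hypervisor ran something else.
    clock:  "real" charges the full wall-clock span of the slice,
            "virtual" charges only the time the vCPU actually executed.
    """
    runtime = {}
    for task, wall_ns, stolen_ns in slices:
        if clock == "real":
            delta = wall_ns
        elif clock == "virtual":
            delta = wall_ns - stolen_ns
        else:
            raise ValueError("clock must be 'real' or 'virtual'")
        runtime[task] = runtime.get(task, 0) + delta
    return runtime


# Both tasks executed for 10 ms, but the guest was scheduled away for
# 20 ms in the middle of B's slice.
slices = [("A", 10_000_000, 0), ("B", 30_000_000, 20_000_000)]

print(charge_runtime(slices, "real"))     # B is charged for time it never got
print(charge_runtime(slices, "virtual"))  # A and B are charged equally
```

In this model the real-time clock makes B look three times as expensive 
as A although both did the same amount of work; the virtual clock 
charges them the same.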

[ here i'm assuming precise host and precise guest statistics (which is 
  naturally the case if both are Linux), and in that context the virtual 
  numbers very much make sense, and whether 'top' displays 100% for a 
  sole CPU-bound task should be mostly a matter of tooling. ]
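
[ editorial illustration, not part of the original mail: a sketch of the 
  tooling point, assuming the task runtime is measured in virtual 
  nanoseconds. Which denominator a tool like 'top' uses is the choice 
  being discussed; cpu_percent() is a hypothetical helper, not how any 
  particular tool is implemented. ]

```python
def cpu_percent(task_ns, elapsed_ns):
    """CPU usage of a task as a percentage of the elapsed interval."""
    return 100.0 * task_ns / elapsed_ns


# A sole CPU-bound task in a guest that received one third of a
# wall-clock second from the hypervisor.
task_virtual_ns = 333_333_333
guest_virtual_ns = 333_333_333   # time the guest's vCPU actually ran
wall_ns = 1_000_000_000          # wall-clock time that passed

print(cpu_percent(task_virtual_ns, guest_virtual_ns))   # relative to guest time
print(round(cpu_percent(task_virtual_ns, wall_ns), 1))  # relative to wall time
```
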

	Ingo