Date:	Wed, 5 Aug 2009 08:59:19 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Laurent Vivier <Laurent.Vivier@...l.net>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Avi Kivity <avi@...hat.com>
Cc:	kvm-devel <kvm-devel@...ts.sourceforge.net>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	virtualization <virtualization@...ts.linux-foundation.org>
Subject: Re: [PATCH 2/4] Introduce a new fields "gtime" and "cgtime" in
	task_struct and signal_struct


* Laurent Vivier <Laurent.Vivier@...l.net> wrote:

> [PATCH 2/4] like for cpustat, introduce the "gtime" (guest time of 
> the task) and "cgtime" (guest time of the task children) fields 
> for the tasks. Modify signal_struct and task_struct. Modify 
> /proc/<pid>/stat to display these new fields.

> --- kvm.orig/include/linux/sched.h	2007-08-20 11:11:30.000000000 +0200
> +++ kvm/include/linux/sched.h	2007-08-20 13:00:02.000000000 +0200
> @@ -515,6 +515,10 @@ struct signal_struct {
>  	 * in __exit_signal, except for the group leader.
>  	 */
>  	cputime_t utime, stime, cutime, cstime;
> +#ifdef CONFIG_GUEST_ACCOUNTING
> +	cputime_t gtime;
> +	cputime_t cgtime;
> +#endif

A handful of general (and less general) observations about these 
patches:

 1- The code is very ugly because it is an #ifdef fest. Please
    always try to avoid #ifdefs like these.

 2- cputime_t is very coarse on x86: it is measured in jiffies. With
    the default HZ of 250, that means units of 4 msecs (1000 ms / 250).
    That's almost useless to rely on in new instrumentation: an irq
    can come in and go out without the accounting noticing it, etc.
    If we add new statistics, they should be much finer-grained than
    jiffies.

 3- stime of vcpu tasks/threads already approximates 'guest time'
    adequately (as Jeremy observed as well). Yes, it mixes 'true
    guest mode' and 'host mode' system time, but due to the jiffies
    granularity we already have a _far_ bigger skew going on.

 4- namespace collision: 'gtime' is already used as 'group time' in
    a few places. One of the two needs to be renamed.

 5- tracepoints and perfcounters could be used to measure guest time 
    precisely, in a low-overhead mode.

These issues need to be addressed in a meaningful way. #2 probably
means revamping cputime_t handling on x86 in general, not just for
gtime. But #3 is worth keeping in mind as well.

I think #5 is the most capable solution by a wide margin: we need
just a single tracepoint to emit 'nsecs spent in guest mode'
information and that's it. It would be a far smaller patch.

The tracepoint might even sample the guest RIP, so it could double
as a VM-exit profiler: 'perf record -e kvm:vm_exit' followed by
'perf report' could be used to examine/profile/trace guest exit
reasons.

	Ingo