Message-ID: <20101116163325.755a709f@mschwide.boeblingen.de.ibm.com>
Date:	Tue, 16 Nov 2010 16:33:25 +0100
From:	Martin Schwidefsky <schwidefsky@...ibm.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Michael Holzheu <holzheu@...ux.vnet.ibm.com>,
	Shailabh Nagar <nagar1234@...ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Ingo Molnar <mingo@...e.hu>, Oleg Nesterov <oleg@...hat.com>,
	John stultz <johnstul@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Roland McGrath <roland@...hat.com>,
	linux-kernel@...r.kernel.org, linux-s390@...r.kernel.org,
	"jeremy.fitzhardinge" <jeremy.fitzhardinge@...rix.com>,
	Avi Kivity <avi@...hat.com>
Subject: Re: [RFC][PATCH v2 4/7] taskstats: Add per task steal time
 accounting

On Tue, 16 Nov 2010 13:16:08 +0100
Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:

> On Tue, 2010-11-16 at 09:51 +0100, Martin Schwidefsky wrote:
> > On Mon, 15 Nov 2010 19:08:44 +0100
> > Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> > 
> > > On Mon, 2010-11-15 at 18:59 +0100, Martin Schwidefsky wrote:
> > > > Steal time per task is at least good for performance problem analysis.
> > > > Sometimes knowing what is not the cause of a performance problem can help you
> > > > tremendously. If a task is slow and has no steal time, well then the hypervisor
> > > > is likely not the culprit. On the other hand, seeing lots of steal time
> > > > for a task while the rest of the system shows hardly any can tell you
> > > > something as well. That task might be hitting a specific function that
> > > > causes hypervisor overhead. The usefulness depends on the situation; it is
> > > > another data point which may or may not help you. 
> > > 
> > > If performance analysis is the only reason, why not add a tracepoint on
> > > vcpu enter that reports the duration the vcpu was out for and use perf
> > > to gather said data? It can tell you what process was running and what
> > > instruction it was at when the vcpu went away.
> > > 
> > > No need to add 40 bytes per task for that.
> > 
> > Which vcpu enter? We usually have z/VM as our hypervisor and want to be able
> > to do performance analysis with the data we gather inside the guest. There
> > is no vcpu enter we could put a tracepoint on. We would have to put tracepoints
> > on every possible interaction point with z/VM to get this data. To me it seems
> > a lot simpler to add the per-task steal time.
> 
> Oh, you guys don't have a hypercall wrapper to exploit? Because from
> what I heard from the kvm/xen/lguest people I gathered they could in
> fact do something like I proposed.

We could do something along the lines of a hypercall wrapper (we would call
it a diagnose wrapper, same thing). The diagnoses we have on s390 do vastly
different things, so it is not easy to have a common diagnose wrapper.
It would be easier to add a tracepoint for each diagnose inline assembly.
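
Something along these lines, just to sketch the idea (the event name and the
file are made up here, this is not code we actually have):

/* include/trace/events/diag.h -- hypothetical sketch only */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM diag

#if !defined(_TRACE_DIAG_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_DIAG_H

#include <linux/tracepoint.h>

TRACE_EVENT(diagnose,

	TP_PROTO(unsigned int nr),

	TP_ARGS(nr),

	TP_STRUCT__entry(
		__field(unsigned int, nr)
	),

	TP_fast_assign(
		__entry->nr = nr;
	),

	TP_printk("diagnose 0x%x", __entry->nr)
);

#endif /* _TRACE_DIAG_H */

/* this part must be outside the include guard above */
#include <trace/define_trace.h>

Each diagnose wrapper would then call trace_diagnose(nr) right next to its
inline assembly.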

> In fact, kvm seems to already have these tracepoints: kvm_exit/kvm_entry
> and it has a separate explicit hypercall tracepoint as well:
> kvm_hypercall.

But the kvm tracepoints are used when Linux is the hypervisor, no? For our
situation that would have to be a tracepoint in z/VM - or the equivalent. This
is out of scope for this patch.

> Except that the per-task steal time gives you a lot less detail; being
> able to profile on vcpu exit/enter gives you a much more powerful
> performance tool. Aside from being able to measure the steal time, it
> allows you to instantly find hypercalls (both explicit and implicit),
> so you can measure the hypercall-induced steal time as well.

Yes and no. The tracepoint idea looks interesting in itself. But that does
not completely replace the per-task steal time. The hypervisor can take
away the cpu at any time, and it is still interesting to know which task was
hit hardest by that. You could view the cpu time lost by a hypercall as
"synchronous" steal time for the task, and the remaining delta to the total
per-task steal time as "asynchronous" steal time.
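
As a rough sketch of that split (the names below are made up for illustration,
they are not from the patch):

#include <linux/types.h>

/*
 * Hypothetical helper, only to illustrate the split described above.
 * steal_total:     all steal time charged to the task so far
 * steal_hypercall: steal time observed across explicit hypercalls/diagnoses
 */
static inline u64 task_async_steal(u64 steal_total, u64 steal_hypercall)
{
	/* the "synchronous" part is what the hypercall paths see directly */
	u64 sync_steal = steal_hypercall;

	/* whatever remains was taken away asynchronously by the hypervisor */
	return steal_total - sync_steal;
}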

> > And if it is really the additional 40 bytes on x86 that bother you so much,
> > we could put them behind #ifdef CONFIG_VIRT_CPU_ACCOUNTING. There already
> > is one in the task_struct for prev_utime and prev_stime. 
> 
> Making it configurable would definitely help the embedded people; not
> sure about VIRT_CPU_ACCOUNTING though. I bet the x86 virt weird^Wpeople
> would like it too -- if only to strive for feature parity if nothing
> else :/
> 
> It's just that I'm not at all convinced it's the best approach to solve
> the problem posed, and once it's committed we're stuck with it due to
> ABI.
> 
> We should be very careful not to die a death of a thousand cuts with all
> this accounting madness; there's way too much weird-ass process
> accounting junk that adds ABI constraints as it is.
> 
> I think it's definitely worth investing extra time to implement these
> tracepoints, if at all possible on your architecture, before committing
> yourself to something like this.

I don't think it is a choice between per-task steal time and tracepoints;
ideally we'd have both. But I understand that you want to be careful about
committing to an ABI. From my viewpoint per-task steal time is a logical
extension; the data it is based on is already there.
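
Just to make the CONFIG_VIRT_CPU_ACCOUNTING idea from above concrete, the
addition to the task_struct would look roughly like this (the field name is
invented here, the actual patch may use something else):

/* sketch of the new task_struct member, next to the other cputime fields */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING
	cputime_t steal;	/* accumulated per-task steal time */
#endif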

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
