Message-ID: <20190626183016.GA16439@char.us.oracle.com>
Date: Wed, 26 Jun 2019 14:30:16 -0400
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Ankur Arora <ankur.a.arora@...cle.com>,
Joao Martins <joao.m.martins@...cle.com>,
Wanpeng Li <kernellwp@...il.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krcmar <rkrcmar@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
KarimAllah <karahmed@...zon.de>,
LKML <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>
Subject: Re: cputime takes cstate into consideration
On Wed, Jun 26, 2019 at 06:16:08PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2019 at 10:54:13AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jun 26, 2019 at 12:33:30PM +0200, Thomas Gleixner wrote:
> > > On Wed, 26 Jun 2019, Wanpeng Li wrote:
> > > > After exposing mwait/monitor into kvm guest, the guest can make
> > > > physical cpu enter deeper cstate through mwait instruction, however,
> > > > the top command on host still observe 100% cpu utilization since qemu
> > > > process is running even though guest who has the power management
> > > > capability executes mwait. Actually we can observe the physical cpu
> > > > has already enter deeper cstate by powertop on host. Could we take
> > > > cstate into consideration when accounting cputime etc?
> > >
> > > If MWAIT can be used inside the guest then the host cannot distinguish
> > > between execution and stuck in mwait.
> > >
> > > It'd need to poll the power monitoring MSRs on every occasion where the
> > > accounting happens.
> > >
> > > This completely falls apart when you have zero exit guest. (think
> > > NOHZ_FULL). Then you'd have to bring the guest out with an IPI to access
> > > the per CPU MSRs.
> > >
> > > I assume a lot of people will be happy about all that :)
> >
> > There were some ideas that Ankur (CC-ed) mentioned to me of using the perf
> > counters (in the host) to sample the guest and construct a better
> > accounting idea of what the guest does. That way the dashboard
> > from the host would not show 100% CPU utilization.
>
> But then you generate extra noise and vmexits on those cpus, just to get
> this accounting sorted, which sounds like a bad trade.
Considering that the CPUs aren't doing anything, if you send the IPIs
"only" 100 times a second, the overhead would be very small, yet it would
give you a big benefit in accounting the guests properly.
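
To put a rough number on "so small", here is a back-of-the-envelope sketch.
The 100/second rate is the figure from this mail; the 5 microsecond cost per
IPI (including the vmexit and re-entry) is purely an assumed value for
illustration, not a measurement, and the function name is hypothetical.

```python
def sampling_overhead(samples_per_sec: float, cost_us: float) -> float:
    """Return the fraction of one CPU's time spent servicing sampling IPIs.

    samples_per_sec: how many IPIs are sent to the CPU each second.
    cost_us: assumed cost of one IPI round trip in microseconds
             (hypothetical figure; measure on real hardware).
    """
    return samples_per_sec * cost_us * 1e-6


# 100 IPIs/second is from the mail; 5 us per IPI is an assumption.
overhead = sampling_overhead(100, 5.0)
print(f"overhead per CPU: {overhead:.4%}")
```

Under that assumed cost the sampling takes a small fraction of a percent of
the CPU; the real cost depends on the hardware and on what the IPI handler
does, and this ignores the latency noise Peter is concerned about.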
But perhaps there are also other ways to "snoop" whether a guest is
sitting in an MWAIT?
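
For reference, the MSR-polling idea Thomas describes above could be sketched
roughly as follows. This assumes the host samples IA32_MPERF (which advances
at the TSC rate only while the logical CPU is in C0) together with the TSC at
two accounting points; the function names and the way the result is applied
to the charged time are hypothetical, and this is not how the kernel accounts
cputime today.

```python
MASK64 = (1 << 64) - 1  # both counters are 64-bit and may wrap


def c0_fraction(mperf_start: int, mperf_end: int,
                tsc_start: int, tsc_end: int) -> float:
    """Estimate the fraction of the interval the CPU spent in C0.

    Assumes MPERF ticks at the TSC rate while in C0 and stops in deeper
    C-states, so delta(MPERF) / delta(TSC) approximates C0 residency.
    """
    d_mperf = (mperf_end - mperf_start) & MASK64
    d_tsc = (tsc_end - tsc_start) & MASK64
    if d_tsc == 0:
        return 0.0
    # Clamp in case the two reads were not taken at exactly the same time.
    return min(1.0, d_mperf / d_tsc)


def adjusted_cputime_ns(charged_ns: int, fraction: float) -> int:
    """Scale the time charged to the vCPU thread by its C0 fraction."""
    return int(charged_ns * fraction)


# Example with made-up counter values: the CPU was in C0 for a quarter
# of the sampled interval, so 10 ms of charged time becomes 2.5 ms.
frac = c0_fraction(1000, 1250, 0, 1000)
print(frac, adjusted_cputime_ns(10_000_000, frac))
```

As noted earlier in the thread, reading the per-CPU MSRs still requires
running on that CPU, so a zero-exit guest would have to be interrupted to
take the samples.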