Message-ID: <20200113123558.GF2827@hirez.programming.kicks-ass.net>
Date: Mon, 13 Jan 2020 13:35:58 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Wanpeng Li <kernellwp@...il.com>,
LKML <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Marcelo Tosatti <mtosatti@...hat.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
KarimAllah <karahmed@...zon.de>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ingo Molnar <mingo@...nel.org>,
Ankur Arora <ankur.a.arora@...cle.com>,
christopher.s.hall@...el.com, hubert.chrzaniuk@...el.com,
len.brown@...el.com, thomas.lendacky@....com, rjw@...ysocki.net
Subject: Re: [PATCH RFC] sched/fair: Penalty the cfs task which executes
mwait/hlt
On Mon, Jan 13, 2020 at 12:52:20PM +0100, Paolo Bonzini wrote:
> On 13/01/20 11:43, Peter Zijlstra wrote:
> > So the very first thing we need to get sorted is that MPERF/TSC ratio
> > thing. TurboStat does it, but has 'funny' hacks like:
> >
> > b2b34dfe4d9a ("tools/power turbostat: KNL workaround for %Busy and Avg_MHz")
> >
> > and I imagine that there's going to be more exceptions there. You're
> > basically going to have to get both Intel and AMD to commit to this.
> >
> > IFF we can get consensus on MPERF/TSC, then yes, that is a reasonable
> > way to detect a VCPU being idle I suppose. I've added a bunch of people
> > who seem to know about this.
> >
> > Anyone, what will it take to get MPERF/TSC 'working' ?
>
> Do we really need MPERF/TSC for this use case, or can we just track
> APERF as well and do MPERF/APERF to compute the "non-idle" time?

So MPERF runs at a fixed frequency (when !IDLE, and typically the same
frequency as TSC), while APERF runs at a variable frequency (when !IDLE)
that depends on the DVFS state.

So APERF/MPERF gives the effective frequency of the core, but since both
counters stop during IDLE, the ratio is not a good indication of IDLE.
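
To make the APERF/MPERF point concrete, here is a minimal sketch of the
effective-frequency calculation from two counter snapshots. The function
name, its parameters, and the base_khz argument (the fixed frequency MPERF
is assumed to tick at) are hypothetical and for illustration only; this is
not code from the kernel or from turbostat.

```c
#include <stdint.h>

/*
 * Hypothetical helper: effective core frequency in kHz over one sampling
 * interval, from APERF and MPERF snapshots taken at its start and end.
 *
 * Assumption: MPERF ticks at base_khz whenever the core is not idle.
 * Both counters stop in idle, so the ratio only describes the time the
 * core was actually running and carries no information about idle time.
 */
static uint64_t effective_khz(uint64_t aperf_prev, uint64_t aperf_cur,
                              uint64_t mperf_prev, uint64_t mperf_cur,
                              uint64_t base_khz)
{
    /* Unsigned subtraction stays correct across a counter wrap. */
    uint64_t da = aperf_cur - aperf_prev;
    uint64_t dm = mperf_cur - mperf_prev;

    /* Core was idle for the whole interval: the ratio is undefined. */
    if (dm == 0)
        return 0;

    /* Sketch only: base_khz * da can overflow for very long intervals. */
    return base_khz * da / dm;
}
```

With a 2 GHz base and APERF advancing 1.5 times as fast as MPERF, this
reports 3 GHz no matter how much of the interval was spent idle, which is
why the ratio cannot be used to detect idle.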

OTOH, TSC doesn't stop in idle (.oO this depends on
X86_FEATURE_CONSTANT_TSC) and therefore the MPERF/TSC ratio gives how
much !idle time there was between readings.
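
A minimal sketch of the MPERF/TSC calculation, under the assumption that
MPERF ticks at exactly the TSC rate while the core is not idle. That is
the very assumption this thread is asking the vendors to commit to, so
treat it as illustrative only. The function name and its parameters are
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fraction of an interval spent non-idle, in
 * permille, from MPERF and TSC snapshots taken at its start and end.
 *
 * Assumptions: TSC keeps counting in idle, MPERF stops in idle, and both
 * tick at the same rate otherwise. Parts where the rates differ would
 * need a scaling factor that is not modelled here.
 */
static unsigned int busy_permille(uint64_t mperf_prev, uint64_t mperf_cur,
                                  uint64_t tsc_prev, uint64_t tsc_cur)
{
    /* Unsigned subtraction stays correct across a counter wrap. */
    uint64_t dm = mperf_cur - mperf_prev;
    uint64_t dt = tsc_cur - tsc_prev;

    /* No time elapsed between the readings: nothing to report. */
    if (dt == 0)
        return 0;

    /* Clamp in case the two counters are not perfectly rate-matched. */
    if (dm > dt)
        dm = dt;

    /* Sketch only: dm * 1000 can overflow for extremely long intervals. */
    return (unsigned int)(dm * 1000 / dt);
}
```

For example, if TSC advanced by 1000 ticks and MPERF by 250 between two
readings, the core was non-idle for a quarter of the interval.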