Message-ID: <PH0PR11MB48240C29F1DEBC79EA933285CD5E9@PH0PR11MB4824.namprd11.prod.outlook.com>
Date:   Sat, 8 Oct 2022 09:40:32 +0000
From:   "Mi, Dapeng1" <dapeng1.mi@...el.com>
To:     "Christopherson,, Sean" <seanjc@...gle.com>
CC:     "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "zhenyuw@...ux.intel.com" <zhenyuw@...ux.intel.com>
Subject: RE: [PATCH] KVM: x86: disable halt polling when powersave governor is
 used

> From: Sean Christopherson <seanjc@...gle.com>
> Sent: Saturday, October 8, 2022 1:52 AM
> To: Mi, Dapeng1 <dapeng1.mi@...el.com>
> Cc: pbonzini@...hat.com; tglx@...utronix.de; mingo@...hat.com;
> dave.hansen@...ux.intel.com; kvm@...r.kernel.org;
> linux-kernel@...r.kernel.org; zhenyuw@...ux.intel.com
> Subject: Re: [PATCH] KVM: x86: disable halt polling when powersave governor is
> used
> 
> On Thu, Sep 15, 2022, Dapeng Mi wrote:
> > Halt polling is enabled by default even though the CPU frequency
> > governor is configured to powersave. Halt polling generally
> > consumes extra power, which conflicts with the intent of the
> > powersave governor.
> >
> > Disabling halt polling under the powersave governor can save
> > precious power in power-critical cases.
> >
> > An FIO random read test on an Alder Lake platform shows that halt
> > polling occupies ~17% CPU utilization and consumes 7% extra CPU power.
> > After disabling halt polling, the CPU has more chances to enter deeper
> > C-states (C1E%: 25.3% -> 33.4%, C10%: 4.4% -> 17.4%).
> >
> > On the Alder Lake platform, we do not see any obvious performance
> > degradation after disabling halt polling in the FIO and Netperf cases.
> > The Netperf UDP_RR case runs between two VMs located on two different
> > physical machines.
> >
> > FIO(MB/s)	Base	Disable-halt-polling	Delta%
> > Rand-read	432.6	436.3			0.8%
> >
> > Netperf		Base	Disable-halt-polling	Delta%
> > UDP_RR          509.8	508.5			-0.3%
> >
> > Signed-off-by: Dapeng Mi <dapeng1.mi@...el.com>
> > ---
> >  arch/x86/kvm/x86.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d7374d768296..c0eb6574cbbb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -13015,7 +13015,22 @@ bool kvm_vector_hashing_enabled(void)
> >
> >  bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
> >  {
> > -	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
> > +	struct cpufreq_policy *policy = cpufreq_cpu_get(vcpu->cpu);
> 
> Preemption is not disabled at this point, which means that using vcpu->cpu is
> potentially unsafe.  Given that cpufreq is refcounting the returned object, I gotta
> imagine getting migrated to a different pCPU would be problematic.

Thanks for pointing this out. As I understand it, even if the vCPU migrates to a different
pCPU while the cpufreq policy is being fetched, the only consequence is reading an outdated
(possibly inaccurate) policy value. The halt polling logic will still pick up the correct
cpufreq policy the next time it runs. And even if we disable preemption while fetching the
cpufreq policy, the vCPU can still be migrated to a different pCPU after preemption is
re-enabled and before the halt polling decision is made.

> 
> > +	bool powersave = false;
> 
> I don't see anything in here that's x86 specific.  Unless I'm missing something,
> this belongs in common KVM.
> 

Yes, this is generic. 

> > +
> > +	/*
> > +	 * Halt polling could consume much CPU power, if CPU frequency
> > +	 * governor is set to "powersave", disable halt polling.
> > +	 */
> > +	if (policy) {
> > +		if ((policy->policy == CPUFREQ_POLICY_POWERSAVE) ||
> > +			(policy->governor &&
> 
> Indentation is messed up.

Sure, will fix.

> 
> > +				!strncmp(policy->governor->name, "powersave",
> 
> KVM should not be comparing magic strings.  If the cpufreq subsystem can't get
> policy->policy right, then that needs to be fixed.

Yeah, comparing magic strings looks a little strange, but this is how cpufreq itself works.
cpufreq currently supports two kinds of drivers. The first kind has a built-in governor,
like the intel_pstate driver; for these drivers the governor is stored in the policy->policy
field. The second kind is the traditional driver, which is independent of the cpufreq
governor; for these drivers the governor type is stored in the governor->name field.
For the second kind of driver, the policy->policy field is meaningless and we have to
read the governor name.
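
To illustrate the two cases, here is a minimal userspace sketch. It is not kernel
code: every mock_* name is made up for the illustration and only loosely mirrors
struct cpufreq_policy, and MOCK_NAME_LEN is just a stand-in for CPUFREQ_NAME_LEN.
It shows why the check in the patch looks at both policy->policy and
governor->name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MOCK_NAME_LEN 16	/* stand-in for CPUFREQ_NAME_LEN (assumption) */

/* Hypothetical mirror of the policy->policy values used in the patch. */
enum mock_policy_kind {
	MOCK_POLICY_UNKNOWN = 0,
	MOCK_POLICY_POWERSAVE,
	MOCK_POLICY_PERFORMANCE,
};

struct mock_governor {
	char name[MOCK_NAME_LEN];
};

struct mock_policy {
	/* Meaningful only for drivers with a built-in governor. */
	enum mock_policy_kind policy;
	/* Set only for traditional drivers; NULL otherwise. */
	const struct mock_governor *governor;
};

/* Returns true if either representation says "powersave". */
bool mock_policy_is_powersave(const struct mock_policy *p)
{
	if (!p)		/* no policy available for this CPU */
		return false;
	if (p->policy == MOCK_POLICY_POWERSAVE)
		return true;
	return p->governor &&
	       !strncmp(p->governor->name, "powersave", MOCK_NAME_LEN);
}

/* Small usage example covering both driver kinds. */
void mock_policy_demo(void)
{
	const struct mock_governor gov_powersave = { .name = "powersave" };
	const struct mock_governor gov_ondemand = { .name = "ondemand" };
	const struct mock_policy builtin = { MOCK_POLICY_POWERSAVE, NULL };
	const struct mock_policy traditional = { MOCK_POLICY_UNKNOWN, &gov_powersave };
	const struct mock_policy other = { MOCK_POLICY_UNKNOWN, &gov_ondemand };

	assert(mock_policy_is_powersave(&builtin));
	assert(mock_policy_is_powersave(&traditional));
	assert(!mock_policy_is_powersave(&other));
}
```

With only the policy->policy comparison, the "traditional" case above would be
reported as not powersave, which is the gap the string comparison is meant to cover.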

> 
> > +					CPUFREQ_NAME_LEN)))
> > +			powersave = true;
> > +		cpufreq_cpu_put(policy);
> > +	}
> > +	return ((vcpu->arch.msr_kvm_poll_control & 1) == 0) || powersave;
> 
> Doing all of the above work if polling is disabled is silly.

Correct, will change.
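
One possible ordering, sketched in userspace with made-up mock_* names (this is
an illustration of the idea, not the actual V2 code): check the guest's poll
control bit first and only do the cpufreq lookup when polling is otherwise
allowed. The lookup counter exists only to make the short circuit visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vCPU; only the fields this sketch needs. */
struct mock_vcpu {
	uint64_t msr_kvm_poll_control;	/* bit 0 set: guest allows host polling */
	bool host_powersave;		/* pretend result of the cpufreq lookup */
};

/* Counts how often the (pretend) expensive cpufreq lookup runs. */
int mock_policy_lookups;

bool mock_cpufreq_is_powersave(const struct mock_vcpu *vcpu)
{
	mock_policy_lookups++;
	return vcpu->host_powersave;
}

/* Cheap check first; the cpufreq lookup runs only if polling is still allowed. */
bool mock_no_poll(const struct mock_vcpu *vcpu)
{
	if ((vcpu->msr_kvm_poll_control & 1) == 0)
		return true;
	return mock_cpufreq_is_powersave(vcpu);
}

void mock_no_poll_demo(void)
{
	const struct mock_vcpu guest_disabled = {
		.msr_kvm_poll_control = 0, .host_powersave = true,
	};
	const struct mock_vcpu guest_enabled = {
		.msr_kvm_poll_control = 1, .host_powersave = true,
	};

	mock_policy_lookups = 0;
	assert(mock_no_poll(&guest_disabled));
	assert(mock_policy_lookups == 0);	/* lookup skipped */
	assert(mock_no_poll(&guest_enabled));
	assert(mock_policy_lookups == 1);	/* lookup performed once */
}
```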

> 
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
> 
> All in all, _if_ we want to do this automatically and not let userspace decide how
> to manage powersave vs. halt-poll, I think this should be more like:

Thanks for the sample. I will rework this along these lines in the V2 patch.

> 
> ---
>  virt/kvm/kvm_main.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e30f1b4ecfa5..01116859cb31 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -29,6 +29,7 @@
>  #include <linux/file.h>
>  #include <linux/syscore_ops.h>
>  #include <linux/cpu.h>
> +#include <linux/cpufreq.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
>  #include <linux/sched/stat.h>
> @@ -3483,6 +3484,23 @@ static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
>  	}
>  }
> 
> +static bool kvm_cpufreq_no_halt_poll(struct kvm_vcpu *vcpu)
> +{
> +	struct cpufreq_policy *policy;
> +	bool powersave = false;
> +
> +	preempt_disable();
> +
> +	policy = cpufreq_cpu_get(vcpu->cpu);
> +	if (policy) {
> +		powersave = (policy->policy == CPUFREQ_POLICY_POWERSAVE);
> +		cpufreq_cpu_put(policy);
> +	}
> +
> +	preempt_enable();
> +	return powersave;
> +}
> +
>  /*
>   * Emulate a vCPU halt condition, e.g. HLT on x86, WFI on arm, etc...  If halt
>   * polling is enabled, busy wait for a short time before blocking to avoid the
> @@ -3491,7 +3509,8 @@ static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
>   */
>  void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
> -	bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
> +	const bool halt_poll_allowed = !kvm_arch_no_poll(vcpu) &&
> +				       !kvm_cpufreq_no_halt_poll(vcpu);
>  	bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
>  	ktime_t start, cur, poll_end;
>  	bool waited = false;
> 
> base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
> --
