Message-ID: <Y0BnKIW+7sqJbTyY@google.com>
Date:   Fri, 7 Oct 2022 17:51:36 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Dapeng Mi <dapeng1.mi@...el.com>
Cc:     pbonzini@...hat.com, tglx@...utronix.de, mingo@...hat.com,
        dave.hansen@...ux.intel.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, zhenyuw@...ux.intel.com
Subject: Re: [PATCH] KVM: x86: disable halt polling when powersave governor
 is used

On Thu, Sep 15, 2022, Dapeng Mi wrote:
> Halt polling is enabled by default even though the CPU frequency
> governor is configured to powersave. Halt polling generally consumes
> extra power, which is at odds with the intent of the powersave
> governor.
> 
> Disabling halt polling under the powersave governor can save
> precious power in power-critical cases.
> 
> An FIO random read test on an Alder Lake platform shows that halt
> polling occupies ~17% CPU utilization and consumes 7% extra CPU power.
> After disabling halt polling, the CPU has more opportunity to enter
> deeper C-states (C1E%: 25.3% -> 33.4%, C10%: 4.4% -> 17.4%).
> 
> On the Alder Lake platform, we found no obvious performance
> degradation after disabling halt polling in the FIO and Netperf cases.
> The Netperf UDP_RR case runs between two VMs located on two different
> physical machines.
> 
> FIO (MB/s)   Base    Disable-halt-polling   Delta%
> Rand-read    432.6   436.3                  0.8%
> 
> Netperf      Base    Disable-halt-polling   Delta%
> UDP_RR       509.8   508.5                  -0.3%
> 
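
The Delta% column is the relative change of the "Disable-halt-polling" run against Base. As a quick arithmetic check, here is a minimal C helper; delta_pct is a hypothetical name used only for illustration, not something from the patch.

```c
#include <assert.h>

/*
 * Relative change of a measurement against its baseline, in percent:
 * delta% = (new - base) / base * 100.
 */
static double delta_pct(double base, double new_val)
{
	assert(base != 0.0);	/* a zero baseline has no defined relative change */
	return (new_val - base) / base * 100.0;
}
```

For example, delta_pct(432.6, 436.3) comes out at about +0.86 and delta_pct(509.8, 508.5) at about -0.26, in line with the 0.8% and -0.3% reported above.
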
> Signed-off-by: Dapeng Mi <dapeng1.mi@...el.com>
> ---
>  arch/x86/kvm/x86.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..c0eb6574cbbb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13015,7 +13015,22 @@ bool kvm_vector_hashing_enabled(void)
>  
>  bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
>  {
> -	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
> +	struct cpufreq_policy *policy = cpufreq_cpu_get(vcpu->cpu);

Preemption is not disabled at this point, which means that using vcpu->cpu is
potentially unsafe.  Given that cpufreq is refcounting the returned object, I have to
imagine getting migrated to a different pCPU would be problematic.

> +	bool powersave = false;

I don't see anything in here that's x86 specific.  Unless I'm missing something,
this belongs in common KVM.

> +
> +	/*
> +	 * Halt polling could consume much CPU power, if CPU frequency
> +	 * governor is set to "powersave", disable halt polling.
> +	 */
> +	if (policy) {
> +		if ((policy->policy == CPUFREQ_POLICY_POWERSAVE) ||
> +			(policy->governor &&

Indentation is messed up.

> +				!strncmp(policy->governor->name, "powersave",

KVM should not be comparing magic strings.  If the cpufreq subsystem can't get
policy->policy right, then that needs to be fixed.
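
A minimal user-space model of classifying on the typed field instead of the governor's name. The struct, enum, and function names are hypothetical stand-ins for struct cpufreq_policy and the CPUFREQ_POLICY_* constants; the enum values belong to this model, not to cpufreq.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model constants mirroring the CPUFREQ_POLICY_* names. */
enum model_policy_kind {
	MODEL_POLICY_UNKNOWN,
	MODEL_POLICY_POWERSAVE,
	MODEL_POLICY_PERFORMANCE,
};

/* Hypothetical stand-in for struct cpufreq_policy. */
struct model_cpufreq_policy {
	enum model_policy_kind policy;	/* typed state: the only thing consulted */
	const char *governor_name;	/* display name: deliberately ignored */
};

/*
 * Classify purely on the typed field.  If the provider leaves 'policy' as
 * UNKNOWN for some configuration, the fix belongs in the provider rather
 * than in a string comparison here.
 */
static bool model_is_powersave(const struct model_cpufreq_policy *p)
{
	return p != NULL && p->policy == MODEL_POLICY_POWERSAVE;
}
```

A policy whose governor is named "powersave" but whose policy field is unknown is intentionally not treated as powersave in this sketch; that gap would be closed in the provider.
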

> +					CPUFREQ_NAME_LEN)))
> +			powersave = true;
> +		cpufreq_cpu_put(policy);
> +	}
> +	return ((vcpu->arch.msr_kvm_poll_control & 1) == 0) || powersave;

Doing all of the above work if polling is already disabled is silly.
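
A sketch of the cheaper ordering for the x86 hook, assuming (as the original return statement implies) that polling is disabled when bit 0 of msr_kvm_poll_control is clear. model_cpufreq_powersave() and model_arch_no_poll() are hypothetical stand-ins; the counter only exists to show when the expensive lookup runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counts how often the (hypothetical) expensive cpufreq lookup runs. */
static unsigned int cpufreq_lookups;

/* Hypothetical stand-in for the cpufreq policy query; always "powersave". */
static bool model_cpufreq_powersave(int cpu)
{
	(void)cpu;
	cpufreq_lookups++;
	return true;
}

/* Test the cheap MSR bit first; only query cpufreq if polling is still on. */
static bool model_arch_no_poll(uint64_t msr_kvm_poll_control, int cpu)
{
	if ((msr_kvm_poll_control & 1) == 0)
		return true;	/* polling already disabled: skip the lookup */
	return model_cpufreq_powersave(cpu);
}
```
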

>  }
>  EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

All in all, _if_ we want to do this automatically and not let userspace decide how
to manage powersave vs. halt-poll, I think this should be more like:

---
 virt/kvm/kvm_main.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e30f1b4ecfa5..01116859cb31 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -29,6 +29,7 @@
 #include <linux/file.h>
 #include <linux/syscore_ops.h>
 #include <linux/cpu.h>
+#include <linux/cpufreq.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/stat.h>
@@ -3483,6 +3484,23 @@ static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
 	}
 }
 
+static bool kvm_cpufreq_no_halt_poll(struct kvm_vcpu *vcpu)
+{
+	struct cpufreq_policy *policy;
+	bool powersave = false;
+
+	preempt_disable();
+
+	policy = cpufreq_cpu_get(vcpu->cpu);
+	if (policy) {
+		powersave = (policy->policy == CPUFREQ_POLICY_POWERSAVE);
+		cpufreq_cpu_put(policy);
+	}
+
+	preempt_enable();
+	return powersave;
+}
+
 /*
  * Emulate a vCPU halt condition, e.g. HLT on x86, WFI on arm, etc...  If halt
  * polling is enabled, busy wait for a short time before blocking to avoid the
@@ -3491,7 +3509,8 @@ static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
  */
 void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
-	bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
+	const bool halt_poll_allowed = !kvm_arch_no_poll(vcpu) &&
+				       !kvm_cpufreq_no_halt_poll(vcpu);
 	bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
 	ktime_t start, cur, poll_end;
 	bool waited = false;

base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
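
To make the kvm_vcpu_halt() hunk concrete, here is a user-space model of how halt_poll_allowed and do_halt_poll combine. The struct and model_* names are hypothetical; only the shape of the two boolean expressions follows the diff.

```c
#include <stdbool.h>

/* Hypothetical inputs standing in for the vCPU state used by the hunk. */
struct halt_inputs {
	bool arch_no_poll;		/* result of kvm_arch_no_poll() */
	bool cpufreq_no_halt_poll;	/* result of the common cpufreq check */
	unsigned long halt_poll_ns;	/* current per-vCPU polling window */
};

static bool model_cpufreq_no_halt_poll(const struct halt_inputs *in)
{
	return in->cpufreq_no_halt_poll;
}

/*
 * Polling must be allowed by both checks, and a zero polling window still
 * means no polling.
 */
static bool model_do_halt_poll(const struct halt_inputs *in)
{
	const bool halt_poll_allowed = !in->arch_no_poll &&
				       !model_cpufreq_no_halt_poll(in);

	return halt_poll_allowed && in->halt_poll_ns;
}
```

Because && short-circuits, the cpufreq check in this shape only runs when kvm_arch_no_poll() allows polling, so the lookup is skipped when polling is already off.
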
-- 
