Message-ID:
<TY3PR01MB111481E9B0AF263ACC8EA5D4AE5BA2@TY3PR01MB11148.jpnprd01.prod.outlook.com>
Date: Fri, 9 Aug 2024 06:02:50 +0000
From: "Tomohiro Misono (Fujitsu)" <misono.tomohiro@...itsu.com>
To: 'Ankur Arora' <ankur.a.arora@...cle.com>, "linux-pm@...r.kernel.org"
<linux-pm@...r.kernel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
CC: "catalin.marinas@....com" <catalin.marinas@....com>, "will@...nel.org"
<will@...nel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "x86@...nel.org"
<x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "wanpengli@...cent.com" <wanpengli@...cent.com>,
"vkuznets@...hat.com" <vkuznets@...hat.com>, "rafael@...nel.org"
<rafael@...nel.org>, "daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
"peterz@...radead.org" <peterz@...radead.org>, "arnd@...db.de"
<arnd@...db.de>, "lenb@...nel.org" <lenb@...nel.org>, "mark.rutland@....com"
<mark.rutland@....com>, "harisokn@...zon.com" <harisokn@...zon.com>,
"mtosatti@...hat.com" <mtosatti@...hat.com>, "sudeep.holla@....com"
<sudeep.holla@....com>, "cl@...two.org" <cl@...two.org>,
"joao.m.martins@...cle.com" <joao.m.martins@...cle.com>,
"boris.ostrovsky@...cle.com" <boris.ostrovsky@...cle.com>,
"konrad.wilk@...cle.com" <konrad.wilk@...cle.com>
Subject: RE: [PATCH v6 00/10] Enable haltpoll on arm64
> Subject: [PATCH v6 00/10] Enable haltpoll on arm64
>
> This patchset enables the cpuidle-haltpoll driver and its namesake
> governor on arm64. This is specifically interesting for KVM guests by
> reducing IPC latencies.
>
> Comparing idle switching latencies on an arm64 KVM guest with
> perf bench sched pipe:
>
>                              usecs/op    %stdev
>
>   no haltpoll (baseline)      13.48    +-  5.19%
>   with haltpoll                6.84    +- 22.07%
I got similar results with a VM on a Grace machine (series applied to 6.10).
[default]
# cat /sys/devices/system/cpu/cpuidle/current_driver
none
# perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 23.832 [sec]
23.832644 usecs/op
41959 ops/sec
[With "cpuidle-haltpoll.force=1" on the command line]
# cat /sys/devices/system/cpu/cpuidle/current_driver
haltpoll
# perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 6.340 [sec]
6.340116 usecs/op
157725 ops/sec
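For reference, the figures perf prints follow directly from the total time
and the operation count. Below is a minimal sketch of that arithmetic; the
helper names are made up for illustration and are not part of perf. With the
two runs above, the per-operation latency drops by a factor of about 3.76.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; these are not perf APIs. */

/* Average latency of one pipe operation, in microseconds. */
static double usecs_per_op(double total_sec, unsigned long ops)
{
	assert(ops > 0);
	return total_sec * 1e6 / (double)ops;
}

/* Throughput in operations per second. */
static double ops_per_sec(double total_sec, unsigned long ops)
{
	assert(total_sec > 0.0);
	return (double)ops / total_sec;
}

/* How many times faster the new run is than the baseline run. */
static double speedup(double base_usecs, double new_usecs)
{
	assert(new_usecs > 0.0);
	return base_usecs / new_usecs;
}
```

Calling speedup(23.832644, 6.340116) with the usecs/op values reported above
gives the ratio between the default and haltpoll runs.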
Tested-by: Misono Tomohiro <misono.tomohiro@...itsu.com>
Regards,
Tomohiro
>
>
> No change in performance for a similar test on x86:
>
>                                          usecs/op    %stdev
>
>   haltpoll w/ cpu_relax() (baseline)       4.75    +- 1.76%
>   haltpoll w/ smp_cond_load_relaxed()      4.78    +- 2.31%
>
> Both sets of tests were on otherwise idle systems with guest VCPUs
> pinned to specific PCPUs. One reason for the higher stdev on arm64
> is that trapping of the WFE instruction by the host KVM is contingent
> on the number of tasks on the runqueue.
>
>
> The patch series is organized in three parts:
>
> - patch 1 reorganizes the poll_idle() loop, switching to
>   smp_cond_load_relaxed() in the polling loop.
>   Relatedly, patches 2 and 3 rework the config option ARCH_HAS_CPU_RELAX,
>   renaming it to ARCH_HAS_OPTIMIZED_POLL.
>
> - patches 4-6 reorganize the haltpoll selection and init logic
> to allow architecture code to select it.
>
> - and finally, patches 7-10 add the bits for arm64 support.
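To illustrate the shape of the polling loop described in the first part,
here is a rough userspace sketch, not the kernel code. It spins on a flag
with relaxed loads and only checks the clock every RELAX_COUNT iterations.
poll_flag(), now_ns() and RELAX_COUNT are names invented for this example.
In the kernel the inner wait is smp_cond_load_relaxed(), which on arm64 can
wait in WFE instead of busy-spinning; plain C11 atomics cannot express that,
so the sketch simply spins.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical bound on spins between clock reads (assumed value). */
#define RELAX_COUNT 200u

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll *flag until it becomes non-zero or limit_ns elapses.
 * Returns true if the flag was observed set, false on timeout.
 */
static bool poll_flag(atomic_uint *flag, uint64_t limit_ns)
{
	uint64_t start = now_ns();

	assert(flag);

	for (;;) {
		unsigned int spins = 0;

		/* Inner wait: leave on flag set or after RELAX_COUNT spins. */
		while (!atomic_load_explicit(flag, memory_order_relaxed) &&
		       spins++ < RELAX_COUNT)
			;

		if (atomic_load_explicit(flag, memory_order_relaxed))
			return true;

		/* Only read the clock once per batch of spins. */
		if (now_ns() - start > limit_ns)
			return false;
	}
}
```

The point of the structure is that the time check is amortized over many
flag reads, so the common case (flag set shortly after polling starts) does
not pay for a clock read on every iteration.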
>
>
> What is still missing: this series largely completes the haltpoll side
> of functionality for arm64. There are, however, a few related areas
> that still need to be threshed out:
>
> - WFET support: WFE on arm64 does not guarantee that poll_idle()
>   terminates within halt_poll_ns. Using WFET would address this.
> - KVM_NO_POLL support on arm64
> - KVM TWED support on arm64: allow the host to limit time spent in
> WFE.
>
>
> Changelog:
>
> v6:
>
> - reordered the patches to keep poll_idle() and ARCH_HAS_OPTIMIZED_POLL
> changes together (comment from Christoph Lameter)
> - threshes out the commit messages a bit more (comments from Christoph
> Lameter, Sudeep Holla)
> - also rework selection of cpuidle-haltpoll. Now selected based
> on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
> - moved back to arch_haltpoll_want() (comment from Joao Martins)
> Also, arch_haltpoll_want() now takes the force parameter and is
> now responsible for the complete selection (or not) of haltpoll.
> - fixes the build breakage on i386
> - fixes the cpuidle-haltpoll module breakage on arm64 (comment from
> Tomohiro Misono, Haris Okanovic)
>
>
> v5:
> - rework the poll_idle() loop around smp_cond_load_relaxed() (review
> comment from Tomohiro Misono.)
> - also rework selection of cpuidle-haltpoll. Now selected based
> on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
> - arch_haltpoll_supported() (renamed from arch_haltpoll_want()) on
> arm64 now depends on the event-stream being enabled.
> - limit POLL_IDLE_RELAX_COUNT on arm64 (review comment from Haris Okanovic)
> - ARCH_HAS_CPU_RELAX is now renamed to ARCH_HAS_OPTIMIZED_POLL.
>
> v4 changes from v3:
> - change 7/8 per Rafael input: drop the parens and use ret for the final check
> - add 8/8 which renames the guard for building poll_state
>
> v3 changes from v2:
> - fix 1/7 per Petr Mladek - remove ARCH_HAS_CPU_RELAX from arch/x86/Kconfig
> - add Acked-by from Rafael Wysocki on 2/7
>
> v2 changes from v1:
> - added patch 7 where we change cpu_relax with smp_cond_load_relaxed per PeterZ
> (this improves by 50% at least the CPU cycles consumed in the tests above:
> 10,716,881,137 now vs 14,503,014,257 before)
> - removed the ifdef from patch 1 per RafaelW
>
> Please review.
>
> Ankur Arora (5):
> cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
> cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL
> arm64: idle: export arch_cpu_idle
> arm64: support cpuidle-haltpoll
> cpuidle/poll_state: limit POLL_IDLE_RELAX_COUNT on arm64
>
> Joao Martins (4):
> Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig
> cpuidle-haltpoll: define arch_haltpoll_want()
> governors/haltpoll: drop kvm_para_available() check
> arm64: define TIF_POLLING_NRFLAG
>
> Mihai Carabas (1):
> cpuidle/poll_state: poll via smp_cond_load_relaxed()
>
> arch/Kconfig | 3 +++
> arch/arm64/Kconfig | 10 ++++++++++
> arch/arm64/include/asm/cpuidle_haltpoll.h | 9 +++++++++
> arch/arm64/include/asm/thread_info.h | 2 ++
> arch/arm64/kernel/cpuidle.c | 23 +++++++++++++++++++++++
> arch/arm64/kernel/idle.c | 1 +
> arch/x86/Kconfig | 5 ++---
> arch/x86/include/asm/cpuidle_haltpoll.h | 1 +
> arch/x86/kernel/kvm.c | 13 +++++++++++++
> drivers/acpi/processor_idle.c | 4 ++--
> drivers/cpuidle/Kconfig | 5 ++---
> drivers/cpuidle/Makefile | 2 +-
> drivers/cpuidle/cpuidle-haltpoll.c | 12 +-----------
> drivers/cpuidle/governors/haltpoll.c | 6 +-----
> drivers/cpuidle/poll_state.c | 21 ++++++++++++++++-----
> drivers/idle/Kconfig | 1 +
> include/linux/cpuidle.h | 2 +-
> include/linux/cpuidle_haltpoll.h | 5 +++++
> 18 files changed, 94 insertions(+), 31 deletions(-)
> create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h
>
> --
> 2.43.5