Message-ID: <20241223043407.1611-1-kprateek.nayak@amd.com>
Date: Mon, 23 Dec 2024 04:33:59 +0000
From: K Prateek Nayak <kprateek.nayak@....com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>, <x86@...nel.org>,
	<linux-kernel@...r.kernel.org>
CC: "H. Peter Anvin" <hpa@...or.com>, Dietmar Eggemann
	<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
	<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
	<vschneid@...hat.com>, "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>, Tim Chen
	<tim.c.chen@...ux.intel.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>, "Mario
 Limonciello" <mario.limonciello@....com>, Meng Li <li.meng@....com>, Huang
 Rui <ray.huang@....com>, "Gautham R. Shenoy" <gautham.shenoy@....com>, "K
 Prateek Nayak" <kprateek.nayak@....com>
Subject: [PATCH v2 0/8] x86, sched: Dynamic ITMT core ranking support and some yak shaving

The ITMT infrastructure currently assumes that ITMT rankings are static
and set correctly prior to enabling ITMT support, which allows the CPU
with the highest core ranking to be cached as the "asym_prefer_cpu" in
the sched_group struct. However, with the introduction of Preferred Core
support in amd-pstate, these rankings can change at runtime.
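
As a standalone toy model (purely illustrative, not kernel code; the CPU
count and ranking values below are made up), the problem looks like
this: once the highest-ranked CPU is cached, a runtime re-ranking leaves
the cached value stale.

    #include <stdio.h>

    /* Toy model only: 4 CPUs with made-up core rankings. */
    static int prio[4] = { 10, 40, 20, 30 };

    static int best_cpu(void)
    {
            int cpu, best = 0;

            for (cpu = 1; cpu < 4; cpu++)
                    if (prio[cpu] > prio[best])
                            best = cpu;
            return best;
    }

    int main(void)
    {
            int cached = best_cpu();  /* cached once, like asym_prefer_cpu */

            prio[3] = 50;             /* runtime re-ranking */
            printf("cached=%d actual=%d\n", cached, best_cpu());
            return 0;
    }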

v1: https://lore.kernel.org/lkml/20241211185552.4553-1-kprateek.nayak@amd.com/

Tim confirmed that the ITMT changes will not alter the behavior of
Intel systems that contain multiple MC groups in a PKG domain and
support ITMT - both current and future ones.

Patch 8 uncaches the asym_prefer_cpu from the sched_group struct and
finds it during load balancing in update_sg_lb_stats() before it is used
to make any scheduling decisions. This is the simplest approach; an
alternate approach would be to move the asym_prefer_cpu to
sched_domain_shared and allow the first load balancing instance after a
priority change to update the cached asym_prefer_cpu. On systems with
static priorities, this would retain the benefits of caching, while on
systems with dynamic priorities, it would reduce the overhead of finding
the "asym_prefer_cpu" each time update_sg_lb_stats() is called. However,
those benefits come with added code complexity, which is why Patch 8 is
marked as an RFC. Shrikanth confirmed it works as expected on a PowerPC
VM; however, there are no comments yet on the performance impact, which
is expected to be minimal, if any, since update_sg_lb_stats() is in the
load balancing slow path.
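
A rough sketch of the Patch 8 approach, again as a standalone toy model
(the names toy_update_sg_lb_stats(), prio[] and util[] are invented for
the example; the real change lives in kernel/sched/fair.c): the
preferred CPU is discovered while walking the group's CPUs for the
stats instead of being read from a cached field.

    #include <stdio.h>

    /* Toy model only: made-up rankings and utilization values. */
    static int prio[4] = { 10, 40, 20, 50 };
    static unsigned long util[4] = { 100, 250, 50, 300 };

    struct toy_sg_lb_stats {
            unsigned long group_util;
            int asym_prefer_cpu;    /* found during the walk */
    };

    static void toy_update_sg_lb_stats(const int *cpus, int nr,
                                       struct toy_sg_lb_stats *sgs)
    {
            sgs->asym_prefer_cpu = cpus[0];
            for (int i = 0; i < nr; i++) {
                    sgs->group_util += util[cpus[i]];
                    if (prio[cpus[i]] > prio[sgs->asym_prefer_cpu])
                            sgs->asym_prefer_cpu = cpus[i];
            }
    }

    int main(void)
    {
            int group[4] = { 0, 1, 2, 3 };
            struct toy_sg_lb_stats sgs = { 0 };

            toy_update_sg_lb_stats(group, 4, &sgs);
            printf("util=%lu asym_prefer_cpu=%d\n",
                   sgs.group_util, sgs.asym_prefer_cpu);
            return 0;
    }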

One notable comment that has not been addressed since v1 is moving the
overutilized status update below the idle CPU check. On an idle CPU,
since there are no UCLAMP constraints, cpu_overutilized() boils down to:

    !fits_capacity(cpu_util_cfs(cpu), capacity_of(cpu))

However, the utilization averages can include blocked contributions and
capacity_of(cpu) can depend on arch_scale_cpu_capacity(), so I cannot
say with 100% confidence that an idle CPU can never appear overutilized
as a result of blocked averages. find_energy_efficient_cpu() does not
look at idle_cpu() and performs its search purely based on utilization
and capacity, so an idle CPU that appears overutilized will still be
skipped in that search path. If there are no concerns, that update can
be moved below the idle_cpu() check in Patch 6 too, as suggested by
Shrikanth.
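
For reference, here is a standalone toy model of the suggested ordering
(invented names; toy_fits() only roughly approximates fits_capacity()):
the overutilized accounting is only reached by CPUs that have already
failed the idle check, so a CPU that merely looks overutilized from
blocked averages while idle does not flip the group status.

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model only: made-up CPUs; util may include blocked averages. */
    struct toy_cpu {
            bool idle;
            unsigned long util;
            unsigned long capacity;
    };

    /* Rough stand-in for fits_capacity(): leave ~20% headroom. */
    static bool toy_fits(unsigned long util, unsigned long cap)
    {
            return util * 1280 < cap * 1024;
    }

    int main(void)
    {
            struct toy_cpu cpus[2] = {
                    { .idle = true,  .util = 900, .capacity = 1024 },
                    { .idle = false, .util = 400, .capacity = 1024 },
            };
            bool overutilized = false;

            for (int i = 0; i < 2; i++) {
                    if (cpus[i].idle)
                            continue;            /* idle check first */
                    if (!toy_fits(cpus[i].util, cpus[i].capacity))
                            overutilized = true; /* overutilized check after */
            }
            printf("group overutilized: %s\n", overutilized ? "yes" : "no");
            return 0;
    }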

This series is based on

  git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core 

at commit af98d8a36a96 ("sched/fair: Fix CPU bandwidth limit bypass
during CPU hotplug") and is a spiritual successor to Mario's previous
attempt at fixing x86_die_flags() on Preferred Core enabled systems,
which can be found at
https://lore.kernel.org/lkml/20241203201129.31957-1-mario.limonciello@amd.com/
---
v1..v2:

- Collected tags from Tim, Vincent, and Shrikanth.

- Modified the layout of struct sg_lb_stats to keep all fields
  concerning SD_ASYM_PACKING together (Shrikanth)

- Modified commit message of the debugfs move to highlight that "N"/"0"
  can be used to disable the feature and "Y"/"1" can be used to enable
  it back (Tim, Peter)
---
K Prateek Nayak (8):
  x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
  x86/itmt: Use guard() for itmt_update_mutex
  x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
  x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
  x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
  sched/fair: Do not compute NUMA Balancing stats unnecessarily during
    lb
  sched/fair: Do not compute overloaded status unnecessarily during lb
  sched/fair: Uncache asym_prefer_cpu and find it during
    update_sd_lb_stats()

 arch/x86/include/asm/topology.h |  4 +-
 arch/x86/kernel/itmt.c          | 81 ++++++++++++++-------------------
 arch/x86/kernel/smpboot.c       | 19 +-------
 kernel/sched/fair.c             | 42 +++++++++++++----
 kernel/sched/sched.h            |  1 -
 kernel/sched/topology.c         | 15 +-----
 6 files changed, 70 insertions(+), 92 deletions(-)


base-commit: af98d8a36a963e758e84266d152b92c7b51d4ecb
-- 
2.43.0

