linux-kernel - Re: [PATCH v3 2/2] cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240904063915.GA3674@ranerica-svr.sc.intel.com>
Date: Tue, 3 Sep 2024 23:39:15 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: x86 Maintainers <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	Linux PM <linux-pm@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Ricardo Neri <ricardo.neri@...el.com>,
	Tim Chen <tim.c.chen@...el.com>
Subject: Re: [PATCH v3 2/2] cpufreq: intel_pstate: Set asymmetric CPU
 capacity on hybrid systems

On Wed, Aug 28, 2024 at 01:48:10PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> Make intel_pstate use the HWP_HIGHEST_PERF values from
> MSR_HWP_CAPABILITIES to set asymmetric CPU capacity information
> via the previously introduced arch_set_cpu_capacity() on hybrid
> systems without SMT.
> 
> Setting asymmetric CPU capacity is generally necessary to allow the
> scheduler to compute task sizes in a consistent way across all CPUs
> in a system where they differ by capacity.  That, in turn, should help
> to improve scheduling decisions.  It is also necessary for the schedutil
> cpufreq governor to operate as expected on hybrid systems where tasks
> migrate between CPUs of different capacities.
> 
> The underlying observation is that intel_pstate already uses
> MSR_HWP_CAPABILITIES to get CPU performance information which is
> exposed by it via sysfs and CPU performance scaling is based on it.
> Thus using this information for setting asymmetric CPU capacity is
> consistent with what the driver has been doing already.  Moreover,
> HWP_HIGHEST_PERF reflects the maximum capacity of a given CPU including
> both the instructions-per-cycle (IPC) factor and the maximum turbo
> frequency and the units in which that value is expressed are the same
> for all CPUs in the system, so the maximum capacity ratio between two
> CPUs can be obtained by computing the ratio of their HWP_HIGHEST_PERF
> values.  Of course, in principle that capacity ratio need not be
> directly applicable at lower frequencies, so using it for providing the
> asymmetric CPU capacity information to the scheduler is a rough
> approximation, but it is as good as it gets.  Also, measurements
> indicate that this approximation is not too bad in practice.
> 
> If the given system is hybrid and non-SMT, the new code disables ITMT
> support in the scheduler (because it may get in the way of asymmetric CPU
> capacity code in the scheduler that automatically gets enabled by setting
> asymmetric CPU capacity) after initializing all online CPUs and finds
> the one with the maximum HWP_HIGHEST_PERF value.  Next, it computes the
> capacity number for each (online) CPU by dividing the product of its
> HWP_HIGHEST_PERF and SCHED_CAPACITY_SCALE by the maximum HWP_HIGHEST_PERF.
> 
> When a CPU goes offline, its capacity is reset to SCHED_CAPACITY_SCALE
> and if it is the one with the maximum HWP_HIGHEST_PERF value, the
> capacity numbers for all of the other online CPUs are recomputed.  This
> also takes care of a cleanup during driver operation mode changes.
> 
> Analogously, when a new CPU goes online, its capacity number is updated
> and if its HWP_HIGHEST_PERF value is greater than the current maximum
> one, the capacity numbers for all of the other online CPUs are
> recomputed.
> 
> The case when the driver is notified of a CPU capacity change, either
> through the HWP interrupt or through an ACPI notification, is handled
> similarly to the CPU online case above, except that if the target CPU
> is the current highest-capacity one and its capacity is reduced, the
> capacity numbers for all of the other online CPUs need to be recomputed
> either.
> 
> If the driver's "no_trubo" sysfs attribute is updated, all of the CPU
> capacity information is computed from scratch to reflect the new turbo
> status.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>

A few minor comments below...

FWIW,

Reviewed-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com> # scale invariance

[...]

> +
> +static void hybrid_init_cpu_scaling(void)

Maybe renaming hybrid_init_cpu_scaling() as hybrid_init_cpu_capacity_scaling(),
__hybrid_init_cpu_scaling() as __hybrid_init_cpu_capacity_scaling(), and
hybrid_update_cpu_scaling() as hybrid_update_cpu_capacity_scaling()?

It would make the code easier to read.

> +{
> +	bool disable_itmt = false;
> +
> +	mutex_lock(&hybrid_capacity_lock);
> +
> +	/*
> +	 * If hybrid_max_perf_cpu is set at this point, the hybrid CPU capacity
> +	 * scaling has been enabled already and the driver is just changing the
> +	 * operation mode.
> +	 */
> +	if (hybrid_max_perf_cpu) {
> +		__hybrid_init_cpu_scaling();
> +		goto unlock;
> +	}
> +
> +	/*
> +	 * On hybrid systems, use asym capacity instead of ITMT, but because
> +	 * the capacity of SMT threads is not deterministic even approximately,
> +	 * do not do that when SMT is in use.
> +	 */
> +	if (hwp_is_hybrid && !sched_smt_active() && arch_enable_hybrid_capacity_scale()) {
> +		__hybrid_init_cpu_scaling();
> +		disable_itmt = true;
> +	}
> +
> +unlock:
> +	mutex_unlock(&hybrid_capacity_lock);
> +
> +	if (disable_itmt)
> +		sched_clear_itmt_support();

It may be worth adding a comment here saying that the sched domains will
rebuilt to disable asym packing and enable asym capacity.