lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5eefbda4-5135-c445-1b9d-a35dc2bec9e1@phoronix.com>
Date:   Mon, 15 Feb 2021 19:49:17 -0600
From:   Michael Larabel <Michael@...ronix.com>
To:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux PM <linux-pm@...r.kernel.org>,
        Giovanni Gherdovich <ggherdovich@...e.cz>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linux ACPI <linux-acpi@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [RFT][PATCH v1] cpufreq: ACPI: Set cpuinfo.max_freq directly if
 max boost is known


On 2/15/21 1:24 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>
> Commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover
> boost frequencies") attempted to address a performance issue involving
> acpi-cpufreq, the schedutil governor and scale-invariance on x86 by
> extending the frequency tables created by acpi-cpufreq to cover the
> entire range of "turbo" (or "boost") frequencies, but that caused
> frequencies reported via /proc/cpuinfo and the scaling_cur_freq
> attribute in sysfs to change which may confuse users and monitoring
> tools.
>
> For this reason, revert the part of commit 3c55e94c0ade adding the
> extra entry to the frequency table and use the observation that
> in principle cpuinfo.max_freq need not be equal to the maximum
> frequency listed in the frequency table for the given policy.
>
> Namely, modify cpufreq_frequency_table_cpuinfo() to allow cpufreq
> drivers to set their own cpuinfo.max_freq above that frequency and
> change  acpi-cpufreq to set cpuinfo.max_freq to the maximum boost
> frequency found via CPPC.
>
> This should be sufficient to let all of the cpufreq subsystem know
> the real maximum frequency of the CPU without changing frequency
> reporting.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=211305
> Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies")
> Reported-by: Matt McDonald <gardotd426@...il.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
>
> Michael, Giovanni,
>
> The fix for the EPYC performance regression that was merged into 5.11 introduced
> an undesirable side-effect by distorting the CPU frequency reporting via
> /proc/cpuinfo and scaling_cur_freq (see the BZ link above for details).
>
> The patch below is reported to address this problem and it should still allow
> schedutil to achieve desirable performance, because it simply sets
> cpuinfo.max_freq without extending the frequency table of the CPU.
>
> Please test this one and let me know if it adversely affects performance.
>
> Thanks!


When carrying out tests so far today on an EPYC 7F72 2P and Ryzen 9 
5900X with workloads seeing impact from the prior patches, everything is 
looking good when comparing v5.11 to v5.11 + this patch. Not seeing any 
measurable difference on either of those systems as a result of this patch.

Running some additional tests and on a few more boxes that should wrap 
up tomorrow but at least so far the patch isn't showing any measurable 
changes to performance.

Michael


>
> ---
>   drivers/cpufreq/acpi-cpufreq.c |   62 ++++++++++-------------------------------
>   drivers/cpufreq/freq_table.c   |    8 ++++-
>   2 files changed, 23 insertions(+), 47 deletions(-)
>
> Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c
> +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c
> @@ -54,7 +54,6 @@ struct acpi_cpufreq_data {
>   	unsigned int resume;
>   	unsigned int cpu_feature;
>   	unsigned int acpi_perf_cpu;
> -	unsigned int first_perf_state;
>   	cpumask_var_t freqdomain_cpus;
>   	void (*cpu_freq_write)(struct acpi_pct_register *reg, u32 val);
>   	u32 (*cpu_freq_read)(struct acpi_pct_register *reg);
> @@ -223,10 +222,10 @@ static unsigned extract_msr(struct cpufr
>   
>   	perf = to_perf_data(data);
>   
> -	cpufreq_for_each_entry(pos, policy->freq_table + data->first_perf_state)
> +	cpufreq_for_each_entry(pos, policy->freq_table)
>   		if (msr == perf->states[pos->driver_data].status)
>   			return pos->frequency;
> -	return policy->freq_table[data->first_perf_state].frequency;
> +	return policy->freq_table[0].frequency;
>   }
>   
>   static unsigned extract_freq(struct cpufreq_policy *policy, u32 val)
> @@ -365,7 +364,6 @@ static unsigned int get_cur_freq_on_cpu(
>   	struct cpufreq_policy *policy;
>   	unsigned int freq;
>   	unsigned int cached_freq;
> -	unsigned int state;
>   
>   	pr_debug("%s (%d)\n", __func__, cpu);
>   
> @@ -377,11 +375,7 @@ static unsigned int get_cur_freq_on_cpu(
>   	if (unlikely(!data || !policy->freq_table))
>   		return 0;
>   
> -	state = to_perf_data(data)->state;
> -	if (state < data->first_perf_state)
> -		state = data->first_perf_state;
> -
> -	cached_freq = policy->freq_table[state].frequency;
> +	cached_freq = policy->freq_table[to_perf_data(data)->state].frequency;
>   	freq = extract_freq(policy, get_cur_val(cpumask_of(cpu), data));
>   	if (freq != cached_freq) {
>   		/*
> @@ -680,7 +674,6 @@ static int acpi_cpufreq_cpu_init(struct
>   	struct cpuinfo_x86 *c = &cpu_data(cpu);
>   	unsigned int valid_states = 0;
>   	unsigned int result = 0;
> -	unsigned int state_count;
>   	u64 max_boost_ratio;
>   	unsigned int i;
>   #ifdef CONFIG_SMP
> @@ -795,28 +788,8 @@ static int acpi_cpufreq_cpu_init(struct
>   		goto err_unreg;
>   	}
>   
> -	state_count = perf->state_count + 1;
> -
> -	max_boost_ratio = get_max_boost_ratio(cpu);
> -	if (max_boost_ratio) {
> -		/*
> -		 * Make a room for one more entry to represent the highest
> -		 * available "boost" frequency.
> -		 */
> -		state_count++;
> -		valid_states++;
> -		data->first_perf_state = valid_states;
> -	} else {
> -		/*
> -		 * If the maximum "boost" frequency is unknown, ask the arch
> -		 * scale-invariance code to use the "nominal" performance for
> -		 * CPU utilization scaling so as to prevent the schedutil
> -		 * governor from selecting inadequate CPU frequencies.
> -		 */
> -		arch_set_max_freq_ratio(true);
> -	}
> -
> -	freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL);
> +	freq_table = kcalloc(perf->state_count + 1, sizeof(*freq_table),
> +			     GFP_KERNEL);
>   	if (!freq_table) {
>   		result = -ENOMEM;
>   		goto err_unreg;
> @@ -851,27 +824,25 @@ static int acpi_cpufreq_cpu_init(struct
>   	}
>   	freq_table[valid_states].frequency = CPUFREQ_TABLE_END;
>   
> +	max_boost_ratio = get_max_boost_ratio(cpu);
>   	if (max_boost_ratio) {
> -		unsigned int state = data->first_perf_state;
> -		unsigned int freq = freq_table[state].frequency;
> +		unsigned int freq = freq_table[0].frequency;
>   
>   		/*
>   		 * Because the loop above sorts the freq_table entries in the
>   		 * descending order, freq is the maximum frequency in the table.
>   		 * Assume that it corresponds to the CPPC nominal frequency and
> -		 * use it to populate the frequency field of the extra "boost"
> -		 * frequency entry.
> +		 * use it to set cpuinfo.max_freq.
>   		 */
> -		freq_table[0].frequency = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT;
> +		policy->cpuinfo.max_freq = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT;
> +	} else {
>   		/*
> -		 * The purpose of the extra "boost" frequency entry is to make
> -		 * the rest of cpufreq aware of the real maximum frequency, but
> -		 * the way to request it is the same as for the first_perf_state
> -		 * entry that is expected to cover the entire range of "boost"
> -		 * frequencies of the CPU, so copy the driver_data value from
> -		 * that entry.
> +		 * If the maximum "boost" frequency is unknown, ask the arch
> +		 * scale-invariance code to use the "nominal" performance for
> +		 * CPU utilization scaling so as to prevent the schedutil
> +		 * governor from selecting inadequate CPU frequencies.
>   		 */
> -		freq_table[0].driver_data = freq_table[state].driver_data;
> +		arch_set_max_freq_ratio(true);
>   	}
>   
>   	policy->freq_table = freq_table;
> @@ -947,8 +918,7 @@ static void acpi_cpufreq_cpu_ready(struc
>   {
>   	struct acpi_processor_performance *perf = per_cpu_ptr(acpi_perf_data,
>   							      policy->cpu);
> -	struct acpi_cpufreq_data *data = policy->driver_data;
> -	unsigned int freq = policy->freq_table[data->first_perf_state].frequency;
> +	unsigned int freq = policy->freq_table[0].frequency;
>   
>   	if (perf->states[0].core_frequency * 1000 != freq)
>   		pr_warn(FW_WARN "P-state 0 is not max freq\n");
> Index: linux-pm/drivers/cpufreq/freq_table.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/freq_table.c
> +++ linux-pm/drivers/cpufreq/freq_table.c
> @@ -52,7 +52,13 @@ int cpufreq_frequency_table_cpuinfo(stru
>   	}
>   
>   	policy->min = policy->cpuinfo.min_freq = min_freq;
> -	policy->max = policy->cpuinfo.max_freq = max_freq;
> +	policy->max = max_freq;
> +	/*
> +	 * If the driver has set its own cpuinfo.max_freq above max_freq, leave
> +	 * it as is.
> +	 */
> +	if (policy->cpuinfo.max_freq < max_freq)
> +		policy->max = policy->cpuinfo.max_freq = max_freq;
>   
>   	if (policy->min == ~0)
>   		return -EINVAL;
>
>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ