Date: Wed, 5 Jun 2024 15:26:48 +0100
From: Ionela Voinescu <ionela.voinescu@....com>
To: "liwei (JK)" <liwei728@...wei.com>
Cc: Viresh Kumar <viresh.kumar@...aro.org>,
	Beata Michalska <beata.michalska@....com>,
	Vanshidhar Konda <vanshikonda@...amperecomputing.com>,
	rafael@...nel.org, al.stone@...aro.org, ashwin.chaugule@...aro.org,
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
	liwei391@...wei.com, liaoyu15@...wei.com
Subject: Re: [PATCH] cpufreq/cppc: changing highest_perf to nominal_perf in
 cppc_cpufreq_cpu_init()

Hi,

On Friday 10 May 2024 at 11:06:50 (+0800), liwei (JK) wrote:
> Hello,
> 
> Thanks for your reply.
> 
> My description may have caused some misunderstanding, so please allow me
> to add some detail.
> 
> On 2024/5/7 18:25, Ionela Voinescu wrote:
> > Hi,
> > 
> > Thanks for adding me to this.
> > 
> > On Monday 29 Apr 2024 at 16:19:45 (+0530), Viresh Kumar wrote:
> > > CC'ing few folks who are working with the driver.
> > > 
> > > On 28-04-24, 17:28, liwei wrote:
> > > > When turning on turbo, if frequency configuration takes effect slowly,
> > > > the updated policy->cur may be equal to the frequency configured in
> > > > governor->limits(), performance governor will not adjust the frequency,
> > > > configured frequency will remain at turbo-freq.
> > > > 
> > > > Simplified call stack looks as follows:
> > > > cpufreq_register_driver(&cppc_cpufreq_driver)
> > > > 	...
> > > > 	cppc_cpufreq_cpu_init()
> > > > 		cppc_get_perf_caps()
> > > > 		policy->max = cppc_perf_to_khz(caps, caps->nominal_perf)
> > > > 			cppc_set_perf(highest_perf) // set highest_perf
> > > > 			policy->cur = cpufreq_driver->get() // if cur == policy->max
> > > > 	cpufreq_init_policy()
> > > > 		...
> > > > 		cpufreq_start_governor() // governor: performance
> > > > 			new_freq = cpufreq_driver->get() // if new_freq == policy->max
> > > > 			if (policy->cur != new_freq)
> > > > 			cpufreq_out_of_sync(policy, new_freq)
> > > > 				...
> > > > 				policy->cur = new_freq
> > I believe the problem is here   ^^^^^^^^^^^^^^^^^^^^^^.
> > 
> > cpufreq_verify_current_freq() should not update policy->cur unless a
> > request to change frequency has actually reached the driver. I believe
> > policy->cur should always reflect the request, not the actual current
> > frequency of the CPU.
> > 
> > Given that new_freq is the current (hardware) frequency of the CPU,
> > obtained via .get(), it can be the nominal frequency, as it is in your
> > case, or any frequency, if there is any firmware/hardware capping in
> > place.
> > 
> > This causes the issue in your scenario, in which __cpufreq_driver_target()
> > filters the request from the governor as it finds it equal to policy->cur,
> > and it believes it's already set by hardware.
> > 
> > This causes another issue in which scaling_cur_freq, which for some
> > systems returns policy->cur, ends up returning the hardware frequency of
> > the CPUs, and not the last frequency request, as it should:
> > 
> > "scaling_cur_freq
> > Current frequency of all of the CPUs belonging to this policy (in kHz).
> > 
> > In the majority of cases, this is the frequency of the last P-state
> > requested by the scaling driver from the hardware using the scaling
> > interface provided by it, which may or may not reflect the frequency
> > the CPU is actually running at (due to hardware design and other
> > limitations)." [1]
> > 
> > Therefore policy->cur gets polluted with the hardware frequency of the
> > CPU sampled at that one time, and this affects governor decisions, as
> > in your case, and scaling_cur_freq feedback as well. This bad value will
> > not change until there's another .target() or cpufreq_out_of_sync()
> > call, which will never happen for fixed frequency governors like the
> > performance governor.
> > 
> > Thanks,
> > Ionela.
> > 
> 
> In the above function calling process, the frequency is obtained twice. The
> first time is in cpufreq_online(), and the second time is in
> cpufreq_verify_current_freq().
> 
> When the frequency configuration takes effect slowly, the kernel cannot
> sense when the frequency configuration takes effect. It may take effect
> before the frequency is read twice, between the frequencies read twice, or
> after the frequency is read twice.
> 
> |------------------|--------------------|---------------------|
> set highest_freq  get()               get()                target()
> 
> If it takes effect before two read operations, there will be no problem.
> 
> If it takes effect between two read operations, policy->cur will be updated
> in cpufreq_verify_current_freq(), the execution path is as follows:
> new_freq = cpufreq_driver->get() //  new_freq = turbo_freq
> 	if (policy->cur != new_freq)
> 		cpufreq_out_of_sync(policy, new_freq)
> 			...
> 			policy->cur = new_freq // cur = turbo_freq
> ...
> __cpufreq_driver_target(policy->max)
> 	cppc_set_perf(target) // policy->cur!=target
> 
> Reconfigure frequency to policy->max.
> 
> If policy->cur is not set to turbo_freq after two read operations,
> policy->cur will not be updated in cpufreq_verify_current_freq(), the
> execution path is as follows:
> new_freq = cpufreq_driver->get() //  new_freq == policy->cur
> 	if (policy->cur != new_freq)
> ...
> __cpufreq_driver_target(policy->max)
> 	ret // policy->cur==target
> 
> Configured frequency will remain at turbo-freq.
> 
> When reading scaling_cur_freq, the frequency value that may be read is
> policy->cur. If arch does not implement arch_freq_get_on_cpu(), and the
> registered cpufreq_driver does not define setpolicy()/get(), the frequency
> will not be obtained through the get() and will directly feed back
> policy->cur. If the above problem occurs, no exception will be detected when
> reading scaling_cur_freq. But reading cpuinfo_cur_freq will reacquire the
> frequency through the get() interface and feedback the newly acquired
> frequency value.

Thank you for the details. I did understand the problem, but I believe
the underlying cause is cpufreq_out_of_sync() setting policy->cur to the
current frequency and not keeping the value of the last frequency
request.

@Viresh, do you happen to know the reason behind this?

There are multiple issues caused by this, detailed at [1] (your patch),
[2] (the other issue described by me above), and more recently [3].

I agree that your patch is a good fix for the issue at [1], and that [3]
is a fix for the issues at both [2] and [3], if I'm not mistaken, but to
me these are "tweaks" that bypass the fundamental issue in the cpufreq
core. I would not be surprised to see other issues caused by it in the
future that are not covered by the fixes at [1] and [3].

This being said, I would like to see these issues fixed, even by [1] and
[3], if fixing the underlying cause is not feasible (or at least not
easy to evaluate).
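
To make the timing argument concrete, here is a minimal user-space sketch
of the three cases from the quoted description (the request takes effect
before both reads, between them, or after both). It is a toy model under
stated assumptions, not kernel code: toy_hw, toy_policy, toy_set(),
toy_get() and run_scenario() are hypothetical names, the frequencies are
made-up values, and latency is simply counted in number of reads. With
cur_tracks_hw set, the policy's cur is overwritten with whatever the
hardware read returns; with it cleared, cur keeps the last request, which
is the semantics I am arguing for above and not an actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name and value here is hypothetical. */
enum { NOMINAL_KHZ = 2000000, TURBO_KHZ = 3000000 };

struct toy_hw {
	unsigned int requested_khz;	/* last value written by the "driver" */
	unsigned int effective_khz;	/* value a .get()-style read returns */
	int reads_until_effect;		/* reads left before the request applies */
};

struct toy_policy {
	unsigned int cur;		/* stands in for policy->cur */
	unsigned int max;		/* stands in for policy->max */
};

/* Model a frequency request that becomes visible after 'latency' reads. */
static void toy_set(struct toy_hw *hw, unsigned int khz, int latency)
{
	assert(latency >= 0);
	hw->requested_khz = khz;
	hw->reads_until_effect = latency;
}

/* Model a hardware read: returns the old value until the request applies. */
static unsigned int toy_get(struct toy_hw *hw)
{
	if (hw->reads_until_effect > 0) {
		hw->reads_until_effect--;
		return hw->effective_khz;
	}
	hw->effective_khz = hw->requested_khz;
	return hw->effective_khz;
}

/* Returns the request left pending in "hardware" after the init sequence. */
unsigned int run_scenario(int latency, bool cur_tracks_hw)
{
	struct toy_hw hw = { NOMINAL_KHZ, NOMINAL_KHZ, 0 };
	struct toy_policy policy = { .cur = 0, .max = NOMINAL_KHZ };
	unsigned int new_freq;

	/* Init: request turbo and record that request in cur. */
	toy_set(&hw, TURBO_KHZ, latency);
	policy.cur = TURBO_KHZ;

	/* First read: cur is replaced by the hardware value if tracking. */
	new_freq = toy_get(&hw);
	if (cur_tracks_hw)
		policy.cur = new_freq;

	/* Second read: the "out of sync" update, again only if tracking. */
	new_freq = toy_get(&hw);
	if (cur_tracks_hw && policy.cur != new_freq)
		policy.cur = new_freq;

	/* Governor asks for max; the request is filtered if it equals cur. */
	if (policy.max != policy.cur)
		toy_set(&hw, policy.max, 0);

	return hw.requested_khz;
}
```

In this model run_scenario(0, true) and run_scenario(1, true) end with a
nominal request, while run_scenario(2, true) leaves the turbo request in
place because both reads return the nominal value and the governor's
request is filtered. With cur_tracks_hw cleared, all three latencies end
with a nominal request.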

[1] https://lore.kernel.org/lkml/20240428092852.1588188-1-liwei728@huawei.com/
[2] https://lore.kernel.org/lkml/3e6077bb-907c-057f-0896-d0a5814a4229@nvidia.com/
[3] https://lore.kernel.org/lkml/TYCP286MB2486B1D734F8E2D74BFBEEB1B1F32@TYCP286MB2486.JPNP286.PROD.OUTLOOK.COM/

Hope it helps,
Ionela.

> 
> Thanks
> liwei
> 
> > 
> > [1] https://docs.kernel.org/admin-guide/pm/cpufreq.html
> > 
> > > > 			...
> > > > 			policy->governor->limits()
> > > > 				__cpufreq_driver_target(policy->max)
> > > > 					if (policy->cur==target)
> > > > 					// generate error, keep set highest_perf
> > > > 						ret
> > > > 					cppc_set_perf(target)
> > > > 
> > > > Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init().
> > > > 
> > > > Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC")
> > > > Signed-off-by: liwei <liwei728@...wei.com>
> > > > ---
> > > >   drivers/cpufreq/cppc_cpufreq.c | 8 ++++----
> > > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > > > index 64420d9cfd1e..db04a82b8a97 100644
> > > > --- a/drivers/cpufreq/cppc_cpufreq.c
> > > > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > > > @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > > >   	if (caps->highest_perf > caps->nominal_perf)
> > > >   		boost_supported = true;
> > > > -	/* Set policy->cur to max now. The governors will adjust later. */
> > > > -	policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
> > > > -	cpu_data->perf_ctrls.desired_perf =  caps->highest_perf;
> > > > +	/* Set policy->cur to norm now. */
> > > > +	policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf);
> > > > +	cpu_data->perf_ctrls.desired_perf =  caps->nominal_perf;
> > > >   	ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> > > >   	if (ret) {
> > > >   		pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> > > > -			 caps->highest_perf, cpu, ret);
> > > > +			 caps->nominal_perf, cpu, ret);
> > > >   		goto out;
> > > >   	}
> > > > -- 
> > > > 2.25.1
> > > 
> > > -- 
> > > viresh
