[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4a87269d-542e-4d4d-9c46-780f9eb55193@huawei.com>
Date: Tue, 1 Apr 2025 21:56:10 +0800
From: "zhenglifeng (A)" <zhenglifeng1@...wei.com>
To: Sumit Gupta <sumitg@...dia.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>
CC: <lenb@...nel.org>, <robert.moore@...el.com>, <corbet@....net>,
<linux-pm@...r.kernel.org>, <linux-acpi@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <acpica-devel@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>, <linux-tegra@...r.kernel.org>,
<treding@...dia.com>, <jonathanh@...dia.com>, <sashal@...dia.com>,
<vsethi@...dia.com>, <ksitaraman@...dia.com>, <sanjayc@...dia.com>,
<bbasu@...dia.com>
Subject: Re: [Patch 0/5] Support Autonomous Selection mode in cppc_cpufreq
Sorry for the delay.
On 2025/3/14 20:48, Sumit Gupta wrote:
>
>
>>>>
>>>> There seems to be some quite fundamental disagreement on how this
>>>> should be done, so I'm afraid I cannot do much about it ATM.
>>>>
>>>> Please agree on a common approach and come back to me when you are ready.
>>>>
>>>> Sending two concurrent patchsets under confusingly similar names again
>>>> and again isn't particularly helpful.
>>>>
>>>> Thanks!
>>>
>>> Hi Rafael,
>>>
>>> Thank you for looking into this.
>>>
>>> Hi Lifeng,
>>>
>>> As per the discussion, we can make the driver future extensible and
>>> also can optimize the register read/write access.
>>>
>>> I gave some thought and below is my proposal.
>>>
>>> 1) Pick 'Patch 1-7' from your patch series [1] which optimize API's
>>> to read/write a cpc register.
>>>
>>> 2) Pick my patches in [2]:
>>> - Patch 1-4: Keep all cpc registers together under acpi_cppc sysfs.
>>> Also, update existing API's to read/write regs in batch.
>>> - Patch 5: Creates 'cppc_cpufreq_epp_driver' instance for booting
>>> all CPU's in Auto mode and set registers with right values.
>>> They can be updated after boot from sysfs to change hints to HW.
>>> I can use the optimized API's from [1] where required in [2].
>>>
>>> Let me know if you are okay with this proposal.
>>> I can also send an updated patch series with all the patches combined?
>>>
>>> [1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
>>> [2] https://lore.kernel.org/lkml/20250211103737.447704-1-sumitg@nvidia.com/
>>>
>>> Regards,
>>> Sumit Gupta
>>>
>>
>> Hi Sumit,
>>
>> Over the past few days, I've been thinking about your proposal and
>> scenario.
>>
>> I think we both agree that PATCH 1-7 in [1] doesn't conflicts with [2], so
>> the rest of the discussion focuses on the differences between [2] and the
>> PATCH 8 in [1].
>>
>> We both tried to support autonomous selection mode in cppc_cpufreq but on
>> different ways. I think the differences between these two approaches can be
>> summarized into three questions:
>>
>> 1. Which sysfs files to expose? I think this is not a problem, we can keep
>> all of them.
>>
>> 2. Where to expose these sysfs files? I understand your willing to keep all
>> cpc registers together under acpi_cppc sysfs. But in my opinion, it is more
>> suitable to expose them under cppc_cpufreq_attr, for these reasons:
>>
>> 1) It may probably introduce concurrency and data consistency issues, as
>> I mentioned before.
>>
>
> As explained in previous reply, this will be solved with the ifdef
> check to enable the attributes for only those CPUFREQ drivers which want
> to use the generic nodes.
> e.g: '#ifdef CONFIG_ACPI_CPPC_CPUFREQ' for 'cppc_cpufreq'.
>
> These CPC register read/write sysfs nodes are generic as per the ACPI
> specification and without any vendor specific logic.
> So, adding them in the lib file 'cppc_acpi.c'(CONFIG_ACPI_CPPC_LIB) will
> avoid code duplication if a different or new ACPI based CPUFREQ driver
> also wants to use them just by adding their macro check. Such ifdef check is also used in other places for attributes creation like below.
> So, don't look like a problem.
> $ grep -A4 "acpi_cpufreq_attr\[" drivers/cpufreq/acpi-cpufreq.c
> static struct freq_attr *acpi_cpufreq_attr[] = {
> &freqdomain_cpus,
> #ifdef CONFIG_X86_ACPI_CPUFREQ_CPB
> &cpb,
> #endif
So in the future, we will see:
static struct attribute *cppc_attrs[] = {
...
#ifdef CONFIG_XXX
&xxx.attr,
&xxx.attr,
#endif
#ifdef CONFIG_XXX
&xxx.attr,
#endif
#ifdef CONFIG_XXX
&xxx.attr,
...
};
I think you are making things more complicated.
>
>> 2) The store functions call cpufreq_cpu_get() to get policy and update
>> the driver_data which is a cppc_cpudata. Only the driver_data in
>> cppc_cpufreq's policy is a cppc_cpudata! These operations are inappropriate
>> in cppc_acpi. This file currently provides interfaces for cpufreq drivers
>> to use. Reverse calls might mess up call relationships, break code
>> structures, and cause problems that are hard to pinpoint the root cause!
>>
>
> If we don't want to update the cpufreq policy from 'cppc_acpi.c' and only update it from within the cpufreq, then this could be one valid
> point to not add the write syfs nodes in 'cppc_acpi.c' lib file.
>
> @Rafael, @Viresh : Do you have any comments on this?
I think updating cpufreq policy from 'cppc_acpi.c' should be forbidden.
>
>> 3) Difficult to extend. Different cpufreq drivers may have different
>> processing logic when reading from and writing to these CPC registers.
>> Limiting all sysfs here makes it difficult for each cpufreq driver to
>> extend. I think this is why there are only read-only interfaces under
>> cppc_attrs before.
>>
>
> We are updating the CPC registers as per the generic ACPI specification.
> So, any ACPI based CPUFREQ driver can use these generic nodes to
> read/write reg's until they have a vendor specific requirement or
> implementation.
> As explained above, If someone wants to update in different way and use
> their own CPUFREQ driver then these generic attributes won't be created
> due to the CPUFREQ driver macro check.
> I think AMD and Intel are doing more than just reading/updating the registers. That's why they needed their driver specific implementations.
>
>> Adding a 'ifdef' is not a good way to solve these problems. Defining this
>> config does not necessarily mean that the cpufreq driver is cppc_cpufreq.
>>
>
> It means that only.
> ./drivers/cpufreq/Makefile:obj-$(CONFIG_ACPI_CPPC_CPUFREQ) += cppc_cpufreq.o
Compile this file does not mean that the cpufreq driver is cppc_cpufreq.
Driver registration may fail, and the actually loaded driver may be
another. It'll be dangerous to expose these sysfs files for users to update
registers' value in this case.
>
>> 3. Is it necessary to add a new driver instance? [1] exposed the sysfs
>> files to support users dynamically change the auto selection mode of each
>> policy. Each policy can be operated seperately. It seems to me that if you
>> want to boot all CPUs in auto mode, it should be sufficient to set all
>> relevant registers to the correct values at boot time. I can't see why the
>> new instance is necessary unless you explain it further. Could you explain
>> more about why you add a new instance starting from answer these questions:
>>
>> For a specific CPU, what is the difference between using the two instances
>> when auto_sel is 1? And what is the difference when auto_sel is 0?
>>
>
> Explained this in previous reply. Let me elaborate more.
>
> For hundred's of CPU's, we don't need to explicitly set multiple sysfs
> after boot to enable and configure Auto mode with right params. That's why an easy option is to pass boot argument or module param for enabling
> and configuration.
> A separate instance 'cppc_cpufreq_epp' of the 'cppc_cpufreq' driver is
> added because policy min/max need to be updated to the min/max_perf
> and not nominal/lowest nonlinear perf which is done by the default
> init hook. Min_perf value can be lower than lowest nonlinear perf and Max_perf can be higher than nominal perf.
> If some CPU is booted with epp instance and later the auto mode is disabled or min/max_perf is changed from sysfs then also the policy
> min/max need to be updated accordingly.
>
> Another is that in Autonomous mode the freq selection and setting is
> done by HW. So, cpufreq_driver->target() hook is not needed.
> These are few reasons which I am aware of as of now.
> I think in future there can be more. Having a separate instance
> reflecting a HW based Autonomous frequency selection will make it easy
> for any future changes.
So CPUs will act totally differently under these two instance. But what if
I want part of the CPUs in HW mode and others in SW mode? Should I boot on
HW mode and set some policies' auto_set to false or the other way? It seems
like the effects of theses two approaches are completely different. In my
opinion, this new instance is more like a completely different driver than
cppc_cpufreq.
>
>> If it turns out that the new instance is necessary, I think we can reach a
>> common approach by adding this new cpufreq driver instance and place the
>> attributes in 'cppc_cpufreq_epp_attr', like amd-pstate did.
>>
>> What do you think?
>
> I initially thought about this but there was a problem.
> What if we boot with non-epp instance which doesn't have these attributes and later want to enable Auto mode for few CPU's from sysfs.
That's the problem. CPUs can be set to Auto mode with or without this new
instance. So what's the point of it?
>
>
> Best Regards,
> Sumit Gupta
Powered by blists - more mailing lists