Message-ID: <81a5c15e-8cbb-0a90-f6ec-2ed63af1cfd6@arm.com>
Date: Thu, 11 Aug 2022 08:29:38 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: Jeremy Linton <jeremy.linton@....com>
Cc: rafael@...nel.org, lenb@...nel.org, viresh.kumar@...aro.org,
robert.moore@...el.com, devel@...ica.org,
linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org, vschneid@...hat.com,
Ionela Voinescu <ionela.voinescu@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>
Subject: Re: [PATCH v2 1/1] ACPI: CPPC: Disable FIE if registers in PCC regions

On 8/10/22 19:04, Jeremy Linton wrote:
> Hi,
>
> On 8/10/22 09:32, Lukasz Luba wrote:
>>
>>
>> On 8/10/22 15:08, Jeremy Linton wrote:
>>> Hi,
>>>
>>> On 8/10/22 07:29, Lukasz Luba wrote:
>>>> Hi Jeremy,
>>>>
>>>> +CC Valentin since he might be interested in this finding
>>>> +CC Ionela, Dietmar
>>>>
>>>> I have a few comments for this patch.
>>>>
>>>>
>>>> On 7/28/22 23:10, Jeremy Linton wrote:
>>>>> PCC regions utilize a mailbox to set/retrieve register values used by
>>>>> the CPPC code. This is fine as long as the operations are
>>>>> infrequent. With the FIE code enabled though the overhead can range
>>>>> from 2-11% of system CPU overhead (ex: as measured by top) on Arm
>>>>> based machines.
>>>>>
>>>>> So, before enabling FIE ensure none of the registers used by
>>>>> cppc_get_perf_ctrs() are in the PCC region. Furthermore, let's also
>>>>> enable a module parameter which can also disable it at boot or module
>>>>> reload.
>>>>>
>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@....com>
>>>>> ---
>>>>> drivers/acpi/cppc_acpi.c | 41
>>>>> ++++++++++++++++++++++++++++++++++
>>>>> drivers/cpufreq/cppc_cpufreq.c | 19 ++++++++++++----
>>>>> include/acpi/cppc_acpi.h | 5 +++++
>>>>> 3 files changed, 61 insertions(+), 4 deletions(-)
>>>>
>>>>
>>>> 1. You assume that all platforms would have this big overhead when
>>>> they have the PCC regions for this purpose.
>>>> Do we know which versions of the HW mailbox have been implemented
>>>> and used that show this 2-11% overhead on a platform?
>>>> Do more recent MHUs also have such issues, so we could block
>>>> them by default (like in your code)?
>>>
>>> Well, the mailbox nature of PCC pretty much ensures it's "slow"
>>> relative to the alternative of providing an actual register. If a
>>> platform provides direct access to say MHU registers, then of course
>>> they won't actually be in a PCC region and the FIE will remain on.
>>>
>>>
>>>>
>>>> 2. I would prefer to simply change the default Kconfig value to 'n' for
>>>> the ACPI_CPPC_CPUFREQ_FIE, instead of creating a runtime
>>>> check code which disables it.
>>>> We have probably introduced this overhead for older platforms with
>>>> this commit:
>>>
>>> The problem here is that these ACPI kernels are being shipped as
>>> single images in distros which expect them to run on a wide range of
>>> platforms (including x86/amd in this case), and perform optimally on
>>> all of them.
>>>
>>> So the 'n' option basically is saying that the latest FIE code
>>> doesn't provide a benefit anywhere?
>>
>> How do we define the 'benefit' here? It's better task utilization.
>> How much better would it be vs. the previous approach with old-style FIE?
>>
>> TBH, I haven't found any test results from the development of the patch
>> set. Maybe someone could point me to test results which show
>> this benefit of better utilization.
>>
>> In the RFC I could find that statement [1]:
>>
>> "This is tested with some hacks, as I didn't have access to the right
>> hardware, on the ARM64 hikey board to check the overall functionality
>> and that works fine."
>>
>> There should be a rule that such code is tested on a real server with
>> many CPUs under some stress-test.
>>
>> Ionela, do you have any test results where this new FIE feature
>> brings a meaningful accuracy improvement to
>> task utilization?
>>
>> With this overhead measured on a real server platform I think
>> it's not worth keeping it 'y' by default.
>>
>> The design is heavy, as stated in the commit message:
>> " On an invocation of cppc_scale_freq_tick(), we schedule an irq work
>> (since we reach here from hard-irq context), which then schedules a
>> normal work item and cppc_scale_freq_workfn() updates the per_cpu
>> arch_freq_scale variable based on the counter updates since the last
>> tick.
>> "
>>
>> As you said Jeremy, this mailbox will always come with overhead. IMO,
>> until we can be sure we have some powerful new HW mailbox, this
>> feature should be disabled.
>
>
> Right, the design of the feature would be completely different if it
> were a simple register read to get the delivered perf avoiding all the
> jumping around you quoted.
>
> Which sorta implies that it's not really fixable as is, which IMHO means
> that 'n' isn't really strong enough, it should probably be under
> CONFIG_EXPERT as well if such a change were made to discourage its use.
>
That's something that I also started to consider, since we are aware of
the impact.
You have my vote when you decide to go forward with that config change.
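
For reference, the runtime check described in the quoted commit message (leave FIE off when any register read for the perf counters sits behind the PCC mailbox, or when a module parameter disables it) could be modelled in plain user-space C roughly as below. This is only a sketch under stated assumptions, not the code from the patch: the enum, the helper names and the parameter name are all hypothetical, and the real driver works on ACPI register descriptors rather than a simple array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of where a CPPC register lives. */
enum reg_space {
    REG_SPACE_SYSTEM_MEMORY,  /* directly mapped register */
    REG_SPACE_FIXED_HARDWARE, /* e.g. an architectural counter */
    REG_SPACE_PCC,            /* reached through the PCC mailbox */
};

/*
 * Returns true if at least one of the n registers is in the PCC space.
 * (Hypothetical helper; the name is not taken from the kernel.)
 */
static bool any_reg_in_pcc(const enum reg_space *regs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (regs[i] == REG_SPACE_PCC)
            return true;
    }
    return false;
}

/*
 * Decide whether FIE should be turned on.
 * fie_disabled models the module parameter mentioned in the commit
 * message (the parameter name here is an assumption).
 */
static bool fie_should_enable(bool fie_disabled,
                              const enum reg_space *regs, size_t n)
{
    if (fie_disabled)
        return false;              /* explicitly switched off */
    return !any_reg_in_pcc(regs, n); /* off if any counter needs PCC */
}
```

With this model a platform exposing all counter registers directly keeps FIE on, while a single PCC-backed register is enough to turn it off, which is the behaviour being debated above versus a plain Kconfig default of 'n'.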