[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1a7f6279-c21c-4d6f-a5ee-48766b1ca34e@arm.com>
Date: Thu, 17 Apr 2025 14:27:54 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: Yaxiong Tian <iambestgod@...com>
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
Yaxiong Tian <tianyaxiong@...inos.cn>, rafael@...nel.org
Subject: Re: [PATCH v4] PM: EM: Fix potential division-by-zero error in
em_compute_costs()
On 4/17/25 08:43, Yaxiong Tian wrote:
>
>
> 在 2025/4/17 13:57, Lukasz Luba 写道:
>>
>>
>> On 4/17/25 02:07, Yaxiong Tian wrote:
>>> From: Yaxiong Tian <tianyaxiong@...inos.cn>
>>>
>>> When the device is of a non-CPU type, table[i].performance won't be
>>> initialized in the previous em_init_performance(), resulting in division
>>> by zero when calculating costs in em_compute_costs().
>>>
>>> Since the 'cost' algorithm is only used for EAS energy efficiency
>>> calculations and is currently not utilized by other device drivers, we
>>> should add the _is_cpu_device(dev) check to prevent this
>>> division-by-zero
>>> issue.
>>>
>>> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove
>>> division")
>>> Signed-off-by: Yaxiong Tian <tianyaxiong@...inos.cn>
>>> ---
>>> kernel/power/energy_model.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>>> index d9b7e2b38c7a..41606247c277 100644
>>> --- a/kernel/power/energy_model.c
>>> +++ b/kernel/power/energy_model.c
>>> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev,
>>> struct em_perf_state *table,
>>> unsigned long prev_cost = ULONG_MAX;
>>> int i, ret;
>>> + /* This is needed only for CPUs and EAS skip other devices */
>>> + if (!_is_cpu_device(dev))
>>> + return 0;
>>> +
>>> /* Compute the cost of each performance state. */
>>> for (i = nr_states - 1; i >= 0; i--) {
>>> unsigned long power_res, cost;
>>
>>
>> Please stop for a while. I have to check what happened that you
>> faced the issue in the first place. I have been testing the GPU
>> EMs and there was no issues...
>>
>> Let me debug that today.
>
> Of course. Since I don't have actual hardware, I can only logically
> deduce that this issue might exist.
>
>
I have run with the GPU EM registered in the boot:
-------------------------------------------------------
[ 2.753333] panfrost ff9a0000.gpu: EM: created perf domain
[ 2.759863] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 minor
0x0 status 0x0
[ 2.768530] panfrost ff9a0000.gpu: features: 00000000,00000407,
issues: 00000000,24040400
[ 2.777678] panfrost ff9a0000.gpu: Features: L2:0x07120206
Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7
[ 2.780746] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req
150000000Hz, actual 148500000HZ div = 0)
[ 2.790905] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1
root@arm:~# cat /sys/kernel/debug/energy_model/ff9a0000.gpu/flags
0x1
root@arm:~# grep . /sys/kernel/debug/energy_model/ff9a0000.gpu/ps*/*
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/frequency:200000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/inefficient:1
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/power:404250
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/frequency:300000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/inefficient:1
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/power:606375
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/frequency:400000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/inefficient:1
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/power:808500
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/frequency:600000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/inefficient:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/power:1505790
--------------------------------------------------------
The EM for the GPU is not modified during the boot like the CPUs'
EM are, thus this code is not triggered. Although, the API is
open and in theory the GPU EM can be modified at runtime
as well and it will reach that em_compute_costs() issue
with 'performance' field having value 0.
So this v4 patch would be needed in this case.
Please re-send this v4 patch as a completely new message.
Thanks for looking at that code path and the fix for potential
issue.
You can also add my:
Reviewed-by: Lukasz Luba <lukasz.luba@....com>
Regrds,
Lukasz
Powered by blists - more mailing lists