lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <tencent_F59412B7B14221D391FE1C1ADBAF4EBD5F07@qq.com>
Date: Fri, 18 Apr 2025 09:26:49 +0800
From: Yaxiong Tian <iambestgod@...com>
To: Lukasz Luba <lukasz.luba@....com>
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
 Yaxiong Tian <tianyaxiong@...inos.cn>, rafael@...nel.org
Subject: Re: [PATCH v4] PM: EM: Fix potential division-by-zero error in
 em_compute_costs()



在 2025/4/17 21:27, Lukasz Luba 写道:
> 
> 
> On 4/17/25 08:43, Yaxiong Tian wrote:
>>
>>
>> 在 2025/4/17 13:57, Lukasz Luba 写道:
>>>
>>>
>>> On 4/17/25 02:07, Yaxiong Tian wrote:
>>>> From: Yaxiong Tian <tianyaxiong@...inos.cn>
>>>>
>>>> When the device is of a non-CPU type, table[i].performance won't be
>>>> initialized in the previous em_init_performance(), resulting in 
>>>> division
>>>> by zero when calculating costs in em_compute_costs().
>>>>
>>>> Since the 'cost' algorithm is only used for EAS energy efficiency
>>>> calculations and is currently not utilized by other device drivers, we
>>>> should add the _is_cpu_device(dev) check to prevent this 
>>>> division-by-zero
>>>> issue.
>>>>
>>>> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove 
>>>> division")
>>>> Signed-off-by: Yaxiong Tian <tianyaxiong@...inos.cn>
>>>> ---
>>>>   kernel/power/energy_model.c | 4 ++++
>>>>   1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>>>> index d9b7e2b38c7a..41606247c277 100644
>>>> --- a/kernel/power/energy_model.c
>>>> +++ b/kernel/power/energy_model.c
>>>> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, 
>>>> struct em_perf_state *table,
>>>>       unsigned long prev_cost = ULONG_MAX;
>>>>       int i, ret;
>>>> +    /* This is needed only for CPUs and EAS skip other devices */
>>>> +    if (!_is_cpu_device(dev))
>>>> +        return 0;
>>>> +
>>>>       /* Compute the cost of each performance state. */
>>>>       for (i = nr_states - 1; i >= 0; i--) {
>>>>           unsigned long power_res, cost;
>>>
>>>
>>> Please stop for a while. I have to check what happened that you
>>> faced the issue in the first place. I have been testing the GPU
>>> EMs and there was no issues...
>>>
>>> Let me debug that today.
>>
>> Of course. Since I don't have actual hardware, I can only logically
>> deduce that this issue might exist.
>>
>>
> 
> I have run with the GPU EM registered in the boot:
> 
> -------------------------------------------------------
> [    2.753333] panfrost ff9a0000.gpu: EM: created perf domain
> [    2.759863] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 minor 
> 0x0 status 0x0
> [    2.768530] panfrost ff9a0000.gpu: features: 00000000,00000407, 
> issues: 00000000,24040400
> [    2.777678] panfrost ff9a0000.gpu: Features: L2:0x07120206 
> Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7
> [    2.780746] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 
> 150000000Hz, actual 148500000HZ div = 0)
> [    2.790905] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1
> 
> root@arm:~# cat /sys/kernel/debug/energy_model/ff9a0000.gpu/flags
> 0x1
> root@arm:~# grep . /sys/kernel/debug/energy_model/ff9a0000.gpu/ps*/*
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/frequency:200000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/inefficient:1
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/power:404250
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/frequency:300000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/inefficient:1
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/power:606375
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/frequency:400000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/inefficient:1
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/power:808500
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/frequency:600000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/inefficient:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/power:1505790
> 
> --------------------------------------------------------
> 
> The EM for the GPU is not modified during the boot like the CPUs'
> EM are, thus this code is not triggered. Although, the API is
> open and in theory the GPU EM can be modified at runtime
> as well and it will reach that em_compute_costs() issue
> with 'performance' field having value 0.
> 
> So this v4 patch would be needed in this case.
> 
> Please re-send this v4 patch as a completely new message.
> 
> Thanks for looking at that code path and the fix for potential
> issue.
> 
> You can also add my:
> 
> Reviewed-by: Lukasz Luba <lukasz.luba@....com>
> 
> Regrds,
> Lukasz

Got it - patch resent with Reviewed-by.

https://lore.kernel.org/all/tencent_7F99ED4767C1AF7889D0D8AD50F34859CE06@qq.com/


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ