[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b7028264-167a-410a-af0c-a9aa156ca993@arm.com>
Date: Tue, 9 Jul 2024 10:49:03 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
Cc: Nícolas F. R. A. Prado <nfraprado@...labora.com>,
Daniel Lezcano <daniel.lezcano@...aro.org>, Zhang Rui <rui.zhang@...el.com>,
Nikita Travkin <nikita@...n.ru>, kernel@...labora.com,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] thermal: gov_power_allocator: Return early in manage if
trip_max is null
On 7/3/24 21:20, Rafael J. Wysocki wrote:
> On Wed, Jul 3, 2024 at 11:03 AM AngeloGioacchino Del Regno
> <angelogioacchino.delregno@...labora.com> wrote:
>>
>> Il 02/07/24 23:24, Nícolas F. R. A. Prado ha scritto:
>>> Commit da781936e7c3 ("thermal: gov_power_allocator: Allow binding
>>> without trip points") allowed the governor to bind even when trip_max
>>> is null. This allows a null pointer dereference to happen in the manage
>>> callback. Add an early return to prevent it, since the governor is
>>> expected to not do anything in this case.
>>>
>>> Fixes: da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points")
>>> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@...labora.com>
>>> ---
>>> This issue was noticed by KernelCI during a boot test on the
>>> mt8195-cherry-tomato-r2 platform with the config in [1]. The stack trace
>>> is attached below.
>>>
>>> [1] http://0x0.st/XaON.txt
>>>
>>> [ 4.015786] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>>> [ 4.015791] Mem abort info:
>>> [ 4.015793] ESR = 0x0000000096000004
>>> [ 4.015796] EC = 0x25: DABT (current EL), IL = 32 bits
>>> [ 4.015799] SET = 0, FnV = 0
>>> [ 4.015802] EA = 0, S1PTW = 0
>>> [ 4.015804] FSC = 0x04: level 0 translation fault
>>> [ 4.015807] Data abort info:
>>> [ 4.015809] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>> [ 4.015811] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>> [ 4.015814] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>> [ 4.015818] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000109809000
>>> [ 4.015821] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>>> [ 4.015835] Modules linked in: mt8195_mt6359(+) mt6577_auxadc snd_soc_mt8195_afe mtk_scp_ipi snd_sof_utils mtk_wdt(+)
>>> [ 4.015852] CPU: 2 PID: 13 Comm: kworker/u32:1 Not tainted 6.10.0-rc6 #1 c5d519ae8e7fec6bbe67cb8c50bfebcb89dfa54e
>>> [ 4.015859] Hardware name: Acer Tomato (rev2) board (DT)
>>> [ 4.015862] Workqueue: events_unbound deferred_probe_work_func
>>> [ 4.015875] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [ 4.015880] pc : power_allocator_manage+0x110/0x6a0
>>> [ 4.015888] lr : __thermal_zone_device_update+0x1dc/0x400
>>> [ 4.015893] sp : ffff8000800eb800
>>> [ 4.015895] x29: ffff8000800eb810 x28: 0000000000000001 x27: 0000000000000001
>>> [ 4.015903] x26: aaaaaaaaaaaaaaab x25: ffff07a0461c15a0 x24: ffffb58530ca67c0
>>> [ 4.015911] x23: 0000000000000000 x22: ffff07a04098fcc0 x21: ffffb58532eec848
>>> [ 4.015918] x20: ffff8000800eb920 x19: ffff07a0461c1000 x18: 0000000000000b4b
>>> [ 4.015926] x17: 5359534255530031 x16: ffffb585310352e4 x15: 0000000000000020
>>> [ 4.015933] x14: 0000000000000000 x13: ffffffff00000000 x12: 0000000000000040
>>> [ 4.015940] x11: 0101010101010101 x10: ffffffffffffffff x9 : ffffb58530ca8d78
>>> [ 4.015948] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000001388
>>> [ 4.015955] x5 : 0000000000000000 x4 : 0000000000000384 x3 : 0000000000000000
>>> [ 4.015962] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
>>> [ 4.015970] Call trace:
>>> [ 4.015972] power_allocator_manage+0x110/0x6a0
>>> [ 4.015978] __thermal_zone_device_update+0x1dc/0x400
>>> [ 4.015983] thermal_zone_device_set_mode+0x7c/0xa0
>>> [ 4.015987] thermal_zone_device_enable+0x1c/0x28
>>> [ 4.015991] thermal_of_zone_register+0x43c/0x498
>>> [ 4.015996] devm_thermal_of_zone_register+0x6c/0xb8
>>> [ 4.016001] gadc_thermal_probe+0x140/0x214
>>> [ 4.016007] platform_probe+0x70/0xc4
>>> [ 4.016012] really_probe+0x140/0x270
>>> [ 4.016018] __driver_probe_device+0xfc/0x114
>>> [ 4.016024] driver_probe_device+0x44/0x100
>>> [ 4.016029] __device_attach_driver+0x64/0xdc
>>> [ 4.016035] bus_for_each_drv+0xb4/0xdc
>>> [ 4.016041] __device_attach+0xdc/0x16c
>>> [ 4.016046] device_initial_probe+0x1c/0x28
>>> [ 4.016052] bus_probe_device+0x44/0xac
>>> [ 4.016057] deferred_probe_work_func+0xb0/0xc4
>>> [ 4.016063] process_scheduled_works+0x114/0x330
>>> [ 4.016070] worker_thread+0x1c0/0x20c
>>> [ 4.016076] kthread+0xf8/0x108
>>> [ 4.016081] ret_from_fork+0x10/0x20
>>> [ 4.016090] Code: d1030294 17ffffdd f94012c0 f9401ed7 (b9400000)
>>> [ 4.016095] ---[ end trace 0000000000000000 ]---
>>> ---
>>> drivers/thermal/gov_power_allocator.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
>>> index 45f04a25255a..1b2345a697c5 100644
>>> --- a/drivers/thermal/gov_power_allocator.c
>>> +++ b/drivers/thermal/gov_power_allocator.c
>>> @@ -759,6 +759,9 @@ static void power_allocator_manage(struct thermal_zone_device *tz)
>>> return;
>>> }
>>>
>>> + if (!params->trip_max)
>>> + return;
>>> +
>>
>> I'm not sure that this is the right thing to do.
>>
>> If you do that, allocate_power() will never be called, so the entire algo doesn't
>> work, making binding this completely useless (as it's going to be a noop..!).
>>
>> Check what get_governor_trips() says in the documentation:
>>
>> * If there is only one trip point, then that's considered to be the
>> * "maximum desired temperature" trip point and the governor is always
>> * on. If there are no passive or active trip points, then the
>> * governor won't do anything. In fact, its throttle function
>> * won't be called at all.
>>
>> ....and it looks like you're aware of that, as you said that in the commit
>> description as well.
>
> IIUC, the problematic commit allowed the power allocator governor to
> bind to a tripless thermal zone in order to prevent failing the entire
> thermal zone registration.
>
> Yes, it will be a noop in this case because in the absence of any
> trips there will be nothing to do for it. Still, user space can check
> the zone temperature via sysfs.
>
> Adding a NULL pointer check before the place where the pointer in
> question is dereferenced is not a bad idea at all.
Yes, I agree. My apologies for being late.
Thanks Rafael for applying the patch.
Lukasz
Powered by blists - more mailing lists