[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wje48UTJPEpriFX9A-XjfRUiqeDyEbUEMBnsCJ_KQK-rA@mail.gmail.com>
Date: Sun, 24 Nov 2024 10:39:42 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Sasha Levin <sashal@...nel.org>, Lukasz Luba <lukasz.luba@....com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
Cc: Linux PM <linux-pm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [GIT PULL] Thermal control updates for v6.13-rc1
On Sun, 24 Nov 2024 at 09:10, Sasha Levin <sashal@...nel.org> wrote:
>
> On Mon, Nov 18, 2024 at 11:23:46AM +0100, Rafael J. Wysocki wrote:
> >Hi Linus,
> >
> >Please pull from the tag
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > thermal-6.13-rc1
>
> Hi Rafael,
>
> After merging this PR into linus-next, KernelCI started reporting boot
> failures on a few platforms:
>
> [ 6.921489] [00000000000000d0] user address but active_mm is swapper
This is an odd arm64 way of saying "NULL pointer dereference in kernel".
The NULL pointer is in the user space address range, so it
superficially looks like it's trying to do a user access.
People are more used to the x86 page fault error reporting (and
honestly - they are a bit better worded in this case too).
If I did the disassembly correctly, the code disassembles to
cbz x7, 0x168
mov x5, x7
stp x6, x7, [x20, #24]
mov w19, #0x0
ldr x0, [x5, #208]! <--- faulting instruction
which does match that address 00000000000000d0 does match "x5+208",
since x5 is NULL.
Matching it up manually with my build (different config, different
compiler, so different register allocation), it's this:
// drivers/thermal/gov_power_allocator.c:527: if (last_passive) {
cbz x6, .L177 // last_passive,
.L134:
// drivers/thermal/gov_power_allocator.c:595:
list_for_each_entry(instance, &td->thermal_instances, trip_node) {
mov x5, x6 // _21, last_passive
// drivers/thermal/gov_power_allocator.c:529:
params->trip_max = last_passive;
stp x0, x6, [x21, 24] // first_passive, last_passive,
// drivers/thermal/gov_power_allocator.c:593: int ret = 0;
mov w19, 0 // <retval>,
// drivers/thermal/gov_power_allocator.c:595:
list_for_each_entry(instance, &td->thermal_instances, trip_node) {
ldr x0, [x5, 208]! // __mptr, MEM[(const struct
thermal_trip_desc *)prephitmp_29].thermal_instances.next
so it looks like it is that
list_for_each_entry(instance, &td->thermal_instances, trip_node) {
with the 'td' being NULL.
The code seems to do that
const struct thermal_trip_desc *td =
trip_to_trip_desc(params->trip_max);
So apparently params->trip_max is NULL.
That's where I stopped looking. It's almost certainly due to commit
0dc23567c206 ("thermal: core: Move lists of thermal instances to trip
descriptors") but I don't know the code.
Over to Rafael and Lukasz,
Linus
Powered by blists - more mailing lists