lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0jKPDrpcqk2tG8UNba3z2JL5XTUJeWcatErKC+i7om2JQ@mail.gmail.com>
Date: Mon, 25 Nov 2024 11:40:12 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>, Sasha Levin <sashal@...nel.org>, 
	Lukasz Luba <lukasz.luba@....com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>, Linux PM <linux-pm@...r.kernel.org>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [GIT PULL] Thermal control updates for v6.13-rc1

Sasha, thanks for the report!

Linus, thanks for the analysis!

On Sun, Nov 24, 2024 at 7:40 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Sun, 24 Nov 2024 at 09:10, Sasha Levin <sashal@...nel.org> wrote:
> >
> > On Mon, Nov 18, 2024 at 11:23:46AM +0100, Rafael J. Wysocki wrote:
> > >Hi Linus,
> > >
> > >Please pull from the tag
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > thermal-6.13-rc1
> >
> > Hi Rafael,
> >
> > After merging this PR into linus-next, KernelCI started reporting boot
> > failures on a few platforms:
> >
> > [    6.921489] [00000000000000d0] user address but active_mm is swapper
>
> This is an odd arm64 way of saying "NULL pointer dereference in kernel".
>
> The NULL pointer is in the user space address range, so it
> superficially looks like it's trying to do a user access.
>
> People are more used to the x86 page fault error reporting (and
> honestly - they are a bit better worded in this case too).
>
> If I did the disassembly correctly, the code disassembles to
>
>         cbz     x7, 0x168
>         mov     x5, x7
>         stp     x6, x7, [x20, #24]
>         mov     w19, #0x0
>         ldr     x0, [x5, #208]!      <--- faulting instruction
>
> which does match that address 00000000000000d0 does match "x5+208",
> since x5 is NULL.
>
> Matching it up manually with my build (different config, different
> compiler, so different register allocation), it's this:
>
> // drivers/thermal/gov_power_allocator.c:527:   if (last_passive) {
>         cbz     x6, .L177       // last_passive,
> .L134:
> // drivers/thermal/gov_power_allocator.c:595:
> list_for_each_entry(instance, &td->thermal_instances, trip_node) {
>         mov     x5, x6  // _21, last_passive
> // drivers/thermal/gov_power_allocator.c:529:
> params->trip_max = last_passive;
>         stp     x0, x6, [x21, 24]       // first_passive, last_passive,
> // drivers/thermal/gov_power_allocator.c:593:   int ret = 0;
>         mov     w19, 0  // <retval>,
> // drivers/thermal/gov_power_allocator.c:595:
> list_for_each_entry(instance, &td->thermal_instances, trip_node) {
>         ldr     x0, [x5, 208]!  // __mptr, MEM[(const struct
> thermal_trip_desc *)prephitmp_29].thermal_instances.next
>
> so it looks like it is that
>
>         list_for_each_entry(instance, &td->thermal_instances, trip_node) {
>
> with the 'td' being NULL.
>
> The code seems to do that
>
>         const struct thermal_trip_desc *td =
> trip_to_trip_desc(params->trip_max);
>
> So apparently params->trip_max is NULL.
>
> That's where I stopped looking. It's almost certainly due to  commit
> 0dc23567c206 ("thermal: core: Move lists of thermal instances to trip
> descriptors") but I don't know the code.
>
> Over to Rafael and Lukasz,

You are right, the above commit introduced this issue and there is one
more hint on it in Sasha's call trace, which is
thermal_tripless_zone_device_register() used for registering the
thermal zone.  Of course, this means that there are no trip points in
that zone, so params->trip_max is NULL in check_power_actors() and it
needs to be explicitly checked against NULL.

Previously, the check was not needed because the list of thermal
instances was located in the thermal zone object, so walking it would
just produce no results.

IOW, something like the attached patch is needed.

Thanks, Rafael

View attachment "thermal-gov_power_allocator-fix-null-deref.patch" of type "text/x-patch" (1738 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ