linux-kernel - Re: [BUG] hrtimer: null deref in hrtimer_next_event

Message-ID: <878qf4dmko.ffs@tglx>
Date: Mon, 15 Dec 2025 12:13:59 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Olle Lögdahl <olle@...dahl.net>,
 "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>
Cc: "frederic@...nel.org" <frederic@...nel.org>, "anna-maria@...utronix.de"
 <anna-maria@...utronix.de>
Subject: Re: [BUG] hrtimer: null deref in hrtimer_next_event_without when
 entering idle

On Sat, Dec 13 2025 at 08:55, Olle Lögdahl wrote:
> I encountered a kernel panic with a null-pointer dereference in the
> hrtimer system on kernel 6.17.9-arch1-1 (x86_64) when entering idle. 
> The crash occurred in __hrtimer_next_event_base+0x4c.
>
> [137017.825435] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [137017.825450] #PF: supervisor read access in kernel mode
> [137017.825457] #PF: error_code(0x0000) - not-present page
> [137017.825464] PGD 1719cb067 P4D 1719cb067 PUD 17ca5a067 PMD 0 
> [137017.825483] Oops: Oops: 0000 [#1] SMP NOPTI
> [137017.825495] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: P           OE       6.17.9-arch1-1 #1 PREEMPT(full)  71adf6020e7d04ea315feaf360c679be0fb5cb04
> [137017.825510] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [137017.825516] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4207 12/08/2018
> [137017.825523] RIP: 0010:__hrtimer_next_event_base+0x4c/0xb0
> [137017.825538] Code: 0f bc c9 89 cd 48 8d 45 01 48 c1 e0 06 4c 01 e0 74 32 ba 01 00 00 00 48 8b 40 28 d3 e2 f7 d2 21 d3 49 39 c7 74 43 48 c1 e5 06 <48> 8b 50 18 49 2b 54 2c 78 4c 39 ea 7d 08 4d 85 ff 74 1f 49 89 d5
> [137017.825546] RSP: 0018:ffffffffa3603d90 EFLAGS: 00010056
> [137017.825555] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [137017.825562] RDX: 00000000fffffffe RSI: ffff8a6b0ee216d8 RDI: ffff8a6b0ee21100
> [137017.825569] RBP: 0000000000000000 R08: ffffffffa3603d78 R09: 0000000000000018
> [137017.825575] R10: 00000000ffffffff R11: 000000000000012d R12: ffff8a6b0ee21100
> [137017.825582] R13: 7fffffffffffffff R14: 071c71c71c71c71c R15: ffff8a6b0ee216d8
> [137017.825589] FS:  0000000000000000(0000) GS:ffff8a6b6ab09000(0000) knlGS:0000000000000000
> [137017.825596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [137017.825603] CR2: 0000000000000018 CR3: 000000026e066000 CR4: 00000000003506f0
> [137017.825610] Call Trace:
> [137017.825618]  <TASK>
> [137017.825631]  hrtimer_next_event_without+0x56/0x90
> [137017.825644]  tick_nohz_get_sleep_length+0x86/0xa0
> [137017.825659]  menu_select+0x391/0x680
> [137017.825677]  do_idle+0x18b/0x210
> [137017.825693]  cpu_startup_entry+0x29/0x30
> [137017.825704]  rest_init+0xcc/0xd0
> [137017.825718]  start_kernel+0x9a2/0x9b0
> [137017.825735]  x86_64_start_reservations+0x24/0x30
> [137017.825748]  x86_64_start_kernel+0xd1/0xe0
> [137017.825760]  common_startup_64+0x13e/0x141
> [137017.825783]  </TASK>
>
> Disassembling the code at RIP shows the faulting instruction is:
>   2a: 48 8b 50 18    mov rdx,QWORD PTR [rax+0x18]

This looks like reading hrtimer::_softexpires and the hrtimer pointer is
NULL.

> Looking at the preceding code, rax was loaded from another structure
> at offset 0x28:
>   17: 48 8b 40 28    mov rax,QWORD PTR [rax+0x28]

That's loading the next node from the clock base.

That means the clock base is marked active but has no timer queued. I
have no idea how that can happen as all related operations are holding
the relevant base lock.
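
To make the suspected inconsistency concrete, here is a minimal userspace
sketch. It is not the kernel code: the model_* names, the fixed base count and
the reduced structures are hypothetical stand-ins. It only models the
invariant described above, that a set bit in the active bitmask implies a
non-NULL next timer on that clock base.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_BASES 8   /* assumed count, for illustration only */

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct model_timer {
    int64_t expires;
};

struct model_clock_base {
    struct model_timer *next;   /* earliest queued timer, NULL if queue is empty */
};

struct model_cpu_base {
    unsigned int active_bases;  /* bit i set: clock_base[i] claims to have a timer */
    struct model_clock_base clock_base[MODEL_MAX_BASES];
};

/*
 * Return the index of the first base whose active bit is set while its
 * queue is empty, or -1 if the bitmask and the queues agree.
 */
int model_find_inconsistent_base(const struct model_cpu_base *cpu_base)
{
    for (int i = 0; i < MODEL_MAX_BASES; i++) {
        if ((cpu_base->active_bases & (1u << i)) &&
            !cpu_base->clock_base[i].next)
            return i;
    }
    return -1;
}
```

In this model the reported register state (RBP = 0 as the base index, RAX = 0
as the timer pointer) would correspond to model_find_inconsistent_base()
returning 0. A walker that trusts the bitmask and reads next->expires without
such a check faults at a small offset from NULL, as in the oops.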

> I have not been able to reproduce this yet. I'd be interested in
> working on a fix if guidance can be provided on the root cause.

No idea how this can be chased down unless you have a halfway reliable
reproducer which also triggers without that module (whatever it is) loaded:

> [137017.825510] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE

Thanks,

        tglx