[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aFGTmn_CFkuTbP4i@mozart.vkv.me>
Date: Tue, 17 Jun 2025 09:11:06 -0700
From: Calvin Owens <calvin@...nvd.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
"Lai, Yi" <yi1.lai@...ux.intel.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local
hash
On Tuesday 06/17 at 11:50 +0200, Sebastian Andrzej Siewior wrote:
> On 2025-06-17 02:23:08 [-0700], Calvin Owens wrote:
> > Ugh, I'm sorry, I was in too much of a hurry this morning... cargo is
> > obviously not calling PR_FUTEX_HASH which is new in 6.16 :/
> No worries.
>
> > > This is with LTO enabled.
> >
> > Full lto with llvm-20.1.7.
> >
> …
> > Nothing showed up in the logs but the RCU stalls on CPU16, always in
> > queued_spin_lock_slowpath().
> >
> > I'll run the build it was doing when it happened in a loop overnight and
> > see if I can trigger it again.
Actually got an oops this time:
Oops: general protection fault, probably for non-canonical address 0xfdd92c90843cf111: 0000 [#1] SMP
CPU: 3 UID: 1000 PID: 323127 Comm: cargo Not tainted 6.16.0-rc2-lto-00024-g9afe652958c3 #1 PREEMPT
Hardware name: ASRock B850 Pro-A/B850 Pro-A, BIOS 3.11 11/12/2024
RIP: 0010:queued_spin_lock_slowpath+0x12a/0x1d0
Code: c8 c1 e8 10 66 87 47 02 66 85 c0 74 48 0f b7 c0 49 c7 c0 f8 ff ff ff 89 c6 c1 ee 02 83 e0 03 49 8b b4 f0 00 21 67 83 c1 e0 04 <48> 89 94 30 00 f1 4a 84 83 7a 08 00 75 10 0f 1f 84 00 00 00 00 00
RSP: 0018:ffffc9002c953d20 EFLAGS: 00010256
RAX: 0000000000000000 RBX: ffff88814e78be40 RCX: 0000000000100000
RDX: ffff88901fce5100 RSI: fdd92c90fff20011 RDI: ffff8881c2ae9384
RBP: 000000000000002b R08: fffffffffffffff8 R09: 00000000002ab900
R10: 000000000000b823 R11: 0000000000000c00 R12: ffff88814e78be40
R13: ffffc9002c953d48 R14: ffffc9002c953d48 R15: ffff8881c2ae9384
FS: 00007f086efb6600(0000) GS:ffff88909b836000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ced9c42650 CR3: 000000034b88e000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
futex_unqueue+0x2e/0x110
__futex_wait+0xc5/0x130
? __futex_wake_mark+0xc0/0xc0
futex_wait+0xee/0x180
? hrtimer_setup_sleeper_on_stack+0xe0/0xe0
do_futex+0x86/0x120
__se_sys_futex+0x16d/0x1e0
? __x64_sys_write+0xba/0xc0
do_syscall_64+0x47/0x170
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f086e918779
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 86 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc5815f678 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: ffffffffffffffda RBX: 00007f086e918760 RCX: 00007f086e918779
RDX: 000000000000002b RSI: 0000000000000089 RDI: 00005636f9fb60d0
RBP: 00007ffc5815f6d0 R08: 0000000000000000 R09: 00007ffcffffffff
R10: 00007ffc5815f690 R11: 0000000000000246 R12: 000000001dcd6401
R13: 00007f086e833fd0 R14: 00005636f9fb60d0 R15: 000000000000002b
</TASK>
---[ end trace 0000000000000000 ]---
RIP: 0010:queued_spin_lock_slowpath+0x12a/0x1d0
Code: c8 c1 e8 10 66 87 47 02 66 85 c0 74 48 0f b7 c0 49 c7 c0 f8 ff ff ff 89 c6 c1 ee 02 83 e0 03 49 8b b4 f0 00 21 67 83 c1 e0 04 <48> 89 94 30 00 f1 4a 84 83 7a 08 00 75 10 0f 1f 84 00 00 00 00 00
RSP: 0018:ffffc9002c953d20 EFLAGS: 00010256
RAX: 0000000000000000 RBX: ffff88814e78be40 RCX: 0000000000100000
RDX: ffff88901fce5100 RSI: fdd92c90fff20011 RDI: ffff8881c2ae9384
RBP: 000000000000002b R08: fffffffffffffff8 R09: 00000000002ab900
R10: 000000000000b823 R11: 0000000000000c00 R12: ffff88814e78be40
R13: ffffc9002c953d48 R14: ffffc9002c953d48 R15: ffff8881c2ae9384
FS: 00007f086efb6600(0000) GS:ffff88909b836000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ced9c42650 CR3: 000000034b88e000 CR4: 0000000000750ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception ]---
This is a giant Yocto build, but the comm is always cargo, so hopefully
I can run those bits in isolation and hit it more quickly.
> Please check if you can reproduce it and if so if it also happens
> without lto.
> I have no idea why one spinlock_t remains locked. It is either locked or
> some stray memory.
> Oh. Lockdep adds quite some overhead but it should complain that a
> spinlock_t is still locked while returning to userland.
I'll report back when I've tried :)
I'll also try some of the mm debug configs.
Thanks,
Calvin
Powered by blists - more mailing lists