Message-ID: <20250618160333.PdGB89yt@linutronix.de>
Date: Wed, 18 Jun 2025 18:03:33 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Calvin Owens <calvin@...nvd.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
"Lai, Yi" <yi1.lai@...ux.intel.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local hash
On 2025-06-17 09:11:06 [-0700], Calvin Owens wrote:
> Actually got an oops this time:
>
> Oops: general protection fault, probably for non-canonical address 0xfdd92c90843cf111: 0000 [#1] SMP
> CPU: 3 UID: 1000 PID: 323127 Comm: cargo Not tainted 6.16.0-rc2-lto-00024-g9afe652958c3 #1 PREEMPT
> Hardware name: ASRock B850 Pro-A/B850 Pro-A, BIOS 3.11 11/12/2024
> RIP: 0010:queued_spin_lock_slowpath+0x12a/0x1d0
…
> Call Trace:
> <TASK>
> futex_unqueue+0x2e/0x110
> __futex_wait+0xc5/0x130
> futex_wait+0xee/0x180
> do_futex+0x86/0x120
> __se_sys_futex+0x16d/0x1e0
> do_syscall_64+0x47/0x170
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f086e918779
The lock_ptr is pointing to invalid memory. It explodes within
queued_spin_lock_slowpath(), which looks like decode_tail() returned a
wrong pointer/offset.
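To illustrate why a lock word read from invalid memory would send
decode_tail() somewhere bogus, here is a minimal userspace sketch. It
assumes the usual qspinlock tail layout for NR_CPUS < 16K (bits 16-17
hold the per-CPU node index, bits 18-31 hold cpu + 1). The names
encode_tail_sketch(), decode_tail_sketch() and struct tail_ref are
hypothetical; the real decode_tail() returns a per-CPU MCS node pointer
rather than a (cpu, idx) pair.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tail layout: bits 16-17 = node index, bits 18-31 = cpu + 1. */
#define TAIL_IDX_OFFSET 16
#define TAIL_IDX_MASK   (0x3u << TAIL_IDX_OFFSET)
#define TAIL_CPU_OFFSET 18

/* Hypothetical stand-in for the per-CPU MCS node that gets looked up. */
struct tail_ref {
	int cpu;
	int idx;
};

/* Pack (cpu, idx) into the tail bits; cpu is stored as cpu + 1 so 0 means "no tail". */
static uint32_t encode_tail_sketch(int cpu, int idx)
{
	assert(cpu >= 0 && idx >= 0 && idx < 4);
	return ((uint32_t)(cpu + 1) << TAIL_CPU_OFFSET) |
	       ((uint32_t)idx << TAIL_IDX_OFFSET);
}

/* Unpack the tail bits. Nothing here validates that cpu is a real CPU. */
static struct tail_ref decode_tail_sketch(uint32_t lockval)
{
	struct tail_ref r;

	r.cpu = (int)(lockval >> TAIL_CPU_OFFSET) - 1;
	r.idx = (int)((lockval & TAIL_IDX_MASK) >> TAIL_IDX_OFFSET);
	return r;
}
```

A properly encoded tail round-trips, while an arbitrary word such as
0xdeadbeef decodes to a CPU number far beyond anything present on the
machine. Indexing per-CPU data with that would plausibly yield a wild
pointer like the one in the oops.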
futex_queue() adds a local futex_q to the list and its lock_ptr points
to the hb lock. Then we do schedule(), and after a successful wake the
lock_ptr is NULL. Otherwise it still points to the
futex_hash_bucket::lock.
Since futex_unqueue() attempts to acquire the lock, there was no
wakeup; a timeout or a signal ended the wait. The lock_ptr can
change during resize.
During the resize futex_rehash_private() moves the futex_q members from
the old queue to the new one. The lock is accessed within RCU and the
lock_ptr value is compared against the old value after locking. That
means it is accessed either before the rehash moved it to the new hash
bucket or afterwards.
I don't see how this pointer can become invalid. RCU protects against
cleanup and the pointer compare ensures that it is the "current"
pointer.
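The lock-then-compare step described above can be sketched in userspace
as follows. This is only an illustration under stated assumptions:
struct bucket_lock, struct waiter and unqueue_sketch() are hypothetical
names, a trivial atomic_flag spinlock stands in for the hb lock, and
the RCU read section is omitted, so the sketch assumes the old lock's
memory stays valid while it is being taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for futex_hash_bucket::lock. */
struct bucket_lock {
	atomic_flag held;
};

static void bucket_lock_acquire(struct bucket_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder releases */
}

static void bucket_lock_release(struct bucket_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical stand-in for futex_q. */
struct waiter {
	_Atomic(struct bucket_lock *) lock_ptr; /* NULL once a waker dequeued us */
	int queued;
};

/* Returns 1 if we removed ourselves, 0 if a waker already did. */
static int unqueue_sketch(struct waiter *w)
{
	assert(w != NULL);
	for (;;) {
		struct bucket_lock *l = atomic_load(&w->lock_ptr);

		if (l == NULL)
			return 0; /* woken: nothing left to unqueue */

		bucket_lock_acquire(l);
		if (l != atomic_load(&w->lock_ptr)) {
			/* Moved to another bucket (or woken) meanwhile: retry. */
			bucket_lock_release(l);
			continue;
		}
		w->queued = 0; /* we hold the current lock, safe to dequeue */
		bucket_lock_release(l);
		return 1;
	}
}
```

The compare after locking is what makes a concurrent move harmless in
this sketch: either the lock taken is still the current one, or the
loop drops it and retries with the new pointer.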
I've been looking at clang's assembly of futex_unqueue() and it looks
correct. And futex_rehash_private() iterates over all slots.
> This is a giant Yocto build, but the comm is always cargo, so hopefully
> I can run those bits in isolation and hit it more quickly.
If it still explodes without LTO, would you mind trying gcc?
> Thanks,
> Calvin
Sebastian