Message-ID: <20250617071628.lXtqjG7C@linutronix.de>
Date: Tue, 17 Jun 2025 09:16:28 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Calvin Owens <calvin@...nvd.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
	"Lai, Yi" <yi1.lai@...ux.intel.com>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local
 hash

On 2025-06-16 10:14:24 [-0700], Calvin Owens wrote:
> On Wednesday 06/11 at 14:39 -0000, tip-bot2 for Sebastian Andrzej Siewior wrote:
> > <snip> 
> > It is possible that two threads simultaneously request the global hash
> > and both pass the initial check and block later on the
> > mm::futex_hash_lock. In this case the first thread performs the switch
> > to the global hash. The second thread will also attempt to switch to the
> > global hash and while doing so, accessing the nonexisting slot 1 of the
> > struct futex_private_hash.
> 
> In case it's interesting to anyone, I'm hitting this one in real life,
> one of my build machines got stuck overnight:

The scenario described in the commit description does not happen on its
own; the bot explicitly "asked" for it. It won't occur in a "normal"
scenario where you do not explicitly request a specific hash via the
prctl() interface.
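
For reference, a minimal sketch of what "explicitly asking" looks like
from userspace. The fallback constant values are an assumption taken
from include/uapi/linux/prctl.h as of v6.16-rc, and the helper names
futex_hash_get_slots()/futex_hash_set_slots() are made up for
illustration. On a kernel without this prctl() option both calls are
expected to fail with -1.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Assumption: values as found in include/uapi/linux/prctl.h of
 * v6.16-rc. Older libc/kernel headers do not define them.
 */
#ifndef PR_FUTEX_HASH
#define PR_FUTEX_HASH            78
#define PR_FUTEX_HASH_SET_SLOTS  1
#define PR_FUTEX_HASH_GET_SLOTS  2
#endif

/*
 * Hypothetical helper: query the number of slots of the private hash.
 * Returns -1 with errno set if the kernel rejects the request.
 */
static int futex_hash_get_slots(void)
{
	return prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS, 0UL, 0UL, 0UL);
}

/*
 * Hypothetical helper: request a private hash with `slots` buckets.
 * Passing 0 is assumed to request the global hash, which is the case
 * the commit description talks about. Flags are left at 0.
 */
static int futex_hash_set_slots(unsigned long slots)
{
	return prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, slots, 0UL, 0UL);
}
```

A test program would call futex_hash_set_slots(0) from two threads at
the same time to provoke the race from the commit description. A build
job such as cargo does not do that.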

> Jun 16 02:51:34 beethoven kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
> Jun 16 02:51:34 beethoven kernel: rcu:         16-....: (59997 ticks this GP) idle=eaf4/1/0x4000000000000000 softirq=14417247/14470115 fqs=21169
> Jun 16 02:51:34 beethoven kernel: rcu:         (t=60000 jiffies g=21453525 q=663214 ncpus=24)
> Jun 16 02:51:34 beethoven kernel: CPU: 16 UID: 1000 PID: 2028199 Comm: cargo Not tainted 6.16.0-rc1-lto-00236-g8c6bc74c7f89 #1 PREEMPT 
> Jun 16 02:51:34 beethoven kernel: Hardware name: ASRock B850 Pro-A/B850 Pro-A, BIOS 3.11 11/12/2024
> Jun 16 02:51:34 beethoven kernel: RIP: 0010:queued_spin_lock_slowpath+0x162/0x1d0
> Jun 16 02:51:34 beethoven kernel: Code: 0f 1f 84 00 00 00 00 00 f3 90 83 7a 08 00 74 f8 48 8b 32 48 85 f6 74 09 0f 0d 0e eb 0d 31 f6 eb 09 31 f6 eb 05 0f 1f 00 f3 90 <8b> 07 66 85 c0 75 f7 39 c8 75 13 41 b8 01 00 00 00 89 c8 f0 44 0f
…
> Jun 16 02:51:34 beethoven kernel: Call Trace:
> Jun 16 02:51:34 beethoven kernel:  <TASK>
> Jun 16 02:51:34 beethoven kernel:  __futex_pivot_hash+0x1f8/0x2e0
> Jun 16 02:51:34 beethoven kernel:  futex_hash+0x95/0xe0
> Jun 16 02:51:34 beethoven kernel:  futex_wait_setup+0x7e/0x230
> Jun 16 02:51:34 beethoven kernel:  __futex_wait+0x66/0x130
> Jun 16 02:51:34 beethoven kernel:  ? __futex_wake_mark+0xc0/0xc0
> Jun 16 02:51:34 beethoven kernel:  futex_wait+0xee/0x180
> Jun 16 02:51:34 beethoven kernel:  ? hrtimer_setup_sleeper_on_stack+0xe0/0xe0
> Jun 16 02:51:34 beethoven kernel:  do_futex+0x86/0x120
> Jun 16 02:51:34 beethoven kernel:  __se_sys_futex+0x16d/0x1e0
> Jun 16 02:51:34 beethoven kernel:  do_syscall_64+0x47/0x170
> Jun 16 02:51:34 beethoven kernel:  entry_SYSCALL_64_after_hwframe+0x4b/0x53
…
> <repeats forever until I wake up and kill the machine>
> 
> It seems like this is well understood already, but let me know if
> there's any debug info I can send that might be useful.

This is with LTO enabled.
Based on the backtrace: there was a resize request (probably because a
thread was created) and the resize was delayed because the hash was in
use. The hash was then released, and now this thread is moving all
enqueued users from the old hash to the new one. RIP says it is stuck on
a spin lock. This is either the new or the old hash bucket lock.
If this livelocks then someone else must be holding the lock and not
releasing it.
Is this the only thread stuck, or are there more?
I'm puzzled here. It looks as if an unlock is missing.

> Thanks,
> Calvin

Sebastian
