Message-ID: <873456b5hq.ffs@tglx>
Date: Fri, 19 Dec 2025 21:07:13 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Florian Albertz <linux@...m.net>, mingo@...hat.com
Cc: linux-kernel@...r.kernel.org, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: PROBLEM: Kernel 6.17 newly deadlocks futex

On Fri, Dec 19 2025 at 11:02, Florian Albertz wrote:
> static int child(void *arg) {
>     // It is important this call to create a thread happens between
>     // the wait and wake calls.
>     //
>     // Due to the new behavior around `need_futex_hash_allocate_defaults`,
>     // the first clone which includes CLONE_THREAD (CLONE_VM is not enough)
>     // results in a change in how futex hashes are calculated.

The problem is not this one.

>     clone(noop, malloc(STACK_SIZE) + STACK_SIZE,
>             CLONE_VM | CLONE_SIGHAND | CLONE_THREAD, NULL, NULL, NULL);
>
>     // So this now works with another hash and therefore does not wake the main
>     // process.
>     *fut = 1;
>     syscall(SYS_futex, fut, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
>
>     return 0;
> }
>
> int main(int argc, char *argv[]) {
>     fut = calloc(1, sizeof(*fut));
>
>     // Now we create a new process sharing virtual memory but crucially without
>     // specifying CLONE_THREAD.

The problem is here because the condition for hash allocation is too
tight. The private hash is bound to the MM, which is shared with CLONE_VM,
so the clone has to install a private hash despite creating a process
and not a thread.
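To make the difference concrete, here is a minimal userspace sketch of the two conditions. The helper names old_need_hash() and new_need_hash() are made up for illustration; they only mirror the flag tests and do not touch any kernel state. The flag values come from the uapi header <linux/sched.h>.

```c
#include <linux/sched.h>   /* CLONE_VM, CLONE_THREAD (uapi flag values) */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the tight condition, which only matches when
 * CLONE_THREAD and CLONE_VM are both set. */
static bool old_need_hash(uint64_t clone_flags)
{
	const uint64_t both = CLONE_THREAD | CLONE_VM;

	return (clone_flags & both) == both;
}

/* Hypothetical helper: the relaxed condition, which matches any clone
 * that shares the MM, whether it creates a thread or a process. */
static bool new_need_hash(uint64_t clone_flags)
{
	return (clone_flags & CLONE_VM) != 0;
}
```

For the reproducer's first clone (CLONE_VM only), old_need_hash() is false and new_need_hash() is true. For the second clone (CLONE_VM | CLONE_SIGHAND | CLONE_THREAD), both are true.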

>     clone(child, malloc(STACK_SIZE) + STACK_SIZE, CLONE_VM, NULL, NULL, NULL);
>
>     // And now this futex wait never wakes from kernel 6.17 onwards.
>     syscall(SYS_futex, fut, FUTEX_WAIT_PRIVATE, 0, NULL, NULL, 0);
> }

The below should fix that. It's not completely correct because the
resulting hash sizing looks at current->signal->threads. As signal is
not shared, each resulting process accounts only for its own threads.
Fixing that needs some more thought.
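The accounting gap can be shown with a toy model. Everything here is a hypothetical stand-in (struct toy_mm, struct toy_process and both helpers), not the kernel's data structures or its sizing code. It only shows that a per-process thread count undercounts the tasks that share one MM.

```c
#include <stddef.h>

/* Hypothetical stand-in for an address space shared via CLONE_VM. */
struct toy_mm {
	int unused;
};

/* Hypothetical stand-in for a process: the thread count is per process,
 * because the signal struct is not shared without CLONE_THREAD. */
struct toy_process {
	struct toy_mm *mm;
	int nr_threads;
};

/* What a per-process count reports: only this process's threads. */
static int threads_seen_by(const struct toy_process *p)
{
	return p->nr_threads;
}

/* What is actually contending on the shared MM: all tasks of all
 * processes that point at the same toy_mm. */
static int tasks_sharing_mm(const struct toy_process *procs, size_t n,
			    const struct toy_mm *mm)
{
	int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (procs[i].mm == mm)
			total += procs[i].nr_threads;
	}
	return total;
}
```

Modelling the reproducer as a main process with one thread and a CLONE_VM child with two threads, each process sees only 1 or 2 threads, while 3 tasks share the MM.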

Thanks,

        tglx
---
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1948,11 +1948,9 @@ static void rv_task_fork(struct task_str
 #define rv_task_fork(p) do {} while (0)
 #endif
 
-static bool need_futex_hash_allocate_default(u64 clone_flags)
+static inline bool need_futex_hash_allocate_default(u64 clone_flags)
 {
-	if ((clone_flags & (CLONE_THREAD | CLONE_VM)) != (CLONE_THREAD | CLONE_VM))
-		return false;
-	return true;
+	return !!(clone_flags & CLONE_VM);
 }
 
 /*
