linux-kernel - futex performance regression from "futex: Allow automatic allocation of process wide futex hash"

Message-ID: <3ad05298-351e-4d61-9972-ca45a0a50e33@meta.com>
Date: Tue, 3 Jun 2025 15:00:43 -0400
From: Chris Mason <clm@...a.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: futex performance regression from "futex: Allow automatic allocation
 of process wide futex hash"

Hi everyone,

While testing Peter's latest scheduler patches against current Linus
git, I found a pretty big performance regression with schbench:

https://github.com/masoncl/schbench

The command line I was using:

schbench -L -m 4 -M auto -t 256 -n 0 -r 60 -s 0

Bisecting the problem I landed on commit:

commit 7c4f75a21f636486d2969d9b6680403ea8483539 (HEAD -> update)
Author: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Date:   Wed Apr 16 18:29:13 2025 +0200

futex: Allow automatic allocation of process wide futex hash

Allocate a private futex hash with 16 slots if a task forks its first
thread.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-14-bigeasy@linutronix.de

schbench uses one futex per thread, and this command line ends up
creating 1024 worker threads (4 message threads x 256 workers each), so
the default of 16 hash buckets used by this commit is just too small.
Using 2048 buckets makes the problem go away.
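A quick back-of-the-envelope model shows why 16 buckets hurts here. The sketch below assigns each of the 1024 per-thread futexes to a bucket uniformly at random. This is an assumption standing in for the kernel's real hash function, which it does not reproduce. It then reports how many futexes end up sharing a bucket, and therefore that bucket's lock. The helper names `bucket_loads` and `summarize` are hypothetical. The model ignores timing and lock behavior entirely and only counts sharing.

```python
import random
from collections import Counter


def bucket_loads(num_futexes: int, num_buckets: int, seed: int = 0) -> list[int]:
    """Return per-bucket futex counts.

    Assumption: a uniformly random bucket choice stands in for a good hash;
    this is NOT the kernel's actual futex hash function.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_buckets) for _ in range(num_futexes))
    return [counts.get(b, 0) for b in range(num_buckets)]


def summarize(num_futexes: int, num_buckets: int, seed: int = 0) -> dict:
    """Summarize how many futexes share each bucket (and its lock)."""
    loads = bucket_loads(num_futexes, num_buckets, seed)
    return {
        "buckets": num_buckets,
        "mean_load": num_futexes / num_buckets,  # futexes per bucket on average
        "max_load": max(loads),                  # most crowded bucket
        "empty": loads.count(0),                 # buckets nobody hashes to
    }


# schbench -m 4 -t 256: 4 message threads x 256 workers, one futex per worker
threads = 4 * 256
for buckets in (16, 2048):
    print(summarize(threads, buckets))
```

With 16 buckets, every bucket is shared by 64 futexes on average. With 2048 buckets, most futexes get a bucket to themselves, which matches the observation that the larger hash makes the regression disappear.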

On my big Turin system, this commit reduces RPS (requests per second)
by 36%.  Even a VM on a Skylake machine sees a 29% drop.

schbench is a microbenchmark, so take all of this with a grain of salt,
but I think our defaults are probably too low.

-chris