Message-ID: <20250516161422.BqmdlxlF@linutronix.de>
Date: Fri, 16 May 2025 18:14:22 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Alejandro Colomar <alx@...nel.org>
Cc: linux-man@...r.kernel.org, linux-kernel@...r.kernel.org,
André Almeida <andrealmeid@...lia.com>,
Darren Hart <dvhart@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Valentin Schneider <vschneid@...hat.com>,
Waiman Long <longman@...hat.com>
Subject: [PATCH] prctl: Add documentation for PR_FUTEX_HASH
prctl(PR_FUTEX_HASH) is queued for the v6.16 merge window.
Add documentation for the interface.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
---
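Not part of the patch: a minimal C sketch of how a program might exercise the interface documented below. The fallback constant values are an assumption (taken to match the Linux 6.16 <linux/prctl.h>) and are used only when the system headers lack them. The names futex_hash_size_valid(), futex_hash_min_bytes(), futex_hash_set_slots(), futex_hash_get_slots(), futex_hash_get_immutable() and FUTEX_HASH_SLOT_BYTES are hypothetical helpers. The size rules and the 64-bytes-per-slot figure come from the page text, not from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/prctl.h>

/*
 * Fallback values, ASSUMED to match <linux/prctl.h> from Linux 6.16.
 * The system headers take precedence whenever they define these names.
 */
#ifndef PR_FUTEX_HASH
#define PR_FUTEX_HASH 78
#endif
#ifndef PR_FUTEX_HASH_SET_SLOTS
#define PR_FUTEX_HASH_SET_SLOTS 1
#endif
#ifndef PR_FUTEX_HASH_GET_SLOTS
#define PR_FUTEX_HASH_GET_SLOTS 2
#endif
#ifndef PR_FUTEX_HASH_GET_IMMUTABLE
#define PR_FUTEX_HASH_GET_IMMUTABLE 3
#endif
#ifndef FH_FLAG_IMMUTABLE
#define FH_FLAG_IMMUTABLE (1UL << 0)
#endif

/* Hypothetical name; the page states 64 bytes per slot (more with
 * CONFIG_PROVE_LOCKING). */
#define FUTEX_HASH_SLOT_BYTES 64UL

/* hash_size rules from the page: 0 (global hash) or a power of two >= 2. */
static bool futex_hash_size_valid(unsigned long n)
{
    return n == 0 || (n >= 2 && (n & (n - 1)) == 0);
}

/* Lower bound on the memory needed by a private hash of n slots;
 * n == 0 selects the global hash, so no private memory is needed. */
static unsigned long futex_hash_min_bytes(unsigned long n)
{
    assert(futex_hash_size_valid(n));
    return n * FUTEX_HASH_SLOT_BYTES;
}

/*
 * Thin wrappers around prctl(2).  prctl(2) is variadic and reads its
 * extra arguments as unsigned long, so every argument is cast and the
 * unused ones are passed as 0.  On kernels without PR_FUTEX_HASH
 * (before Linux 6.16), these return -1 with errno set to EINVAL.
 */
static int futex_hash_set_slots(unsigned long n, unsigned long flags)
{
    return prctl(PR_FUTEX_HASH, (unsigned long) PR_FUTEX_HASH_SET_SLOTS,
                 n, flags, 0UL);
}

static int futex_hash_get_slots(void)
{
    return prctl(PR_FUTEX_HASH, (unsigned long) PR_FUTEX_HASH_GET_SLOTS,
                 0UL, 0UL, 0UL);
}

static int futex_hash_get_immutable(void)
{
    return prctl(PR_FUTEX_HASH, (unsigned long) PR_FUTEX_HASH_GET_IMMUTABLE,
                 0UL, 0UL, 0UL);
}
```

A program could, for instance, call futex_hash_set_slots(16, FH_FLAG_IMMUTABLE) early in main(). Per the page, futex_hash_get_slots() should then report 16 and futex_hash_get_immutable() should report 1. Passing 0 instead selects the global hash, and that choice cannot be undone.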
man/man2/prctl.2 | 3 +
man/man2const/PR_FUTEX_HASH.2const | 112 +++++++++++++++++++++++++++++
2 files changed, 115 insertions(+)
create mode 100644 man/man2const/PR_FUTEX_HASH.2const
diff --git a/man/man2/prctl.2 b/man/man2/prctl.2
index 7a6b73e25e7a8..30c868d051a0c 100644
--- a/man/man2/prctl.2
+++ b/man/man2/prctl.2
@@ -150,6 +150,8 @@ with a significance depending on the first one.
.B PR_GET_MDWE
.TQ
.B PR_RISCV_SET_ICACHE_FLUSH_CTX
+.TQ
+.B PR_FUTEX_HASH
.SH RETURN VALUE
On success,
a nonnegative value is returned.
@@ -262,4 +264,5 @@ so these operations should be used with care.
.BR PR_SET_MDWE (2const),
.BR PR_GET_MDWE (2const),
.BR PR_RISCV_SET_ICACHE_FLUSH_CTX (2const),
+.BR PR_FUTEX_HASH (2const),
.BR core (5)
diff --git a/man/man2const/PR_FUTEX_HASH.2const b/man/man2const/PR_FUTEX_HASH.2const
new file mode 100644
index 0000000000000..c6a6396729770
--- /dev/null
+++ b/man/man2const/PR_FUTEX_HASH.2const
@@ -0,0 +1,112 @@
+.\" Copyright, The contributors to the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH PR_FUTEX_HASH 2const (date) "Linux man-pages (unreleased)"
+.SH NAME
+PR_FUTEX_HASH
+\-
+configure the private futex hash
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/prctl.h>" " /* Definition of " PR_* " constants */"
+.B #include <sys/prctl.h>
+.P
+.BI "int prctl(PR_FUTEX_HASH, long " op ", ...);"
+.fi
+.SH DESCRIPTION
+Configure the attributes of the hash used by the
+.BR futex (2)
+family of operations.
+The Linux kernel uses a hash to distribute
+.BR futex (2)
+users across different data structures (hash buckets).
+A bucket holds the in-kernel representation of the operation and tracks both
+the tasks enqueued waiting for a wake-up and the tasks performing a wake-up.
+The size of the global hash is determined at boot time from the number of CPUs.
+Since the mapping from the provided
+.I uaddr
+value to a bucket is based on a hash, two unrelated tasks can share the same
+bucket, which can delay their
+.BR futex (2)
+operations due to lock contention on that bucket.
+On a PREEMPT_RT system such delays are problematic, because unrelated tasks
+share in-kernel locks and it is not deterministic which tasks are involved.
+.P
+Linux 6.16 implements a process-wide private hash, which is used by all
+.BR futex (2)
+operations that specify
+.B FUTEX_PRIVATE_FLAG
+as part of the operation.
+Without any configuration, the kernel allocates 16 hash slots once the first
+thread has been created.
+If the process continues to create threads, the kernel tries to resize the
+private hash based on the number of threads and the CPUs available in the system.
+The kernel only increases the size and never exceeds the size of the global hash.
+.P
+The size of the private hash can also be configured explicitly,
+which disables the automatic resizing performed by the kernel.
+.P
+The following values for
+.I op
+can be specified:
+.TP
+.BI "int prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, " hash_size ", " hash_flags ");"
+Set the number of slots to use for the private hash.
+.P
+.RS
+.TP
+.I hash_size
+Specifies the size of the private hash to allocate. Possible values are:
+.RS
+.TP
+.I 0
+Use the global hash, which was the behavior before Linux 6.16.
+This operation cannot be undone.
+.TP
+.I >0
+Specifies the number of slots to allocate; it must be a power of two, at least 2.
+The upper limit depends on the memory available in the system.
+Each slot requires 64 bytes of memory; kernels built with
+.B CONFIG_PROVE_LOCKING
+consume more than that.
+.RE
+.TP
+.I hash_flags
+.RS
+The following flags can be specified:
+.TP
+.B FH_FLAG_IMMUTABLE
+Make the private hash immutable; it can no longer be changed afterwards.
+With an immutable private hash, the kernel can avoid some accounting on the
+data structure, an overhead that is visible in benchmarks when many
+.BR futex (2)
+operations are invoked in parallel on different CPUs.
+.RE
+.RE
+.TP
+.B "int prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);"
+Return the current size of the private hash.
+A value of 0 means that no private hash has been allocated or that the global
+hash is used; a value >0 is the number of slots in the private hash.
+.TP
+.B "int prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_IMMUTABLE);"
+Return 1 if the private hash has been made immutable and can no longer be
+changed; otherwise, return 0.
+.SH RETURN VALUE
+On success,
+these calls return a nonnegative value, as described above.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 6.16.
+.SH SEE ALSO
+.BR futex (2),
+.BR prctl (2),
+.BR futex (7)
--
2.49.0