lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <20250520104247.S-gVcgxM@linutronix.de>
Date: Tue, 20 May 2025 12:42:47 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Alejandro Colomar <alx@...nel.org>
Cc: linux-man@...r.kernel.org, linux-kernel@...r.kernel.org,
	André Almeida <andrealmeid@...lia.com>,
	Darren Hart <dvhart@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar <mingo@...hat.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Waiman Long <longman@...hat.com>
Subject: [PATCH v2] prctl: Add documentation for PR_FUTEX_HASH

prctl(PR_FUTEX_HASH) is queued for the v6.16 merge window.
Add documentation for the new interface.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
---
v1…v2: https://lore.kernel.org/all/20250516161422.BqmdlxlF@linutronix.de/
  - Partly reword
  - Use "semantic newlines"
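
For reviewers, here is a minimal userspace sketch of the three operations. The wrapper names are hypothetical. The fallback constant values are an assumption copied from the uapi header queued for v6.16, so prefer the definitions from <linux/prctl.h> when they are available. On kernels without PR_FUTEX_HASH, the calls are expected to fail with EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/*
 * Assumption: fallback values for headers older than v6.16, copied from
 * the queued uapi <linux/prctl.h>; verify them against your kernel headers.
 */
#ifndef PR_FUTEX_HASH
#define PR_FUTEX_HASH 78
#endif
#ifndef PR_FUTEX_HASH_SET_SLOTS
#define PR_FUTEX_HASH_SET_SLOTS 1
#endif
#ifndef PR_FUTEX_HASH_GET_SLOTS
#define PR_FUTEX_HASH_GET_SLOTS 2
#endif
#ifndef PR_FUTEX_HASH_GET_IMMUTABLE
#define PR_FUTEX_HASH_GET_IMMUTABLE 3
#endif
#ifndef FH_FLAG_IMMUTABLE
#define FH_FLAG_IMMUTABLE (1ULL << 0)
#endif

/* Hypothetical wrapper: request hash_size private slots (0 = global hash). */
static int futex_hash_set_slots(unsigned long hash_size, unsigned long hash_flags)
{
	/* Pass every variadic argument as unsigned long; unused ones are 0. */
	return prctl(PR_FUTEX_HASH, (unsigned long)PR_FUTEX_HASH_SET_SLOTS,
		     hash_size, hash_flags, 0UL);
}

/* Hypothetical wrapper: 0 = global hash, >0 = private slots, -1 = error. */
static int futex_hash_get_slots(void)
{
	return prctl(PR_FUTEX_HASH, (unsigned long)PR_FUTEX_HASH_GET_SLOTS,
		     0UL, 0UL, 0UL);
}

/* Hypothetical wrapper: 1 = immutable, 0 = mutable, -1 = error. */
static int futex_hash_get_immutable(void)
{
	return prctl(PR_FUTEX_HASH, (unsigned long)PR_FUTEX_HASH_GET_IMMUTABLE,
		     0UL, 0UL, 0UL);
}

/* Print the current configuration, or the error if the query fails. */
static void futex_hash_report(void)
{
	int slots = futex_hash_get_slots();

	if (slots < 0) {
		printf("PR_FUTEX_HASH: %s\n", strerror(errno));
		return;
	}
	if (slots == 0)
		printf("global futex hash in use\n");
	else
		printf("private futex hash: %d slots, immutable: %d\n",
		       slots, futex_hash_get_immutable());
}
```

A process that wants a fixed, immutable 16-slot hash would call `futex_hash_set_slots(16, (unsigned long)FH_FLAG_IMMUTABLE)` and check for -1.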

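The hash_size rules can also be checked in userspace before calling prctl(). This sketch encodes them as documented: 0 selects the global hash, and any other value must be a power of two of at least 2. It also estimates memory at the quoted 64 bytes per slot. The helper names are hypothetical, and the estimate ignores the extra cost under CONFIG_PROVE_LOCKING.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-slot cost quoted in the page; assumption: no CONFIG_PROVE_LOCKING. */
#define FUTEX_HASH_SLOT_BYTES 64UL

/*
 * Hypothetical helper: does hash_size satisfy the documented rules?
 * 0 selects the global hash; any other value must be a power of two >= 2.
 */
static bool futex_hash_size_valid(unsigned long hash_size)
{
	if (hash_size == 0)
		return true;
	/* A power of two has exactly one bit set. */
	return hash_size >= 2 && (hash_size & (hash_size - 1)) == 0;
}

/* Hypothetical helper: approximate bytes used by hash_size slots. */
static unsigned long futex_hash_bytes(unsigned long hash_size)
{
	assert(futex_hash_size_valid(hash_size));
	return hash_size * FUTEX_HASH_SLOT_BYTES;
}
```

For example, the default of 16 slots comes to 16 * 64 = 1024 bytes. A request for 24 slots is rejected because 24 is not a power of two.
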
 man/man2/prctl.2                   |   3 +
 man/man2const/PR_FUTEX_HASH.2const | 122 +++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)
 create mode 100644 man/man2const/PR_FUTEX_HASH.2const

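The automatic resizing described in the DESCRIPTION can be modeled as follows. The page supplies only three facts: the starting size of 16 slots, the grow-only behavior, and the cap at the global hash size. The target of four slots per thread, with the thread count capped at the number of CPUs and rounded up to a power of two, is an assumption for illustration. It is not a statement of the kernel's exact heuristic, and all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: smallest power of two >= n. */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of the automatic sizing.
 * From the page: start at 16 slots, only grow, never exceed the global hash.
 * Assumption: target 4 slots per thread, threads capped at the CPU count.
 */
static unsigned long futex_hash_auto_slots(unsigned long current_slots,
					   unsigned long threads,
					   unsigned long cpus,
					   unsigned long global_slots)
{
	unsigned long users = threads < cpus ? threads : cpus;
	unsigned long want = roundup_pow2(4 * users);

	if (want < 16)
		want = 16;
	if (want > global_slots)
		want = global_slots;
	/* Grow-only: never return less than the current size. */
	return want > current_slots ? want : current_slots;
}
```

Under this model, a process with 8 threads on an 8-CPU machine grows from 16 to 32 slots. A later drop in the thread count leaves the size unchanged.
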
diff --git a/man/man2/prctl.2 b/man/man2/prctl.2
index f29b745b12578..a884064a40b7d 100644
--- a/man/man2/prctl.2
+++ b/man/man2/prctl.2
@@ -150,6 +150,8 @@ with a significance depending on the first one.
 .B PR_GET_MDWE
 .TQ
 .B PR_RISCV_SET_ICACHE_FLUSH_CTX
+.TQ
+.B PR_FUTEX_HASH
 .SH RETURN VALUE
 On success,
 a nonnegative value is returned.
@@ -262,4 +264,5 @@ so these operations should be used with care.
 .BR PR_SET_MDWE (2const),
 .BR PR_GET_MDWE (2const),
 .BR PR_RISCV_SET_ICACHE_FLUSH_CTX (2const),
+.BR PR_FUTEX_HASH (2const),
 .BR core (5)
diff --git a/man/man2const/PR_FUTEX_HASH.2const b/man/man2const/PR_FUTEX_HASH.2const
new file mode 100644
index 0000000000000..c7aa36064b79e
--- /dev/null
+++ b/man/man2const/PR_FUTEX_HASH.2const
@@ -0,0 +1,122 @@
+.\" Copyright, The authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH PR_FUTEX_HASH 2const (date) "Linux man-pages (unreleased)"
+.SH NAME
+PR_FUTEX_HASH
+\-
+configure the private futex hash
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/prctl.h>" "  /* Definition of " PR_* " constants */"
+.B #include <sys/prctl.h>
+.P
+.BI "int prctl(PR_FUTEX_HASH, unsigned long " op ", ...);"
+.fi
+.SH DESCRIPTION
+Configure the attributes for the underlying hash used by the
+.BR futex (2)
+family of operations.
+The Linux kernel uses a hash to distribute
+.BR futex (2)
+users across different data structures.
+Each data structure holds the in-kernel representation of the operation
+and keeps track of the users that are enqueued and waiting for a wake-up.
+It also provides synchronization with users who perform a wake-up.
+The size of the global hash is determined at boot time and is based on the
+number of CPUs in the system.
+Since the mapping from the provided
+.I uaddr
+value to the in-kernel representation is based on a hash, two unrelated tasks
+in the system can share the same hash bucket.
+This in turn can lead to delays of the
+.BR futex (2)
+operation due to lock contention on the data structure.
+These delays can be problematic on a real-time system, since unrelated tasks can
+share in-kernel locks and it is not deterministic which tasks will be involved.
+.P
+Linux 6.16 implements a process-wide private hash, which is used by all
+.BR futex (2)
+operations that specify the
+.B FUTEX_PRIVATE_FLAG
+as part of the operation.
+Without any configuration, the kernel allocates 16 hash slots
+once the process creates its first thread.
+If the process continues to create threads, the kernel tries to resize the
+private hash based on the number of threads and available CPUs in the system.
+The kernel only ever increases the size,
+and ensures that it does not exceed the size of the global hash.
+.P
+The user can configure the size of the private hash;
+doing so also disables the automatic resizing performed by the kernel.
+.P
+The following values for
+.I op
+can be specified:
+.TP
+.BI "int prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, unsigned long " hash_size ", unsigned long " hash_flags ");"
+Set the number of slots to use for the private hash.
+.P
+.RS
+.TP
+.I hash_size
+Specifies the size of the private hash to allocate; possible values are:
+.RS
+.TP
+.I 0
+Use the global hash.
+This is the behavior prior to Linux 6.16.
+The operation cannot be undone.
+.TP
+.I >0
+Specifies the number of slots to allocate.
+The value must be a power of two, and the smallest permitted value is 2.
+The upper limit depends on the available memory in the system.
+Each slot requires 64 bytes of memory;
+kernels compiled with
+.I CONFIG_PROVE_LOCKING
+require more than that.
+.RE
+.TP
+.I hash_flags
+.RS
+The following flags can be specified:
+.TP
+.B FH_FLAG_IMMUTABLE
+The private hash can no longer be changed.
+With an immutable private hash,
+the kernel can avoid some accounting for the data structure.
+The cost of this accounting is visible in benchmarks when many
+.BR futex (2)
+operations are invoked in parallel on different CPUs.
+.RE
+.RE
+.TP
+.B "int prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);"
+Return the current size of the private hash.
+A value of 0 means that no private hash has been allocated
+and the global hash is in use.
+A value greater than 0 is the number of slots in the private hash.
+.TP
+.B "int prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_IMMUTABLE);"
+Return 1 if the private hash has been made immutable and can no longer be changed;
+otherwise, return 0.
+.\"
+.SH RETURN VALUE
+On success,
+these calls return a nonnegative value.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 6.16.
+.SH SEE ALSO
+.BR prctl (2),
+.BR futex (2),
+.BR futex (7)
-- 
2.49.0