Message-ID: <20241218111618.268028-1-bigeasy@linutronix.de>
Date: Wed, 18 Dec 2024 12:09:38 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: linux-kernel@...r.kernel.org
Cc: André Almeida <andrealmeid@...lia.com>,
	Darren Hart <dvhart@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>,
	Ingo Molnar <mingo@...hat.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Waiman Long <longman@...hat.com>
Subject: [PATCH v6 00/15] futex: Add support task local hash maps.

Hi,

this is a follow up on
	https://lore.kernel.org/ZwVOMgBMxrw7BU9A@jlelli-thinkpadt14gen4.remote.csb

and adds support for task local futex_hash_bucket. It can be created via
prctl().

This version supports resizing at runtime and automatic resizing during
thread creation. The upper limit is 256 * num_possible_cpus(), but I guess
we can lower that.
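
To make the upper limit concrete, here is a minimal user-space sketch of
clamping a requested bucket count. The helper name clamp_slots(), the
minimum of two buckets and the rounding to a power of two are assumptions
made for illustration; they are not taken from the series. Only the
256 * num_possible_cpus() ceiling comes from the text above.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not code from the series.
 * Returns a power-of-two bucket count that is at least 2 (assumed
 * minimum) and never exceeds 256 * possible_cpus (the ceiling named in
 * the cover letter), unless that ceiling itself is below the minimum.
 */
static size_t clamp_slots(size_t requested, size_t possible_cpus)
{
	size_t limit = 256 * possible_cpus;
	size_t slots = 2;

	/* Double until the request is covered or the ceiling would be exceeded. */
	while (slots < requested && slots * 2 <= limit)
		slots *= 2;

	return slots;
}
```

With four possible CPUs the ceiling is 1024, so a request for 17 buckets
yields 32 and a request for 100000 buckets yields 1024.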

I posted performance numbers of "perf bench futex hash"
	https://lore.kernel.org/all/20241101110810.R3AnEqdu@linutronix.de/

While the performance of the default 16 buckets looks worse than that of
512 (beyond that the performance hardly changes, while up to that point it
doubles), be aware that the buckets are now task local (and not shared with
others), and 16 seems to be sufficient in general.
For systems with 512 CPUs and one db application we have the resize. So
either the application needs to resize the hash or we offer auto resize
based on threads and CPUs. But be aware that workloads like
"xz huge_file.tar" will happily acquire all CPUs in the system and only
use a few locks in total, and not very often. So in this scenario it would
probably perform as well with two hash buckets as with 512.

v5…v6: https://lore.kernel.org/all/20241215230642.104118-1-bigeasy@linutronix.de/
  - Let only futex_hash() perform the delayed assignment of the new
    local hash.
  - Make sure that futex_hash_allocate() does not drop the initial
    reference of the current local hash more than once.
  - Split "futex_hb_waiters_dec() before unlock" into its own patch.
  - Reword the commit description in a few patches as suggested by
    Thomas Gleixner.

v4…v5: https://lore.kernel.org/all/20241203164335.1125381-1-bigeasy@linutronix.de/
  - Changed the reference-tracking scheme: The reference is now
    dropped once the lock is dropped. The resize operation also requeues
    all users on the hash bucket from the old one to the new one.
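
The requeue step of a resize can be pictured with the following simplified
sketch. It is not the code from the series: struct waiter, struct
hash_table, bucket_of() and requeue_all() are hypothetical names, the hash
is a placeholder, and all locking and reference counting is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct waiter {
	uintptr_t uaddr;          /* futex address the waiter sleeps on */
	struct waiter *next;
};

struct hash_table {
	size_t slots;             /* assumed to be a power of two */
	struct waiter **buckets;  /* array of 'slots' list heads */
};

/* Placeholder hash: drop the alignment bits, then mask. */
static size_t bucket_of(const struct hash_table *t, uintptr_t uaddr)
{
	return (size_t)(uaddr >> 2) & (t->slots - 1);
}

/*
 * Move every waiter from the old table to the bucket it hashes to in
 * the new table. Returns the number of waiters moved. A real
 * implementation has to hold the bucket locks while doing this.
 */
static size_t requeue_all(struct hash_table *src, struct hash_table *dst)
{
	size_t moved = 0;

	for (size_t i = 0; i < src->slots; i++) {
		struct waiter *w = src->buckets[i];

		src->buckets[i] = NULL;
		while (w) {
			struct waiter *next = w->next;
			size_t b = bucket_of(dst, w->uaddr);

			/* Push onto the head of the destination bucket. */
			w->next = dst->buckets[b];
			dst->buckets[b] = w;
			moved++;
			w = next;
		}
	}
	return moved;
}
```

After the call the old table is empty, so nothing is left behind that
would still need the old buckets.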

v3…v4: https://lore.kernel.org/all/20241115172035.795842-1-bigeasy@linutronix.de/
  - Completed resize. Tested with wait/wake, lock_pi, requeue and
    requeue_pi.
  - Added auto resize during thread creation.
  - Fixed bucket initialisation of the global hash bucket, which sometimes
    resulted in a crash.

v2…v3: https://lore.kernel.org/all/20241028121921.1264150-1-bigeasy@linutronix.de/
  - The default auto size for auto creation is 16.
  - For the private hash jhash2 is used and only for the address.
  - My "perf bench futex hash" hacks have been added.
  - The structure moved from signal's struct to mm.
  - It is possible to resize it at runtime.
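
As an illustration of hashing only the address for the private hash, here
is a small user-space sketch. It does not use the kernel's jhash2(); the
mixer below is the splitmix64 finalizer, swapped in as a stand-in so the
example is self-contained. The names mix64() and private_bucket() are
hypothetical, and the bucket count is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mixer (splitmix64 finalizer); NOT the kernel's jhash2(). */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/*
 * Hypothetical helper: pick a bucket from the futex address alone.
 * 'slots' is assumed to be a power of two so that masking works.
 */
static size_t private_bucket(uintptr_t uaddr, size_t slots)
{
	return (size_t)(mix64((uint64_t)uaddr) & (uint64_t)(slots - 1));
}
```

Since the table is private to one process, the address is the only input
needed to tell the futexes in it apart.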

v1…v2: https://lore.kernel.org/all/20241026224306.982896-1-bigeasy@linutronix.de/
  - Moved to struct signal_struct and is used process wide.
  - Automatically allocated once the first thread is created.

Sebastian

