[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250303105416.GY11590@noisy.programming.kicks-ass.net>
Date: Mon, 3 Mar 2025 11:54:16 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-kernel@...r.kernel.org,
André Almeida <andrealmeid@...lia.com>,
Darren Hart <dvhart@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Valentin Schneider <vschneid@...hat.com>,
Waiman Long <longman@...hat.com>
Subject: Re: [PATCH v9 00/11] futex: Add support task local hash maps.
On Tue, Feb 25, 2025 at 06:09:03PM +0100, Sebastian Andrzej Siewior wrote:
> Sebastian Andrzej Siewior (11):
> futex: fixup futex_wait_setup [fold futex: Move futex_queue() into
> futex_wait_setup()]
> futex: Create helper function to initialize a hash slot.
> futex: Add basic infrastructure for local task local hash.
> futex: Hash only the address for private futexes.
> futex: Allow automatic allocation of process wide futex hash.
> futex: Decrease the waiter count before the unlock operation.
> futex: Introduce futex_q_lockptr_lock().
> futex: Acquire a hash reference in futex_wait_multiple_setup().
> futex: Allow to re-allocate the private local hash.
> futex: Resize local futex hash table based on number of threads.
Right, I've been going over this and been poking at the patches for the
past few days, and I'm not quite sure where to start.
There's a bunch of simple things, that can be trivially fixed, but
there's also some more fundamental things.
I've written a pile of patches on top of this while playing around with
things. The latest pile sits in:
queue/locking/futex
I'm not sure I should post the patches as a reply to this email (I can,
if people want), but let me try and summarize what I did and why.
Primarily, the reason I started poking at it is that I think the prctl()
as implemented is completely useless. Notably its effect is entirely
ephemeral, one pthread_create() call can re-size the hash, destroying
the user requested size. Also, I still feel one should be able to set
the hash size to 0 and have it revert to global hash.
Finally prctl() should not return until the rehash is complete.
I think my implementation now does all that -- but I've not tested it
yet -- I've to write a prctl() testcase and it was too nice outside :-)
So, on the way to reworking the prctl(), I ran into:
- naming; hb_p is a terrible name, the way I read that is
hash-bucket-private, or hash-bucket pointer, neither make much sense,
because they're a pointer to struct futex_private_hash, which is a
hash-table.
I've very uninspired done s/hb_p/fph/g with the exception of
hb->hb_p, which is now hb->priv.
- more naming; you had:
hb = __futex_hash(key);
futex_hash_get(hb);
futex_hash_put(hb);
fph = futex_get_private_hash();
futex_put_private_hash();
which is all sorts of inconsistent, and I've made that:
hb = __futex_hash(key); /* hash, no get */
hb = futex_hash(key) /* hash and get */
futex_hash_get(hb); /* get */
futex_hash_put(hb); /* put */
fph = futex_private_hash();
futex_private_hash_get(fph);
futex_private_hash_put(fph);
- There was some superfluous state; notably, AFAICT
futex_private_hash::{initial_ref_dropped,released} are unneeded and
made the code unnecessarily complicated.
You can drop the initial ref when phash && !phash_new, eg on the
first time around when you allocate a new hash-table.
We don't need to track released because we can simply check for that
state using rcuref_read() == 0.
- As alluded to in a previous point, there was no means of only
hashing, the fph get was both non-obviously hidden inside the private
hash and unconditional. Untangled that.
My current prctl() thing does:
- reject !power-of-two and 1
- accepts 0
- returns once rehash is done
Notably, having done a prctl() disables the auto-sizing.
When allocating a new private hash table and there is already one
pending, it compares the tables. The compare function checks in order:
- custom (user provided / prctl())
- zero size
- biggest size
IOW, any user requested size always wins, a 0 size is final otherwise
go with the largest.
After that I rebased my FUTEX2_NUMA patch on top of all this and added
a new FUTEX2_MPOL, which is something Christoph Lameter asked for a
while back, and something we can now actually do sanely, since we have
lockless vma lookups working.
Anyway, the entire stack builds and boots, but is otherwise very much
untested.
WDYT?
Powered by blists - more mailing lists