linux-kernel - Re: [RFC PATCH 2/3] futex: Add basic infrastructure for local task local hash.

Message-ID: <20241030210819.GS9767@noisy.programming.kicks-ass.net>
Date: Wed, 30 Oct 2024 22:08:19 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	linux-kernel@...r.kernel.org,
	André Almeida <andrealmeid@...lia.com>,
	Darren Hart <dvhart@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar <mingo@...hat.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Valentin Schneider <vschneid@...hat.com>,
	Waiman Long <longman@...hat.com>
Subject: Re: [RFC PATCH 2/3] futex: Add basic infrastructure for local task
 local hash.

On Mon, Oct 28, 2024 at 01:02:34PM +0100, Thomas Gleixner wrote:

> That's what we did with the original series, but with this model it's
> daft. What we maybe could do there is:

Not sure what's daft -- a single JVM running on 400+ CPUs with 4
hashbuckets sounds awesome.

> 
> private_hash()
>    scoped_guard(rcu) {
>       hash = rcu_dereference(current->signal->futex_hash);

So I really do think mm_struct is a better place for this than
signal_struct -- CLONE_VM does not require CLONE_SIGHAND.

I've long forgotten which JVM used the naked CLONE_VM, but there is some
creative code out there.

And futexes fundamentally live in the memory address space.
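
To make the "naked CLONE_VM" case concrete, here is a minimal userspace
sketch, not taken from any JVM. The names futex_word, child_tgid, child_fn
and naked_clone_vm_demo are made up for illustration. The child is cloned
with CLONE_VM only, so it is a separate thread group with its own
signal_struct, yet a private futex wake still reaches the parent because
both tasks share one address space.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static _Atomic uint32_t futex_word;	/* lives in the shared address space */
static pid_t child_tgid;		/* written by the child before the wake */

static long futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static int child_fn(void *arg)
{
	(void)arg;
	/* No CLONE_THREAD: this task is the leader of its own thread group. */
	child_tgid = (pid_t)syscall(SYS_getpid);
	atomic_store(&futex_word, 1);
	futex(&futex_word, FUTEX_WAKE_PRIVATE, 1);
	return 0;
}

/* Returns 0 if a CLONE_VM-only child woke us through a private futex. */
int naked_clone_vm_demo(void)
{
	const size_t stack_size = 256 * 1024;
	char *stack = malloc(stack_size);
	pid_t pid;

	if (!stack)
		return -1;
	assert(stack_size % 16 == 0);	/* keep the stack top aligned */
	atomic_store(&futex_word, 0);
	child_tgid = 0;

	/* Shares the mm, but neither sighand nor the thread group. */
	pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
	if (pid < 0) {
		free(stack);
		return -1;
	}

	/* FUTEX_WAIT returns at once if the word is no longer 0. */
	while (atomic_load(&futex_word) == 0)
		futex(&futex_word, FUTEX_WAIT_PRIVATE, 0);

	waitpid(pid, NULL, 0);
	free(stack);
	return (child_tgid == pid && child_tgid != getpid()) ? 0 : -1;
}
```

A hash hung off signal_struct would give these two tasks different
tables for the same futex word; one hung off mm_struct would not.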

>       if (hash && rcuref_get(&hash->ref))
>          return hash;
>    }
> 
>    guard(spinlock_irq)(&task->sighand->siglock);
>    hash = current->signal->futex_hash;
>    if (hash && rcuref_get(&hash->ref))
>        return hash;
>    // Let alloc scale according to signal->nr_threads

Or rather, scale according to mm->mm_users.

>    // alloc acquires a reference count
>    ....

It might make sense to have a prctl() setting that inhibits the hash
allocation entirely, falling back to the global hash tables.
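
A rough sketch of how the sizing and the opt-out could fit together, as a
plain C model rather than kernel code. Everything here is an assumption
made for illustration: the constants, the 4-slots-per-user ratio, the
helper names, and the inhibit flag standing in for whatever the prctl()
would set. A return value of 0 means "no private hash, use the global
one".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables, not taken from the patch series. */
#define HASH_MIN_SLOTS	16u
#define HASH_MAX_SLOTS	4096u
#define SLOTS_PER_USER	4u

/* Smallest power of two that is >= n, for n >= 1. */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of buckets for a private hash, scaled by the number of users
 * of the address space. Returns 0 when the private hash is inhibited.
 */
size_t private_hash_slots(unsigned int mm_users, int inhibit_private_hash)
{
	size_t want;

	if (inhibit_private_hash)
		return 0;

	want = (size_t)mm_users * SLOTS_PER_USER;
	if (want < HASH_MIN_SLOTS)
		want = HASH_MIN_SLOTS;
	if (want > HASH_MAX_SLOTS)
		want = HASH_MAX_SLOTS;
	return roundup_pow2(want);
}
```

With these made-up numbers, a 400-thread process would get 2048 buckets
instead of 4, and a process that opted out would get none.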

> And on fork do the following:
> 
>    scoped_guard(spinlock_irq, &task->sighand->siglock) {
>       hash = current->signal->futex_hash;
>       if (!hash || hash_size_ok())
>          return hash;
> 
>       // Drop the initial reference, which forces the last
>       // user and subsequent new users into the respective
>       // slow paths, where they get stuck on sighand lock.
>       if (!rcuref_put(&hash->ref))
>         return;
> 
>       // rcuref_put() dropped the last reference
>       old_hash = realloc_hash(hash);
>       hash = current->signal->futex_hash;
>    }
>    kfree_rcu(old_hash);
>    return hash;
> 
> A similar logic is required when putting the last reference
> 
> futex_hash_put()
> {
>    if (!rcuref_put(&hash->ref))
>       return;
> 
>    scoped_guard(spinlock_irq, &task->sighand->siglock) {
>       // Fork might have raced with this
>       if (hash != current->signal->futex_hash)
>          return;
>       old_hash = realloc_hash(hash);
>    }
>    kfree_rcu(old_hash);
> }

I'm not sure having that rehash under siglock is a fine idea. It's
convenient, no doubt, but urgh, could get expensive.
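
For reference, the property the quoted pseudocode leans on is that a get
fails once the last reference has been put, which is what pushes new
users into the locked slow path. A simplified C11 model of that
behaviour follows. It is not how the kernel's rcuref is implemented, and
struct ref, ref_init, ref_get and ref_put are illustrative names only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ref {
	atomic_uint count;	/* 0 means dead: no new references allowed */
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);	/* the initial reference */
}

/* Take a reference unless the object is already dead. */
static bool ref_get(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if it was the last one. */
static bool ref_put(struct ref *r)
{
	unsigned int old = atomic_fetch_sub(&r->count, 1);

	assert(old != 0);	/* a put without a matching get is a bug */
	return old == 1;
}
```

Whoever sees ref_put() return true owns the teardown or rehash; everyone
else sees ref_get() fail and has to take the lock.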

Another scheme would be to have 2 concurrent hash-tables for a little
while.
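
One way to picture the two-table scheme, as a single-threaded userspace
sketch with no locking or RCU at all. All names are hypothetical, the
bucket counts are assumed to be powers of two, and the migration policy
(one old bucket per insert) is just one possible choice. Lookups consult
both tables until the old one has drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct waiter { uintptr_t key; struct waiter *next; };
struct table { size_t nbuckets; struct waiter **buckets; };
struct dual_hash { struct table *cur, *old; size_t migrate_pos; };

static struct table *table_alloc(size_t n)
{
	struct table *t = malloc(sizeof(*t));

	assert(n && !(n & (n - 1)));	/* power of two */
	if (!t)
		return NULL;
	t->buckets = calloc(n, sizeof(*t->buckets));
	if (!t->buckets) {
		free(t);
		return NULL;
	}
	t->nbuckets = n;
	return t;
}

static size_t bucket_of(const struct table *t, uintptr_t key)
{
	return (size_t)((key >> 2) * 2654435761u) & (t->nbuckets - 1);
}

static void table_insert(struct table *t, struct waiter *w)
{
	size_t b = bucket_of(t, w->key);

	w->next = t->buckets[b];
	t->buckets[b] = w;
}

/* Move one old bucket into the current table; free old once drained. */
static void dual_migrate_step(struct dual_hash *h)
{
	struct waiter *w, *next;

	if (!h->old)
		return;
	w = h->old->buckets[h->migrate_pos];
	h->old->buckets[h->migrate_pos] = NULL;
	for (; w; w = next) {
		next = w->next;
		table_insert(h->cur, w);
	}
	if (++h->migrate_pos == h->old->nbuckets) {
		free(h->old->buckets);
		free(h->old);
		h->old = NULL;
	}
}

int dual_init(struct dual_hash *h, size_t n)
{
	h->cur = table_alloc(n);
	h->old = NULL;
	h->migrate_pos = 0;
	return h->cur ? 0 : -1;
}

/* Start a resize: the current table becomes the old one. */
int dual_resize(struct dual_hash *h, size_t n)
{
	struct table *t;

	if (h->old)
		return -1;	/* previous resize still draining */
	t = table_alloc(n);
	if (!t)
		return -1;
	h->old = h->cur;
	h->cur = t;
	h->migrate_pos = 0;
	return 0;
}

void dual_insert(struct dual_hash *h, struct waiter *w)
{
	table_insert(h->cur, w);	/* new entries only go to cur */
	dual_migrate_step(h);
}

struct waiter *dual_lookup(const struct dual_hash *h, uintptr_t key)
{
	const struct table *tabs[2] = { h->cur, h->old };

	for (int i = 0; i < 2; i++) {
		if (!tabs[i])
			continue;
		struct waiter *w = tabs[i]->buckets[bucket_of(tabs[i], key)];
		for (; w; w = w->next)
			if (w->key == key)
				return w;
	}
	return NULL;
}
```

The point of the sketch is only that no single step has to rehash
everything under one lock; a real version would still need to sort out
how wakers and waiters agree on which table holds a given key.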