linux-kernel - Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA

Message-ID: <887eadb6-6142-3edf-0a25-d33b2219b90d@gentwo.org>
Date: Fri, 25 Oct 2024 12:36:28 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Peter Zijlstra <peterz@...radead.org>
cc: tglx@...utronix.de, axboe@...nel.dk, linux-kernel@...r.kernel.org, 
    mingo@...hat.com, dvhart@...radead.org, dave@...olabs.net, 
    andrealmeid@...lia.com, Andrew Morton <akpm@...ux-foundation.org>, 
    urezki@...il.com, hch@...radead.org, lstoakes@...il.com, 
    Arnd Bergmann <arnd@...db.de>, linux-api@...r.kernel.org, 
    linux-mm@...ck.org, linux-arch@...r.kernel.org, malteskarupke@....de
Subject: Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA


Sorry, I saw this after the other email.

On Fri, 25 Oct 2024, Peter Zijlstra wrote:

> > Could we follow NUMA policies like with other metadata allocations during
> > system call processing?
>
> I had a quick look at this, and since the mempolicy stuff is per vma,
> and we don't have the vma, this is going to be terribly expensive --
> mmap_lock and all that.

There is a memory policy for the task as a whole, in current->mempolicy,
that is used for slab allocations and for other allocations that are not
bound to a vma. Use that.

> Using memory policies is probably okay -- but still risky, since you get
> the extra failure case where if you change the mempolicy between WAIT
> and WAKE things will not match and sadness happens, but that *SHOULD*
> hopefully not happen a lot. Mempolicies are typically fairly static.

Right.

> > That way the placement of the futex can be controlled by the task's memory
> > policy. We could skip the FUTEX2_NUMA option.
>
> That doesn't work. If we don't have storage for the node across
> WAIT/WAKE, then the node must be deterministic per futex_hash().
> Otherwise wake has no chance of finding the entry.

You can get a node number that follows the current task's mempolicy by
calling mempolicy_slab_node(), and keep using that node from then on.

It is also possible to check if the policy is interleave and then follow
the distributed hash scheme.
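
The selection described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not kernel code: the
policy enum, struct task_policy, struct futex_model, NR_NODES and both
helpers are hypothetical names, and policy_node() merely stands in for what
mempolicy_slab_node() would return. Where the chosen node is remembered is
not specified in this message; the `node` field is an assumed placeholder
for that storage.

```c
#include <stdint.h>

#define NR_NODES   4u     /* assumed number of NUMA nodes */
#define NODE_UNSET (-1)   /* no node has been chosen yet */

/* Hypothetical, simplified stand-in for the task-wide memory policy. */
enum policy_mode { POLICY_DEFAULT, POLICY_PREFERRED, POLICY_INTERLEAVE };

struct task_policy {
    enum policy_mode mode;
    unsigned int preferred_node;  /* only used for POLICY_PREFERRED */
};

/* Hypothetical per-futex state; the node field is assumed storage. */
struct futex_model {
    uint64_t key_hash;  /* hash of the futex key */
    int node;           /* NODE_UNSET until the first lookup */
};

/* Model of asking the task policy for a node. */
static unsigned int policy_node(const struct task_policy *pol,
                                unsigned int local_node, uint64_t key_hash)
{
    switch (pol->mode) {
    case POLICY_PREFERRED:
        return pol->preferred_node % NR_NODES;
    case POLICY_INTERLEAVE:
        /* Interleave: spread futexes over nodes by hash. */
        return (unsigned int)(key_hash % NR_NODES);
    default:
        /* No explicit policy: use the node the task runs on. */
        return local_node % NR_NODES;
    }
}

/* Choose a node on first use, then keep returning the same one. */
static unsigned int futex_node(struct futex_model *f,
                               const struct task_policy *pol,
                               unsigned int local_node)
{
    if (f->node == NODE_UNSET)
        f->node = (int)policy_node(pol, local_node, f->key_hash);
    return (unsigned int)f->node;
}
```

In this model a later policy change does not move an already placed futex,
because the first result is remembered. Without the remembered node, WAIT and
WAKE would each consult the policy in force at the time of the call.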


> The current scheme where we determine node based on hash bits is fully
> deterministic and WAIT/WAKE will agree on which node-hash to use. The
> interleave is no worse than the global hash today -- OTOH it also isn't
> better.
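
For illustration, the deterministic scheme in the quoted paragraph can be
modelled in userspace roughly as follows. This is a sketch under assumptions,
not the patch's code: NR_NODES, BUCKET_BITS, struct futex_slot, mix64() and
futex_slot_for() are hypothetical names, and the way the hash bits are split
between bucket and node is assumed. The point it shows is that the node
depends only on the key, so WAIT and WAKE compute the same slot without any
stored state.

```c
#include <stdint.h>

#define NR_NODES         4u                  /* assumed number of NUMA nodes */
#define BUCKET_BITS      8u                  /* assumed per-node hash size */
#define BUCKETS_PER_NODE (1u << BUCKET_BITS)

/* Hypothetical result of a lookup: which table, and which bucket in it. */
struct futex_slot {
    unsigned int node;
    unsigned int bucket;
};

/* Stand-in for the key hash: a generic 64-bit bit mixer. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Low bits pick the bucket; the bits above them pick the node. */
static struct futex_slot futex_slot_for(uint64_t key)
{
    uint64_t h = mix64(key);
    struct futex_slot s;

    s.bucket = (unsigned int)(h & (BUCKETS_PER_NODE - 1u));
    s.node = (unsigned int)((h >> BUCKET_BITS) % NR_NODES);
    return s;
}
```

Because no task state enters the computation, a waiter and a waker in
different tasks reach the same node and bucket for the same key. The cost is
that placement ignores where the tasks actually run.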

This is unexpected and strange behavior for those familiar with NUMA. We
have tools to set memory policies for tasks, and those policies should be
used throughout.