Message-ID: <20241028094618.GL9767@noisy.programming.kicks-ass.net>
Date: Mon, 28 Oct 2024 10:46:18 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: tglx@...utronix.de, linux-kernel@...r.kernel.org, mingo@...hat.com,
dvhart@...radead.org, dave@...olabs.net, andrealmeid@...lia.com,
Andrew Morton <akpm@...ux-foundation.org>, urezki@...il.com,
hch@...radead.org, lstoakes@...il.com,
Arnd Bergmann <arnd@...db.de>, linux-api@...r.kernel.org,
linux-mm@...ck.org, linux-arch@...r.kernel.org,
malteskarupke@....de, llong@...hat.com
Subject: Re: [PATCH 2/6] futex: Implement FUTEX2_NUMA
On Fri, Oct 25, 2024 at 12:28:54PM -0700, Christoph Lameter (Ampere) wrote:
> On Fri, 25 Oct 2024, Peter Zijlstra wrote:
>
> > Extend the futex2 interface to be numa aware.
> >
> > When FUTEX2_NUMA is specified for a futex, the user value is extended
> > to two words (of the same size). The first is the user value we all
> > know, the second one will be the node to place this futex on.
> >
> > struct futex_numa_32 {
> > u32 val;
> > u32 node;
> > };
> >
> > When node is set to ~0, WAIT will set it to the current node_id such
> > that WAKE knows where to find it. If userspace corrupts the node value
> > between WAIT and WAKE, the futex will not be found and no wakeup will
> > happen.
> >
> > When FUTEX2_NUMA is not set, the node is simply an extension of the
> > hash, such that traditional futexes are still interleaved over the
> > nodes.
>
>
> Would it be possible to follow the NUMA memory policy set up for a task
> when making these decisions? We may not need a separate FUTEX2_NUMA
> option. There are supportive functions in mm/mempolicy.c that will yield
> a node for the futex logic to use.
Using get_task_policy() seems very dangerous to me. It is explicitly
possible for different tasks in a process to have different policies,
which means (private) futexes would fail to work correctly.
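To make the failure mode concrete, here is a toy userspace model, not kernel code. Everything in it (toy_task, toy_wait, toy_wake, the two-node layout and the bucket count) is hypothetical and exists only for illustration. It assumes the futex bucket is picked from the node that the calling task's policy resolves to, which is roughly what a per-task lookup would amount to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES 2          /* hypothetical two-node machine */
#define BUCKETS_PER_NODE 4  /* hypothetical per-node hash size */

/* Toy model of a task: only the node its memory policy resolves to. */
struct toy_task {
	int policy_node;
};

/* One waiter count per (node, bucket); stands in for a real wait queue. */
static int waiters[NR_NODES][BUCKETS_PER_NODE];

static size_t toy_hash(uintptr_t uaddr)
{
	return (uaddr >> 2) % BUCKETS_PER_NODE;
}

/* WAIT: enqueue on the node chosen by the calling task's policy. */
static void toy_wait(const struct toy_task *t, uintptr_t uaddr)
{
	assert(t->policy_node >= 0 && t->policy_node < NR_NODES);
	waiters[t->policy_node][toy_hash(uaddr)]++;
}

/*
 * WAKE: search only the node chosen by the calling task's policy.
 * Returns the number of waiters woken (0 or 1).
 */
static int toy_wake(const struct toy_task *t, uintptr_t uaddr)
{
	assert(t->policy_node >= 0 && t->policy_node < NR_NODES);

	int *slot = &waiters[t->policy_node][toy_hash(uaddr)];

	if (*slot == 0)
		return 0;	/* waiter sits on another node: wakeup is lost */

	(*slot)--;
	return 1;
}
```

With two threads of one process, one resolving to node 0 and the other to node 1, toy_wait() from the first followed by toy_wake() from the second on the same address returns 0: the waiter is queued, but the waker never looks at the bucket it is on.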
We need something that is consistent process-wide -- like the vma
policies. Except at present, those are too expensive to readily access.