Date:   Thu, 14 Jul 2022 14:01:04 +0300
From:   Andrey Semashev <andrey.semashev@...il.com>
To:     André Almeida <andrealmeid@...lia.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        linux-kernel@...r.kernel.org
Cc:     linux-api@...r.kernel.org, fweimer@...hat.com,
        libc-alpha@...rceware.org, Davidlohr Bueso <dave@...olabs.net>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [RFC] futex2: add NUMA awareness

On 7/14/22 06:18, André Almeida wrote:
> Hi,
> 
> futex2 is an ongoing project with the goal to create a new interface for
> futex that solves ongoing issues with the current syscall.
> 
> One of these problems is the lack of NUMA awareness for futex operations.
> This RFC aims to gather feedback on a NUMA interface proposal.
> 
>  * The problem
> 
> futex has a single, global hash table to store information of current
> waiters to be queried by wakers. This hash table is stored in a single
> node in non-uniform machines. This means that a process running in other
> nodes will have some overhead using futex, given that it will need to
> access the table in a different node.
> 
>  * A solution
> 
> For NUMA machines, one table would be allocated per node. Processes
> would then be able to use the local table to avoid sharing data with
> other nodes.
> 
>  * The interface
> 
> Userspace needs to specify which node it would like to use to store/query
> the futex table. The common case would be to operate on the current
> node, but some cases could require operating on another one.
> 
> Before getting to the NUMA part, a quick recap of the syscalls interface
> of futex2:
> 
> futex_wait(void *uaddr, unsigned int val, unsigned int flags,
>            struct timespec *timo)
> 
> futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
> 
> struct futex_requeue {
> 	void *uaddr;
> 	unsigned int flags;
> };
> 
> futex_requeue(struct futex_requeue *rq1, struct futex_requeue *rq2,
> 	      unsigned int nr_wake, unsigned int nr_requeue,
> 	      u64 cmpval, unsigned int flags)
> 
> 
> As requeue already has 6 arguments, we can't add an argument for the
> node ID; we need to pack it in a struct. So then we have:
> 
> struct futexX_numa {
>         __uX value;
>         __sX hint;
> };
> 
> Where X can be 8, 16, 32 or 64 (futex2 supports variable sized futexes).
> `value` is the futex value and `hint` can be -1 for the current node, or
> [0, MAX_NUMA_NODES) to specify a node. Example:
> 
> struct futex32_numa f = {.value = 0, .hint = -1};
> 
> ...
> 
> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
> 
> Then &f would be used as the futex address, as expected, and the table
> of the current node would be used. If an app expects calls from
> different nodes, then it should do, for instance:
> 
> struct futex32_numa f = {.value = 0, .hint = 2};
> 
> For non-NUMA apps, a call without FUTEX_NUMA flag would just use the
> first node as default.
> 
> Feedback? Who else should I CC?

Just a few questions:

Do I understand correctly that notifiers won't be able to wake up
waiters unless they know on which node they are waiting?
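
To make the question concrete, here is a minimal userspace sketch of how I read the proposal. The struct layout is taken from the RFC for X = 32; the macro FUTEX_NUMA_HINT_CURRENT and the helper futex_numa_resolve() are hypothetical names of mine and not part of the proposed interface, and the real lookup would of course live in the kernel.

```c
#include <stdint.h>

/* Layout from the RFC for X = 32. The kernel's __u32/__s32 are spelled
 * with <stdint.h> types here so the sketch builds in userspace. */
struct futex32_numa {
    uint32_t value;  /* the futex word */
    int32_t  hint;   /* -1 = current node, otherwise a node ID */
};

/* Hypothetical name for the "current node" hint value. */
#define FUTEX_NUMA_HINT_CURRENT (-1)

/* Hypothetical helper: returns the node whose table a wait or wake
 * call would use, given the node the caller is running on. */
static int futex_numa_resolve(const struct futex32_numa *f, int caller_node)
{
    return f->hint == FUTEX_NUMA_HINT_CURRENT ? caller_node : f->hint;
}
```

If this reading is right, then with hint = -1 a waiter on node 0 and a waker on node 3 would resolve to different tables, and the wake would not find the waiter. With an explicit hint both sides agree on the node, but only one of them can be local to it.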

Is it possible to wait on a futex on different nodes?

Is it possible to wake waiters on a futex on all nodes? When a single
(or N, where N is not "all") waiter is woken, which node is selected? Is
there a rotation of nodes, so that nodes are not skewed in terms of
notified waiters?
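
By "rotation" I mean something like the following round-robin selection. This is purely an illustration of the question and not something the RFC describes; pick_wake_node(), the waiters array and the cursor are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical: waiters[i] is the number of waiters queued on node i,
 * and *cursor remembers where the previous wake-up left off.
 * Returns the node to wake from, or -1 if no node has waiters. */
static int pick_wake_node(const unsigned *waiters, size_t nr_nodes,
                          size_t *cursor)
{
    for (size_t i = 0; i < nr_nodes; i++) {
        size_t node = (*cursor + i) % nr_nodes;

        if (waiters[node] > 0) {
            /* Start the next search at the following node, so that
             * one node does not receive every wake-up. */
            *cursor = (node + 1) % nr_nodes;
            return (int)node;
        }
    }
    return -1;
}
```

Without something along these lines, waking one waiter at a time would presumably always favor the same node.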
