lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 14 Jul 2022 00:18:51 -0300
From:   André Almeida <andrealmeid@...lia.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        linux-kernel@...r.kernel.org
Cc:     linux-api@...r.kernel.org, fweimer@...hat.com,
        libc-alpha@...rceware.org,
        Andrey Semashev <andrey.semashev@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: [RFC] futex2: add NUMA awareness

Hi,

futex2 is an ongoing project with the goal to create a new interface for
futex that solves ongoing issues with the current syscall.

One of this problems is the lack of NUMA awareness for futex operations.
This RFC is aimed to gather feedback around the a NUMA interface proposal.

 * The problem

futex has a single, global hash table to store information of current
waiters to be queried by wakers. This hash table is stored in a single
node in non-uniform machines. This means that a process running in other
nodes will have some overhead using futex, given that it will need to
access the table in a different node.

 * A solution

For NUMA machines, it would be allocated a table per node. Processes
then would be able to use the local table to avoid sharing data with
other nodes.

 * The interface

Userspace needs to specify which node would like to use to store/query
the futex table. The common case would be to operate on the current
node, but some cases could required to operate in another one.

Before getting to the NUMA part, a quick recap of the syscalls interface
of futex2:

futex_wait(void *uaddr, unsigned int val, unsigned int flags,
           struct timespec *timo)

futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)

struct futex_requeue {
	void *uaddr;
	unsigned int flags;
};

futex_requeue(struct futex_requeue *rq1, struct futex_requeue *rq2,
	      unsigned int nr_wake, unsigned int nr_requeue,
	      u64 cmpval, unsigned int flags)


As requeue already has 6 arguments, we can't add an argument for the
node ID, we need to pack it in a struct. So then we have

struct futexX_numa {
        __uX value;
        __sX hint;
};

Where X can be 8, 16, 32 or 64 (futex2 supports variable sized futexes).
`value` is the futex value and `hint` can be -1 for the current node, or
[0, MAX_NUMA_NODES) to specify a node. Example:

struct futex32_numa f = {.value = 0, hint = -1};

...

futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);

Then &f would be used as the futex address, as expected, and this would
be used for the current node. If an app is expecting to have calls from
different nodes then it should do for instance:

struct futex32_numa f = {.value = 0, hint = 2};

For non-NUMA apps, a call without FUTEX_NUMA flag would just use the
first node as default.

Feedback? Who else should I CC?

Thanks,
	André

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ