Date: Tue, 03 Mar 2020 14:00:12 +0100
From: Florian Weimer <fweimer@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Pierre-Loup A. Griffais" <pgriffais@...vesoftware.com>,
Thomas Gleixner <tglx@...utronix.de>,
André Almeida <andrealmeid@...labora.com>,
linux-kernel@...r.kernel.org, kernel@...labora.com,
krisman@...labora.com, shuah@...nel.org,
linux-kselftest@...r.kernel.org, rostedt@...dmis.org,
ryao@...too.org, dvhart@...radead.org, mingo@...hat.com,
z.figura12@...il.com, steven@...vesoftware.com,
steven@...uorix.net, malteskarupke@....de, carlos@...hat.com,
adhemerval.zanella@...aro.org, libc-alpha@...rceware.org
Subject: Re: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
* Peter Zijlstra:
> So how about we introduce new syscalls:
>
> sys_futex_wait(void *uaddr, unsigned long val, unsigned long flags, ktime_t *timo);
>
> struct futex_wait {
> void *uaddr;
> unsigned long val;
> unsigned long flags;
> };
> sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
> unsigned long flags, ktime_t *timo);
>
> sys_futex_wake(void *uaddr, unsigned int nr, unsigned long flags);
>
> sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
> unsigned int nr_requeue, unsigned long cmpval, unsigned long flags);
>
> Where flags:
>
> - has 2 bits for size: 8,16,32,64
> - has 2 more bits for size (requeue) ??
> - has ... bits for clocks
> - has private/shared
> - has numa
What's the actual type of *uaddr? Does it vary by size (which I assume
is in bits?)? Are there alignment constraints?
These system calls still seem to be type-polymorphic, which is
problematic for defining a really nice C interface. I would really like
to have a strongly typed interface for this, with a nice struct futex
wrapper type (even if it means that we need four of them, one per size).
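As a rough sketch of what such a strongly typed layer could look like: one wrapper struct per width, with the size bits derived from the pointer type at compile time. Every name here (the struct futexN types, the XFUTEX_SIZE_* constants, futex_size_flag) is hypothetical, and the log2 encoding of the two size bits is an assumption, not something the proposal specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper types, one per futex width. */
struct futex8  { uint8_t  value; };
struct futex16 { uint16_t value; };
struct futex32 { uint32_t value; };
struct futex64 { uint64_t value; };

/* Assumed encoding of the two size bits: log2 of the width in bytes. */
enum {
    XFUTEX_SIZE_8  = 0,
    XFUTEX_SIZE_16 = 1,
    XFUTEX_SIZE_32 = 2,
    XFUTEX_SIZE_64 = 3
};

/* Select the size bits from the static type of the futex pointer.
 * Passing any other pointer type fails to compile. */
#define futex_size_flag(f) _Generic((f),  \
    struct futex8 *:  XFUTEX_SIZE_8,      \
    struct futex16 *: XFUTEX_SIZE_16,     \
    struct futex32 *: XFUTEX_SIZE_32,     \
    struct futex64 *: XFUTEX_SIZE_64)

/* Decode the size bits back into a width in bytes. */
static inline unsigned futex_size_bytes(unsigned size_flag)
{
    return 1u << (size_flag & 3u);
}

static_assert(sizeof(struct futex32) == 4, "wrapper must add no padding");
```

A typed futex_wait() wrapper could then compute the flags argument from its pointer argument, so callers never pass a size by hand.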
Will all architectures support all sizes? If not, how do we probe which
size/flags combinations are supported?
> For NUMA I propose that when NUMA_FLAG is set, uaddr-4 will be 'int
> node_id', with the following semantics:
>
> - on WAIT, node_id is read and when 0 <= node_id <= nr_nodes, is
> directly used to index into per-node hash-tables. When -1, it is
> replaced by the current node_id and an smp_mb() is issued before we
> load and compare the @uaddr.
>
> - on WAKE/REQUEUE, it is an immediate index.
Does this mean the first waiter determines the NUMA index, and all
future waiters use the same chain even if they are on different nodes?
I think documenting this as a node index would be a mistake. It could
be an arbitrary hint for locating the corresponding kernel data
structures.
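To make the question concrete, here is a user-space model of the quoted WAIT-side rule as I read it. It is only a sketch: struct numa_futex32 and resolve_node are hypothetical names, the real lookup would happen in the kernel with atomic accesses, and I assume the valid range is 0 <= node_id < nr_nodes (the quoted text says <= nr_nodes) because the value indexes a per-node table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: node_id sits directly before the 32-bit futex word. */
struct numa_futex32 {
    int32_t  node_id;   /* -1 means "not yet assigned" */
    uint32_t value;     /* the futex word; uaddr points here */
};

static_assert(offsetof(struct numa_futex32, value) -
              offsetof(struct numa_futex32, node_id) == 4,
              "node_id must live at uaddr-4");

/* Model of the WAIT-side lookup: returns the hash-table index to use,
 * or -EINVAL for an invalid node_id. Not atomic; illustration only. */
static int resolve_node(struct numa_futex32 *f, int current_node, int nr_nodes)
{
    int node = f->node_id;

    if (node == -1) {
        /* First waiter: pin the futex to the caller's node. */
        f->node_id = current_node;
        return current_node;
    }
    if (node < 0 || node >= nr_nodes)
        return -EINVAL;
    return node;        /* later waiters reuse the stored node */
}
```

In this model a waiter on node 3 that arrives after a waiter on node 2 still gets index 2, which is the behaviour I am asking about.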
> Any invalid value will result in EINVAL.
Using uaddr-4 is slightly tricky with a 64-bit futex value, due to the
need to maintain alignment and avoid padding.
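A small layout sketch shows the problem, assuming the 64-bit futex word must be 8-byte aligned. The struct and helper names are hypothetical. In the naive layout the compiler inserts padding, so node_id ends up at uaddr-8; putting it at uaddr-4 requires an explicit leading pad word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive layout: 4 bytes of padding are inserted before value,
 * so node_id is 8 bytes below the futex word, not 4. */
struct naive_numa_futex64 {
    int32_t node_id;
    _Alignas(8) uint64_t value;
};

/* Layout that satisfies "uaddr-4": an explicit pad word comes first. */
struct numa_futex64 {
    int32_t reserved;               /* explicit padding, unused */
    int32_t node_id;                /* at uaddr-4 */
    _Alignas(8) uint64_t value;     /* the futex word; uaddr points here */
};

static_assert(offsetof(struct naive_numa_futex64, value) -
              offsetof(struct naive_numa_futex64, node_id) == 8,
              "naive layout puts node_id at uaddr-8");
static_assert(offsetof(struct numa_futex64, value) -
              offsetof(struct numa_futex64, node_id) == 4,
              "padded layout puts node_id at uaddr-4");

/* Locate node_id from a pointer to the futex word, following the
 * proposed uaddr-4 rule. Only valid for the padded layout above. */
static inline int32_t *futex64_node_id(uint64_t *uaddr)
{
    return (int32_t *)((char *)uaddr - 4);
}
```

Either way a NUMA-aware 64-bit futex occupies 16 bytes, with 4 of them wasted.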
Thanks,
Florian