linux-kernel - Re: [PATCH v4 00/15] Add futex2 syscalls

Message-ID: <20210608023302.34yzrm5ktf3qvxhq@offworld>
Date:   Mon, 7 Jun 2021 19:33:02 -0700
From:   Davidlohr Bueso <dave@...olabs.net>
To:     André Almeida <andrealmeid@...labora.com>
Cc:     Nicholas Piggin <npiggin@...il.com>, acme@...nel.org,
        Andrey Semashev <andrey.semashev@...il.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        corbet@....net, Darren Hart <dvhart@...radead.org>,
        fweimer@...hat.com, joel@...lfernandes.org, kernel@...labora.com,
        krisman@...labora.com, libc-alpha@...rceware.org,
        linux-api@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, malteskarupke@...tmail.fm,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        pgriffais@...vesoftware.com, Peter Oskolkov <posk@...k.io>,
        Steven Rostedt <rostedt@...dmis.org>, shuah@...nel.org,
        Thomas Gleixner <tglx@...utronix.de>, z.figura12@...il.com
Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

On Mon, 07 Jun 2021, André Almeida wrote:

>At 22:09 on 04/06/21, Nicholas Piggin wrote:
>> Actually one other scalability thing while I remember it:
>>
>> futex_wait currently requires that the lock word is tested under the
>> queue spin lock (to avoid consuming a wakeup). The problem with this is
>> that the lock word can be a very hot cache line if you have a lot of
>> concurrency, so accessing it under the queue lock can increase queue
>> lock hold time.
>>
>> I would prefer if the new API was relaxed to avoid this restriction
>> (e.g., any wait call may consume a wakeup so it's up to userspace to
>> avoid that if it is a problem).
>
>Maybe I'm wrong, but AFAIK the goal of checking the lock word inside the
>spin lock is to avoid sleeping forever (in other words, wrongly assuming
>that the lock is taken and missing a wakeup call), not to avoid
>consuming wakeups. Or at least this is my interpretation of this long
>comment in futex.c:
>
>https://elixir.bootlin.com/linux/v5.12.9/source/kernel/futex.c#L51

I think what Nick is referring to is that futex_wait() could return 0
instead of EAGAIN upon a uval != val condition if the check is done
without the hb lock. The value could have changed between when userspace
did the condition check and called into futex(2) to block in the slowpath.
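A minimal sketch of the existing behavior from the userspace side, assuming
Linux and the raw futex(2) syscall (the wrapper and helper names below are
hypothetical, not taken from the patch series). It shows the case where the
word has already changed by the time the wait is issued, so the kernel's
check of uval against val fails and the call returns EAGAIN instead of
sleeping:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper: glibc does not export a futex() function. */
static long futex_wait(uint32_t *uaddr, uint32_t expected)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
		       expected, NULL, NULL, 0);
}

/*
 * Simulate a waiter whose view of the lock word is stale: the word is
 * already 1, but the caller still passes 0 as the expected value.
 * Returns 1 if the kernel refused to block with EAGAIN, 0 otherwise.
 */
static int stale_wait_is_rejected(void)
{
	uint32_t word = 1;               /* value changed before the syscall */
	long ret = futex_wait(&word, 0); /* caller still believes it is 0 */

	return ret == -1 && errno == EAGAIN;
}
```

With the relaxed semantics discussed above, a caller could no longer rely on
this EAGAIN and would have to re-check the word itself after every return
from the wait call.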

But such spurious scenarios should be pretty rare, and while I agree that
the cacheline can be hot, I'm not sure how much of a performance issue this
really is compared to other issues; certainly not enough to govern futex2
design. Changing such semantics would be a _huge_ difference between futex1
and futex2.

It matters less than, for example, the hb (hash bucket) collisions that
serialize independent futexes, affecting both performance and determinism.
And I agree that a new interface should address this problem, albeit most of
the workloads I have seen in production use but a handful of futexes and
larger thread counts. One thing that crossed my mind (but that I have not
actually sat down to look at) would be to use rhashtables for the dynamic
resizing, but of course that would probably add a decent amount of overhead
to the simple hashing we currently have.
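
To illustrate the hb collision problem, here is a toy model, not kernel code.
It assumes a fixed table of 256 buckets and a simple multiplicative hash as a
stand-in for the kernel's real key hashing; NBUCKETS, bucket_of() and
find_collision() are all hypothetical names. Two unrelated futex addresses
that land in the same bucket would share that bucket's lock:

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 256u	/* assumed fixed table size, power of two */

/* Toy stand-in for the kernel's futex key hash; NOT the real function. */
static uint32_t bucket_of(uint64_t uaddr)
{
	uint64_t h = uaddr * 0x9E3779B97F4A7C15ull;	/* multiplicative mix */

	return (uint32_t)(h >> 32) & (NBUCKETS - 1);
}

/*
 * Scan n distinct 4-byte-aligned addresses starting at base and report the
 * first pair that maps to the same bucket.
 * Returns 1 and fills *a and *b if a pair is found, 0 otherwise.
 */
static int find_collision(uint64_t base, size_t n, uint64_t *a, uint64_t *b)
{
	uint64_t first_seen[NBUCKETS];
	int used[NBUCKETS] = { 0 };

	for (size_t i = 0; i < n; i++) {
		uint64_t addr = base + 4 * i;
		uint32_t idx = bucket_of(addr);

		if (used[idx]) {
			*a = first_seen[idx];
			*b = addr;
			return 1;
		}
		used[idx] = 1;
		first_seen[idx] = addr;
	}
	return 0;
}
```

With a fixed table, any set of more than NBUCKETS distinct futexes must
contain at least one colliding pair, which is the pressure a resizable table
would relieve at the cost of a more expensive lookup.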

Thanks,
Davidlohr