Message-ID: <3af1586a-f5b8-9728-d140-4fc4709ba49c@valvesoftware.com>
Date: Wed, 31 Jul 2019 18:42:38 -0700
From: "Pierre-Loup A. Griffais" <pgriffais@...vesoftware.com>
To: Zebediah Figura <z.figura12@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Gabriel Krisman Bertazi <krisman@...labora.com>
CC: <mingo@...hat.com>, <peterz@...radead.org>, <dvhart@...radead.org>,
<linux-kernel@...r.kernel.org>, <kernel@...labora.com>,
Steven Noonan <steven@...vesoftware.com>
Subject: Re: [PATCH RFC 2/2] futex: Implement mechanism to wait on any of
 several futexes

On 7/31/19 6:32 PM, Zebediah Figura wrote:
> On 7/31/19 8:22 PM, Zebediah Figura wrote:
>> On 7/31/19 7:45 PM, Thomas Gleixner wrote:
>>> If I assume a maximum of 65 futexes, which got mentioned in one of
>>> the replies, then this will allocate 7280 bytes (65 * 112 bytes per
>>> futex_q) for the futex_q array alone, with a stock Debian config
>>> that has no debug options enabled which would bloat the struct.
>>> Adding the futex_wait_block array into the same allocation grows it
>>> past 8K, which already exceeds the limit of SLUB kmem caches and
>>> forces the whole thing into the page allocator directly.
>>>
>>> This sucks.
>>>
>>> Also I'm confused about the 64 maximum resulting in 65 futexes
>>> comment in one of the mails.
>>>
>>> Can you please explain exactly what you are trying to do on the
>>> user space side?
>>
>> The extra futex comes from the fact that there are a couple of, as it
>> were, out-of-band ways to wake up a thread on Windows. [Specifically, a
>> thread can enter an "alertable" wait in which case it will be woken up
>> by a request from another thread to execute an "asynchronous procedure
>> call".] It's easiest for us to just add another futex to the list in
>> that case.
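>>
>> Schematically, the wait list we build ends up looking like this (a
>> rough sketch with made-up names, not the actual fsync code):
>>
>>     /* one entry per object the application is waiting on... */
>>     for (i = 0; i < count; i++)
>>         add_wait_entry( &wait_list, objs[i]->futex_addr, objs[i]->value );
>>     /* ...plus one extra futex that an APC from another thread can
>>      * wake, if the wait is alertable */
>>     if (alertable)
>>         add_wait_entry( &wait_list, &thread_apc_futex, 0 );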
>
> To be clear, the 64/65 distinction is an implementation detail that's
> pretty much outside the scope of this discussion. I should have just
> said 65 directly. Sorry about that.
>
>>
>> I'd also point out, for whatever it's worth, that while 64 is a hard
>> limit, real applications rarely come anywhere near it. By far the
>> most common number of primitives to select on is one;
>> performance-critical code tends not to wait on more than three. The
>> most I've ever seen is twelve.
>>
>> If you'd like to see the user-side source, most of the relevant code is
>> at [1], in particular the functions __fsync_wait_objects() [line 712]
>> and do_single_wait() [line 655]; a rough sketch of the underlying call
>> is below. Please feel free to ask for further clarification.
>>
>> [1]
>> https://github.com/ValveSoftware/wine/blob/proton_4.11/dlls/ntdll/fsync.c
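>>
>> For reference, the call itself boils down to roughly this (a sketch
>> modulo error handling; the struct layout and op number follow my
>> reading of the RFC, so double-check against the actual series):
>>
>>     #include <linux/futex.h>
>>     #include <sys/syscall.h>
>>     #include <unistd.h>
>>     #include <time.h>
>>
>>     #ifndef FUTEX_WAIT_MULTIPLE
>>     #define FUTEX_WAIT_MULTIPLE 13  /* op number from the RFC; not upstream */
>>     #endif
>>
>>     /* one entry per futex in the wait list, as proposed in the RFC */
>>     struct futex_wait_block
>>     {
>>         void *uaddr;           /* futex to wait on */
>>         unsigned int val;      /* expected value */
>>         unsigned int bitset;   /* wake bitset */
>>     };
>>
>>     /* wait until any of 'count' futexes is woken, or until timeout */
>>     static int futex_wait_multiple( const struct futex_wait_block *futexes,
>>             int count, const struct timespec *timeout )
>>     {
>>         return syscall( __NR_futex, futexes, FUTEX_WAIT_MULTIPLE,
>>                         count, timeout, NULL, 0 );
>>     }
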
In addition, here's an example of how I think it might be useful to
expose it to apps at large in a way that's compatible with existing
pthread mutexes:
https://github.com/Plagman/glibc/commit/3b01145fa25987f2f93e7eda7f3e7d0f2f77b290
This patch hasn't received nearly as much testing as the Wine fsync
code path, but that functionality would provide more CPU-efficient ways
for thread pool code to sleep in our game engine (a strawman is
sketched below); today we use eventfd for that.
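As a strawman, the worker-side usage I have in mind looks roughly like
this (the entry point name and signature are illustrative assumptions,
not necessarily what the commit above exposes):

    #include <pthread.h>

    /* hypothetical extension: block until one of 'count' mutexes can
     * be acquired, lock that one, and return its index */
    extern int pthread_mutex_lock_any( pthread_mutex_t **mutexes, int count );

    #define NUM_QUEUES 8
    static pthread_mutex_t *queue_locks[NUM_QUEUES];

    static void *worker( void *arg )
    {
        (void)arg;
        for (;;)
        {
            /* one futex-multiplexed sleep instead of an eventfd read */
            int which = pthread_mutex_lock_any( queue_locks, NUM_QUEUES );
            if (which < 0)
                break;
            /* drain work items from queue 'which' here */
            pthread_mutex_unlock( queue_locks[which] );
        }
        return NULL;
    }

The point being that each worker could block on all of its queues at
once, without an extra fd or wakeup fan-out per queue.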
For this, I think the expected upper bound for the per-op futex count
would be on the same order of magnitude as the logical CPU count on the
target machine, similar to the Wine use case.
Thanks,
- Pierre-Loup