Message-ID: <3af1586a-f5b8-9728-d140-4fc4709ba49c@valvesoftware.com>
Date: Wed, 31 Jul 2019 18:42:38 -0700
From: "Pierre-Loup A. Griffais" <pgriffais@...vesoftware.com>
To: Zebediah Figura <z.figura12@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Gabriel Krisman Bertazi <krisman@...labora.com>
CC: <mingo@...hat.com>, <peterz@...radead.org>, <dvhart@...radead.org>,
<linux-kernel@...r.kernel.org>, <kernel@...labora.com>,
Steven Noonan <steven@...vesoftware.com>
Subject: Re: [PATCH RFC 2/2] futex: Implement mechanism to wait on any of
 several futexes

On 7/31/19 6:32 PM, Zebediah Figura wrote:
> On 7/31/19 8:22 PM, Zebediah Figura wrote:
>> On 7/31/19 7:45 PM, Thomas Gleixner wrote:
>>> If I assume a maximum of 65 futexes, which got mentioned in one of
>>> the replies, then this will allocate 7280 bytes (65 * 112 bytes per
>>> futex_q) for the futex_q array alone, with a stock Debian config
>>> that has no debug options enabled which would bloat the struct.
>>> Adding the futex_wait_block array into the same allocation grows it
>>> past 8K, which already exceeds the limit of SLUB kmem caches and
>>> forces the whole thing into the page allocator directly.
>>>
>>> This sucks.
>>>
>>> Also I'm confused about the 64 maximum resulting in 65 futexes
>>> comment in one of the mails.
>>>
>>> Can you please explain exactly what you are trying to do on the
>>> user space side?
>>
>> The extra futex comes from the fact that there are a couple of, as it
>> were, out-of-band ways to wake up a thread on Windows. [Specifically, a
>> thread can enter an "alertable" wait in which case it will be woken up
>> by a request from another thread to execute an "asynchronous procedure
>> call".] It's easiest for us to just add another futex to the list in
>> that case.
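>>
>> Schematically, the wait list we build ends up looking like this (a
>> rough sketch with made-up names, not the actual fsync code):
>>
>>     /* one entry per object the application is waiting on... */
>>     for (i = 0; i < count; i++)
>>         add_wait_entry( &wait_list, objs[i]->futex_addr, objs[i]->value );
>>     /* ...plus one extra futex that an APC from another thread can
>>      * wake, if the wait is alertable */
>>     if (alertable)
>>         add_wait_entry( &wait_list, &thread_apc_futex, 0 );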
>
> To be clear, the 64/65 distinction is an implementation detail that's
> pretty much outside the scope of this discussion. I should have just
> said 65 directly. Sorry about that.
>
>>
>> I'd also point out, for whatever it's worth, that while 64 is a hard
>> limit, real applications rarely come anywhere near it. By far the
>> most common number of primitives to select on is one;
>> performance-critical code tends not to wait on more than three. The
>> most I've ever seen is twelve.
>>
>> If you'd like to see the user-side source, most of the relevant code is
>> at [1], in particular the functions __fsync_wait_objects() [line 712]
>> and do_single_wait() [line 655]; a rough sketch of the underlying call
>> is below. Please feel free to ask for further clarification.
>>
>> [1]
>> https://github.com/ValveSoftware/wine/blob/proton_4.11/dlls/ntdll/fsync.c
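>>
>> For reference, the call itself boils down to roughly this (a sketch
>> modulo error handling; the struct layout and op number follow my
>> reading of the RFC, so double-check against the actual series):
>>
>>     #include <linux/futex.h>
>>     #include <sys/syscall.h>
>>     #include <unistd.h>
>>     #include <time.h>
>>
>>     #ifndef FUTEX_WAIT_MULTIPLE
>>     #define FUTEX_WAIT_MULTIPLE 13  /* op number from the RFC; not upstream */
>>     #endif
>>
>>     /* one entry per futex in the wait list, as proposed in the RFC */
>>     struct futex_wait_block
>>     {
>>         void *uaddr;           /* futex to wait on */
>>         unsigned int val;      /* expected value */
>>         unsigned int bitset;   /* wake bitset */
>>     };
>>
>>     /* wait until any of 'count' futexes is woken, or until timeout */
>>     static int futex_wait_multiple( const struct futex_wait_block *futexes,
>>             int count, const struct timespec *timeout )
>>     {
>>         return syscall( __NR_futex, futexes, FUTEX_WAIT_MULTIPLE,
>>                         count, timeout, NULL, 0 );
>>     }
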
In addition, here's an example of how I think it might be useful to
expose it to apps at large in a way that's compatible with existing
pthread mutexes:
https://github.com/Plagman/glibc/commit/3b01145fa25987f2f93e7eda7f3e7d0f2f77b290
This patch hasn't received nearly as much testing as the Wine fsync
code path, but that functionality would provide more CPU-efficient ways
for thread pool code to sleep in our game engine (a strawman is
sketched below); today we use eventfd for that.
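As a strawman, the worker-side usage I have in mind looks roughly like
this (the entry point name and signature are illustrative assumptions,
not necessarily what the commit above exposes):

    #include <pthread.h>

    /* hypothetical extension: block until one of 'count' mutexes can
     * be acquired, lock that one, and return its index */
    extern int pthread_mutex_lock_any( pthread_mutex_t **mutexes, int count );

    #define NUM_QUEUES 8
    static pthread_mutex_t *queue_locks[NUM_QUEUES];

    static void *worker( void *arg )
    {
        (void)arg;
        for (;;)
        {
            /* one futex-multiplexed sleep instead of an eventfd read */
            int which = pthread_mutex_lock_any( queue_locks, NUM_QUEUES );
            if (which < 0)
                break;
            /* drain work items from queue 'which' here */
            pthread_mutex_unlock( queue_locks[which] );
        }
        return NULL;
    }

The point being that each worker could block on all of its queues at
once, without an extra fd or wakeup fan-out per queue.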
For this, I think the expected upper bound for the per-op futex count
would be on the same order of magnitude as the logical CPU count on the
target machine, similar to the Wine use case.
Thanks,
- Pierre-Loup