Message-ID: <e3cb2e03-9794-4bf9-8634-010261533146@redhat.com>
Date: Thu, 31 Oct 2024 16:28:10 -0400
From: Waiman Long <llong@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 linux-kernel@...r.kernel.org
Cc: André Almeida <andrealmeid@...lia.com>,
 Darren Hart <dvhart@...radead.org>, Davidlohr Bueso <dave@...olabs.net>,
 Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>,
 Valentin Schneider <vschneid@...hat.com>
Subject: Re: [RFC v2 PATCH 0/4] futex: Add support task local hash maps.

On 10/31/24 11:56 AM, Sebastian Andrzej Siewior wrote:
> On 2024-10-28 13:13:54 [+0100], To linux-kernel@...r.kernel.org wrote:
>>                                                             Need to do
>> more testing.
> So there is "perf bench futex hash". On a 256 CPU NUMA box:
> 	perf bench futex hash -t 240 -m -s -b $hb
> and hb 2 … 131072 (moved the allocation to kvmalloc) I get the following
> (averaged over three runs)
>
> buckets op/sec
>        2     9158.33
>        4    21665.66	+ ~136%
>        8    44686.66	+ ~106
>       16    84144.33	+ ~ 88
>       32   139998.33	+ ~ 66
>       64   279957.0	+ ~ 99
>      128   509533.0	+ ~100
>      256  1019846.0	+ ~100
>      512  1634940.0	+ ~ 60
>     1024  1834859.33	+ ~ 12
>           1868129.33 (global hash, 65536 hash)
>     2048  1912071.33	+ ~  4
>     4096  1918686.66	+ ~  0
>     8192  1922285.66	+ ~  0
>    16384  1923017.0	+ ~  0
>    32768  1923319.0	+ ~  0
>    65536  1932906.0	+ ~  0
>   131072  2042571.33	+ ~  5
>
> By doubling the hash size the ops/sec almost double until 256 slots.
> After 2048 slots the increase is almost noise (except for the last
> entry).

Looking at the performance data, we should probably use the global hash 
map to maximize throughput if latency isn't important.

AFAICT, the reason why patch 4 creates a local hash map when the first 
thread is created is to avoid a race of the same futex being hashed on 
both the local and the global hash maps. Correct me if my understanding 
is incorrect. So all the multithreaded processes will have to use local 
hash maps for their private futexes even if they don't care about latency.

Maybe we should limit the auto local hash map creation to only RT 
processes where latency is important. To avoid the race, we could add a 
flag to indicate if a private futex hashing operation had ever been done 
before and prevent the creation of local hash map after that.
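
To make the flag idea a bit more concrete, here is a rough user-space model 
in C11, not kernel code. All names (proc_model, hash_mode, 
private_futex_hash_mode(), try_create_local_hash()) are made up for 
illustration and do not exist in the patch set. It assumes a single 
per-process state word is enough: whichever happens first, the first 
private futex hash operation or the local hash map creation, decides the 
map for the lifetime of the process.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-process state; these names are not from the patch set. */
enum hash_mode {
	HASH_UNDECIDED = 0,	/* no private futex hashed yet, no local map */
	HASH_GLOBAL_USED,	/* a private futex was hashed on the global map */
	HASH_LOCAL,		/* a local hash map was created first */
};

struct proc_model {
	_Atomic int mode;
};

/*
 * Model of the check done on every private futex hash operation.
 * The first operation on an undecided process latches it onto the
 * global map. Returns the map this process has to use.
 */
static enum hash_mode private_futex_hash_mode(struct proc_model *p)
{
	int expected = HASH_UNDECIDED;

	if (atomic_compare_exchange_strong(&p->mode, &expected,
					   HASH_GLOBAL_USED))
		return HASH_GLOBAL_USED;

	/* On failure, expected holds the mode that was already decided. */
	return (enum hash_mode)expected;
}

/*
 * Model of local hash map creation. It only succeeds if no private
 * futex has been hashed on the global map before, so the same futex
 * can never end up on both maps.
 */
static bool try_create_local_hash(struct proc_model *p)
{
	int expected = HASH_UNDECIDED;

	if (atomic_compare_exchange_strong(&p->mode, &expected, HASH_LOCAL))
		return true;

	/* Already local is fine; already global means it is too late. */
	return expected == HASH_LOCAL;
}
```

The compare-and-exchange makes the decision atomic, so two threads racing 
between the first hash operation and the map creation always agree on the 
outcome. Real code would also have to allocate and publish the map itself, 
which this sketch leaves out.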

My 2 cents.

Cheers,
Longman

