Message-ID: <e3cb2e03-9794-4bf9-8634-010261533146@redhat.com>
Date: Thu, 31 Oct 2024 16:28:10 -0400
From: Waiman Long <llong@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org
Cc: André Almeida <andrealmeid@...lia.com>,
Darren Hart <dvhart@...radead.org>, Davidlohr Bueso <dave@...olabs.net>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <vschneid@...hat.com>
Subject: Re: [RFC v2 PATCH 0/4] futex: Add support task local hash maps.
On 10/31/24 11:56 AM, Sebastian Andrzej Siewior wrote:
> On 2024-10-28 13:13:54 [+0100], To linux-kernel@...r.kernel.org wrote:
>> Need to do
>> more testing.
> So there is "perf bench futex hash". On a 256 CPU NUMA box:
> perf bench futex hash -t 240 -m -s -b $hb
> and hb 2 … 131072 (moved the allocation to kvmalloc) I get the following
> (averaged over three runs)
>
> buckets       op/sec  change
>       2      9158.33
>       4     21665.66  + ~136%
>       8     44686.66  + ~106%
>      16     84144.33  + ~ 88%
>      32    139998.33  + ~ 66%
>      64     279957.0  + ~ 99%
>     128     509533.0  + ~100%
>     256    1019846.0  + ~100%
>     512    1634940.0  + ~ 60%
>    1024   1834859.33  + ~ 12%
>           1868129.33  (global hash, 65536 hash)
>    2048   1912071.33  + ~  4%
>    4096   1918686.66  + ~  0%
>    8192   1922285.66  + ~  0%
>   16384    1923017.0  + ~  0%
>   32768    1923319.0  + ~  0%
>   65536    1932906.0  + ~  0%
>  131072   2042571.33  + ~  5%
>
> By doubling the hash size the ops/sec almost double until 256 slots.
> After 2048 slots the increase is almost noise (except for the last
> entry).

Looking at the performance data, we should probably use the global hash
map to maximize throughput if latency isn't important.

AFAICT, patch 4 creates a local hash map when the first thread is
created in order to avoid a race where the same futex gets hashed on
both the local and the global hash maps. Correct me if I have
misunderstood. As a result, every multithreaded process has to use a
local hash map for its private futexes, even if it doesn't care about
latency.

Maybe we should limit automatic local hash map creation to RT processes,
where latency is important. To avoid the race, we could add a flag that
records whether a private futex hashing operation has ever been done,
and refuse to create a local hash map once it is set.
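
A minimal userspace sketch of that flag idea, not kernel code and not
taken from the patch series. All names here (proc_futex_state,
futex_private_uses_local, futex_try_enable_local) are hypothetical, and
the "flag" is modelled as a single tri-state atomic so that "a private
futex was already hashed" and "a local map is installed" can never both
win for the same process. Deciding whether the process is RT is assumed
to happen in the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state; the names are illustrative only. */
enum futex_hash_mode {
	FUTEX_HASH_UNDECIDED = 0,	/* nothing hashed yet, no local map */
	FUTEX_HASH_GLOBAL,		/* a private futex hit the global map */
	FUTEX_HASH_LOCAL,		/* a local map was installed first */
};

struct proc_futex_state {
	_Atomic int mode;
};

/*
 * Private futex hashing path. The first call pins the process to the
 * global map unless a local map was installed before it.
 * Returns true if the local map must be used.
 */
static bool futex_private_uses_local(struct proc_futex_state *s)
{
	int expected = FUTEX_HASH_UNDECIDED;

	assert(s != NULL);
	/* On failure, 'expected' is updated to the mode that already won. */
	atomic_compare_exchange_strong(&s->mode, &expected, FUTEX_HASH_GLOBAL);
	return expected == FUTEX_HASH_LOCAL;
}

/*
 * Request a local map (e.g. for an RT process). Succeeds only if no
 * private futex has been hashed on the global map so far.
 */
static bool futex_try_enable_local(struct proc_futex_state *s)
{
	int expected = FUTEX_HASH_UNDECIDED;

	assert(s != NULL);
	if (atomic_compare_exchange_strong(&s->mode, &expected, FUTEX_HASH_LOCAL))
		return true;
	return expected == FUTEX_HASH_LOCAL;	/* already local is fine */
}
```

With a zero-initialized state, calling futex_try_enable_local() first
makes every later private hashing operation use the local map. If
futex_private_uses_local() runs first, the later request for a local map
is refused and the process stays on the global map.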

My 2 cents.

Cheers,
Longman