[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241031155640.Fhtm3uFD@linutronix.de>
Date: Thu, 31 Oct 2024 16:56:40 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: linux-kernel@...r.kernel.org
Cc: André Almeida <andrealmeid@...lia.com>,
Darren Hart <dvhart@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <vschneid@...hat.com>,
Waiman Long <longman@...hat.com>
Subject: Re: [RFC v2 PATCH 0/4] futex: Add support task local hash maps.
On 2024-10-28 13:13:54 [+0100], To linux-kernel@...r.kernel.org wrote:
> Need to do
> more testing.
So there is "perf bench futex hash". On a 256 CPU NUMA box:
perf bench futex hash -t 240 -m -s -b $hb
and hb 2 … 131072 (moved the allocation to kvmalloc) I get the following
(averaged over 3 three runs)
buckets op/sec
2 9158.33
4 21665.66 + ~136%
8 44686.66 + ~106
16 84144.33 + ~ 88
32 139998.33 + ~ 66
64 279957.0 + ~ 99
128 509533.0 + ~100
256 1019846.0 + ~100
512 1634940.0 + ~ 60
1024 1834859.33 + ~ 12
1868129.33 (global hash, 65536 hash)
2048 1912071.33 + ~ 4
4096 1918686.66 + ~ 0
8192 1922285.66 + ~ 0
16384 1923017.0 + ~ 0
32768 1923319.0 + ~ 0
65536 1932906.0 + ~ 0
131072 2042571.33 + ~ 5
By doubling the hash size the ops/sec almost double until 256 slots.
After 2048 slots the increase is almost noise (except for the last
entry).
Pinning the bench to individual CPUs belonging to a NUMA node and
running the same test with 110 threads only (avg over 5 runs):
ops/sec global ops/sec local
node 0 2278572.2 2534827.4
node 1 2229838.6 2437498.8
node 0+1 2542602.4 2535749.8
<--->
RAW numbers:
futex hash table entries: 65536 (order: 10, 4194304 bytes, vmalloc hugepage)
Run summary [PID 4541]: 240 threads, each operating on 1024 [private] futexes for 10 secs.
Averaged 1883542 operations/sec (+- 0,28%), total secs = 10
Averaged 1864680 operations/sec (+- 0,31%), total secs = 10
Averaged 1856166 operations/sec (+- 0,32%), total secs = 10
1868129.3333333333
====
Run summary [PID 6247]: 240 threads, hash slots: 2 each operating on 1024 [private] futexes for 10 secs.
Averaged 9219 operations/sec (+- 0,19%), total secs = 10
Averaged 9185 operations/sec (+- 0,18%), total secs = 10
Averaged 9071 operations/sec (+- 0,20%), total secs = 10
9158.333333333334
Run summary [PID 6970]: 240 threads, hash slots: 4 each operating on 1024 [private] futexes for 10 secs.
Averaged 16911 operations/sec (+- 0,29%), total secs = 10
Averaged 24145 operations/sec (+- 0,17%), total secs = 10
Averaged 23941 operations/sec (+- 0,17%), total secs = 10
21665.666666666668
Run summary [PID 7693]: 240 threads, hash slots: 8 each operating on 1024 [private] futexes for 10 secs.
Averaged 45376 operations/sec (+- 0,25%), total secs = 10
Averaged 44587 operations/sec (+- 0,17%), total secs = 10
Averaged 44097 operations/sec (+- 0,26%), total secs = 10
44686.666666666664
Run summary [PID 8416]: 240 threads, hash slots: 16 each operating on 1024 [private] futexes for 10 secs.
Averaged 84547 operations/sec (+- 0,25%), total secs = 10
Averaged 84672 operations/sec (+- 0,18%), total secs = 10
Averaged 83214 operations/sec (+- 0,26%), total secs = 10
84144.33333333333
Run summary [PID 9139]: 240 threads, hash slots: 32 each operating on 1024 [private] futexes for 10 secs.
Averaged 163342 operations/sec (+- 0,55%), total secs = 10
Averaged 127630 operations/sec (+- 0,28%), total secs = 10
Averaged 129023 operations/sec (+- 0,27%), total secs = 10
139998.33333333334
Run summary [PID 9862]: 240 threads, hash slots: 64 each operating on 1024 [private] futexes for 10 secs.
Averaged 279627 operations/sec (+- 0,29%), total secs = 10
Averaged 279572 operations/sec (+- 0,21%), total secs = 10
Averaged 280672 operations/sec (+- 0,26%), total secs = 10
279957.0
Run summary [PID 10585]: 240 threads, hash slots: 128 each operating on 1024 [private] futexes for 10 secs.
Averaged 508759 operations/sec (+- 0,21%), total secs = 10
Averaged 511253 operations/sec (+- 0,22%), total secs = 10
Averaged 508587 operations/sec (+- 0,26%), total secs = 10
509533.0
Run summary [PID 11308]: 240 threads, hash slots: 256 each operating on 1024 [private] futexes for 10 secs.
Averaged 1023552 operations/sec (+- 0,10%), total secs = 10
Averaged 1034426 operations/sec (+- 0,11%), total secs = 10
Averaged 1001560 operations/sec (+- 0,10%), total secs = 10
1019846.0
Run summary [PID 12031]: 240 threads, hash slots: 512 each operating on 1024 [private] futexes for 10 secs.
Averaged 1636187 operations/sec (+- 0,22%), total secs = 10
Averaged 1607427 operations/sec (+- 0,23%), total secs = 10
Averaged 1661206 operations/sec (+- 0,24%), total secs = 10
1634940.0
Run summary [PID 12756]: 240 threads, hash slots: 1024 each operating on 1024 [private] futexes for 10 secs.
Averaged 1833474 operations/sec (+- 0,24%), total secs = 10
Averaged 1835817 operations/sec (+- 0,24%), total secs = 10
Averaged 1835287 operations/sec (+- 0,25%), total secs = 10
1834859.3333333333
Run summary [PID 13479]: 240 threads, hash slots: 2048 each operating on 1024 [private] futexes for 10 secs.
Averaged 1915836 operations/sec (+- 0,29%), total secs = 10
Averaged 1907866 operations/sec (+- 0,28%), total secs = 10
Averaged 1912512 operations/sec (+- 0,29%), total secs = 10
1912071.3333333333
Run summary [PID 14202]: 240 threads, hash slots: 4096 each operating on 1024 [private] futexes for 10 secs.
Averaged 1916947 operations/sec (+- 0,27%), total secs = 10
Averaged 1918102 operations/sec (+- 0,28%), total secs = 10
Averaged 1921011 operations/sec (+- 0,29%), total secs = 10
1918686.6666666667
Run summary [PID 14925]: 240 threads, hash slots: 8192 each operating on 1024 [private] futexes for 10 secs.
Averaged 1916001 operations/sec (+- 0,27%), total secs = 10
Averaged 1923156 operations/sec (+- 0,27%), total secs = 10
Averaged 1927700 operations/sec (+- 0,27%), total secs = 10
1922285.6666666667
Run summary [PID 15648]: 240 threads, hash slots: 16384 each operating on 1024 [private] futexes for 10 secs.
Averaged 1928497 operations/sec (+- 0,28%), total secs = 10
Averaged 1916906 operations/sec (+- 0,27%), total secs = 10
Averaged 1923648 operations/sec (+- 0,26%), total secs = 10
1923017.0
Run summary [PID 16371]: 240 threads, hash slots: 32768 each operating on 1024 [private] futexes for 10 secs.
Averaged 1920425 operations/sec (+- 0,27%), total secs = 10
Averaged 1923449 operations/sec (+- 0,27%), total secs = 10
Averaged 1926083 operations/sec (+- 0,29%), total secs = 10
1923319.0
Run summary [PID 17094]: 240 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 1927007 operations/sec (+- 0,28%), total secs = 10
Averaged 1935182 operations/sec (+- 0,28%), total secs = 10
Averaged 1936529 operations/sec (+- 0,28%), total secs = 10
1932906.0
Run summary [PID 17817]: 240 threads, hash slots: 131072 each operating on 1024 [private] futexes for 10 secs.
Averaged 2033664 operations/sec (+- 0,32%), total secs = 10
Averaged 2060081 operations/sec (+- 0,33%), total secs = 10
Averaged 2033969 operations/sec (+- 0,32%), total secs = 10
2042571.3333333333
----
bigeasy@z3:~$ taskset -pc $$; ./run-numa.sh
pid 7679's current affinity list: 64-127,192-255
====
# Running 'futex/hash' benchmark:
Run summary [PID 23094]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2180419 operations/sec (+- 0,77%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23205]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2258612 operations/sec (+- 0,87%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23317]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2245819 operations/sec (+- 0,80%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23428]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2231469 operations/sec (+- 0,81%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23539]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2232874 operations/sec (+- 0,78%), total secs = 10
====
# Running 'futex/hash' benchmark:
Run summary [PID 23650]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2469636 operations/sec (+- 0,92%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23761]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2432942 operations/sec (+- 0,91%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23872]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2411433 operations/sec (+- 0,90%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23983]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2438380 operations/sec (+- 0,94%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24094]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2435103 operations/sec (+- 0,94%), total secs = 10
====
bigeasy@z3:~$ taskset -pc $$; ./run-numa.sh
pid 9731's current affinity list: 0-63,128-191
====
# Running 'futex/hash' benchmark:
Run summary [PID 24207]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2206612 operations/sec (+- 0,75%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24318]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2321819 operations/sec (+- 0,85%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24429]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2238386 operations/sec (+- 0,77%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24541]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2325869 operations/sec (+- 0,85%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24652]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2300175 operations/sec (+- 0,82%), total secs = 10
====
# Running 'futex/hash' benchmark:
Run summary [PID 24763]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2530561 operations/sec (+- 0,96%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24874]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2573315 operations/sec (+- 1,03%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24985]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2517479 operations/sec (+- 0,99%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25096]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2554631 operations/sec (+- 1,01%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25207]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2498151 operations/sec (+- 0,94%), total secs = 10
====
bigeasy@z3:~$ taskset -pc $$; ./run-numa.sh
pid 10975's current affinity list: 0-255
====
# Running 'futex/hash' benchmark:
Run summary [PID 25324]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2561817 operations/sec (+- 0,14%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25435]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2539522 operations/sec (+- 0,11%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25546]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2532349 operations/sec (+- 0,11%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25657]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2539481 operations/sec (+- 0,11%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25768]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2539843 operations/sec (+- 0,13%), total secs = 10
====
# Running 'futex/hash' benchmark:
Run summary [PID 25879]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2540858 operations/sec (+- 0,50%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25990]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2550342 operations/sec (+- 0,48%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 26101]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2522785 operations/sec (+- 0,48%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 26212]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2528686 operations/sec (+- 0,49%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 26323]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2536078 operations/sec (+- 0,48%), total secs = 10
====
Sebastian
Powered by blists - more mailing lists