Message-ID: <8cefc4be-b11a-414c-b3f4-280c900be67b@amd.com>
Date: Mon, 24 Nov 2025 12:40:44 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, "Paul E. McKenney"
<paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet
<corbet@....net>, Prakash Sangappa <prakash.sangappa@...cle.com>, "Madadi
Vineeth Reddy" <vineethr@...ux.ibm.com>, Steven Rostedt
<rostedt@...dmis.org>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Arnd Bergmann <arnd@...db.de>, <linux-arch@...r.kernel.org>, Randy Dunlap
<rdunlap@...radead.org>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [patch V4 00/12] rseq: Implement time slice extension mechanism
Hello Thomas,
On 11/17/2025 2:21 AM, Thomas Gleixner wrote:
> For your convenience all of it is also available as a conglomerate from
> git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/slice
>
I got a chance to test the series with Netflix's userspace locking
benchmark [1], which ended up looking very similar to the test
Steven had written initially when discussing the PoC, except that
this one can scale the number of threads.
[1] https://www.github.com/Netflix/global-lock-bench/
The critical section is just a single increment operation. The metric
is the average time taken to run a fixed number of critical sections
across #Threads, over 3 runs.
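For readers who have not seen this kind of benchmark, here is a minimal
sketch of its shape. It is not the code from [1]: the names run_once() and
average_time() are made up for illustration, and it assumes the fixed
number of critical sections is split evenly across the threads. It only
shows the structure being measured, not representative timings.

```python
import threading
import time


def run_once(num_threads, total_sections):
    """Time total_sections lock-protected increments split across num_threads."""
    lock = threading.Lock()
    counter = [0]
    # Assumption: the fixed amount of work is divided evenly between threads.
    per_thread = total_sections // num_threads

    def worker():
        for _ in range(per_thread):
            with lock:           # critical section: a single increment
                counter[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, counter[0]


def average_time(num_threads, total_sections, runs=3):
    """Average wall-clock time over several runs, as in the metric above."""
    times = [run_once(num_threads, total_sections)[0] for _ in range(runs)]
    return sum(times) / len(times)
```

Calling average_time(4, 4000) would give the kind of per-thread-count
number reported in the "Threaded (s)" column below.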
Here are the results of running the test with the default config on
my 256CPU machine in a root cpuset containing 32CPUs to actually hit
the contention:
o rseq/slice with no benchmark modifications and "rseq_slice_ext=0"
| Threads | Threaded (s)    | Threaded/s   |
+---------+-----------------+--------------+
|       1 |         .026103 | 383493128.79 |
|       2 |         .086320 | 116134267.40 |
|       4 |         .669743 |  14937390.67 |
|       8 |        1.105109 |   9053764.30 |
|      16 |        1.863516 |   5366809.94 |
|      32 |        7.249873 |   1379590.12 |
|      64 |       14.360199 |    696486.76 |
|      96 |       21.909887 |    456458.03 |
|     128 |       29.126423 |    343358.95 |
|     192 |       43.112188 |    231980.16 |
|     256 |       57.628748 |    173554.39 |
|     384 |       86.274354 |    115909.73 |
|     512 |      114.564142 |     87289.97 |
o rseq/slice with modified benchmark and "rseq_slice_ext=1"
| Threads | Threaded (s)    | Threaded/s   | %diff (s) |
+---------+-----------------+--------------+-----------+
|       1 |         .036438 | 274437690.71 |       40% |
|       2 |         .147520 |  68851845.82 |       71% |
|       4 |         .829240 |  12176948.03 |       24% |
|       8 |        1.259632 |   7993476.42 |       14% |
|      16 |        1.988396 |   5029209.62 |        7% |
|      32 |        9.844307 |   1015837.43 |       36% |
|      64 |       14.590723 |    685979.41 |        2% |
|      96 |       18.898278 |    529171.84 |      -14% |
|     128 |       23.921747 |    418033.09 |      -18% |
|     192 |       33.284228 |    300673.66 |      -23% |
|     256 |       42.934755 |    232934.87 |      -25% |
|     384 |       61.794499 |    161924.64 |      -28% |
|     512 |       82.005069 |    121951.34 |      -28% |
( Lower %diff is better )
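The %diff column appears to be the relative change in "Threaded (s)"
against the "rseq_slice_ext=0" run. A small sketch of that calculation,
assuming rounding to the nearest whole percent (pct_diff() is an
illustrative name, not part of the benchmark):

```python
def pct_diff(baseline_s, modified_s):
    """Relative change in run time versus the baseline, in whole percent."""
    return round((modified_s - baseline_s) / baseline_s * 100)


print(pct_diff(0.026103, 0.036438))    # 1 thread    -> 40
print(pct_diff(21.909887, 18.898278))  # 96 threads  -> -14
```

A positive value means the modified run took longer than the baseline,
and a negative value means it finished sooner.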
Until the contention begins (> 32 threads), there is a consistent
regression, which I believe can be attributed to the additional
overhead in the critical section from setting the slice_ext
request. However, once heavy contention begins, there is a clear
win with slice extension.
Feel free to include:
Tested-by: K Prateek Nayak <kprateek.nayak@....com>
--
Thanks and Regards,
Prateek