linux-kernel - Re: [patch V4 00/12] rseq: Implement time slice extension mechanism

Message-ID: <8cefc4be-b11a-414c-b3f4-280c900be67b@amd.com>
Date: Mon, 24 Nov 2025 12:40:44 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, "Paul E. McKenney"
	<paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet
	<corbet@....net>, Prakash Sangappa <prakash.sangappa@...cle.com>, "Madadi
 Vineeth Reddy" <vineethr@...ux.ibm.com>, Steven Rostedt
	<rostedt@...dmis.org>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Arnd Bergmann <arnd@...db.de>, <linux-arch@...r.kernel.org>, Randy Dunlap
	<rdunlap@...radead.org>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [patch V4 00/12] rseq: Implement time slice extension mechanism

Hello Thomas,

On 11/17/2025 2:21 AM, Thomas Gleixner wrote:
> For your convenience all of it is also available as a conglomerate from
> git:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/slice
> 

I got a chance to test the series with Netflix's userspace locking
benchmark [1], which ended up looking very similar to the test
Steven had written initially when discussing the PoC, except that
this one can scale the number of threads.

[1] https://www.github.com/Netflix/global-lock-bench/

The critical section is just a single increment operation. The metric
is the average time taken to run a fixed number of critical sections
across #Threads, over 3 runs.
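
As a rough illustration of that metric (this is not the benchmark code
from [1], just a sketch of its shape), the measurement can be written
as below. The names run_once, average_runtime and sections_per_thread
are made up for this sketch, and treating the fixed count as a
per-thread count is an assumption.

```python
import threading
import time


def run_once(num_threads, sections_per_thread):
    """Time num_threads workers, each running a fixed number of critical sections."""
    lock = threading.Lock()
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(sections_per_thread):
            with lock:
                # The critical section is a single increment.
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, counter


def average_runtime(num_threads, sections_per_thread, runs=3):
    """Average wall-clock time in seconds over several runs (3, as in the report)."""
    times = [run_once(num_threads, sections_per_thread)[0] for _ in range(runs)]
    return sum(times) / len(times)
```

Python's interpreter lock means the absolute numbers from such a sketch
say nothing about the kernel behaviour; it only shows what is being
timed and averaged.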

Here are the results of running the test with the default config on
my 256-CPU machine, in a root cpuset containing 32 CPUs so that the
threads actually hit contention:

o rseq/slice with no benchmark modifications and "rseq_slice_ext=0"

  | Threads |    Threaded (s) |   Threaded/s |
  +---------+-----------------+--------------+
  |       1 |         .026103 | 383493128.79 |
  |       2 |         .086320 | 116134267.40 |
  |       4 |         .669743 |  14937390.67 |
  |       8 |        1.105109 |   9053764.30 |
  |      16 |        1.863516 |   5366809.94 |
  |      32 |        7.249873 |   1379590.12 |
  |      64 |       14.360199 |    696486.76 |
  |      96 |       21.909887 |    456458.03 |
  |     128 |       29.126423 |    343358.95 |
  |     192 |       43.112188 |    231980.16 |
  |     256 |       57.628748 |    173554.39 |
  |     384 |       86.274354 |    115909.73 |
  |     512 |      114.564142 |     87289.97 |


o rseq/slice with modified benchmark and "rseq_slice_ext=1"

  | Threads |    Threaded (s) |   Threaded/s | %diff (s) |
  +---------+-----------------+--------------+-----------+
  |       1 |         .036438 | 274437690.71 |     40%   |
  |       2 |         .147520 |  68851845.82 |     71%   |
  |       4 |         .829240 |  12176948.03 |     24%   |
  |       8 |        1.259632 |   7993476.42 |     14%   |
  |      16 |        1.988396 |   5029209.62 |      7%   |
  |      32 |        9.844307 |   1015837.43 |     36%   |
  |      64 |       14.590723 |    685979.41 |      2%   |
  |      96 |       18.898278 |    529171.84 |    -14%   |
  |     128 |       23.921747 |    418033.09 |    -18%   |
  |     192 |       33.284228 |    300673.66 |    -23%   |
  |     256 |       42.934755 |    232934.87 |    -25%   |
  |     384 |       61.794499 |    161924.64 |    -28%   |
  |     512 |       82.005069 |    121951.34 |    -28%   |
  
  ( Lower %diff is better )
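
For reference, the %diff column is consistent with the relative change
in runtime against the "rseq_slice_ext=0" baseline. The formula below
is inferred from the numbers rather than taken from the benchmark, and
pct_diff is a name made up for this sketch. The sample rows are copied
from the two tables above.

```python
def pct_diff(baseline_s, modified_s):
    """Relative runtime change in percent; negative means the modified run is faster."""
    return (modified_s - baseline_s) / baseline_s * 100.0


# Threads -> average runtime in seconds, copied from the tables above.
baseline = {1: 0.026103, 32: 7.249873, 64: 14.360199, 512: 114.564142}
extended = {1: 0.036438, 32: 9.844307, 64: 14.590723, 512: 82.005069}

for threads in baseline:
    diff = pct_diff(baseline[threads], extended[threads])
    print(f"{threads:>4} threads: {diff:+.0f}%")
```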


Until the contention begins (> 32 threads), there is a consistent
regression, which I believe can be attributed to the additional
overhead in the critical section from setting the slice_ext
request. However, once heavy contention begins (96 threads and
above in these runs), there is a clear win with slice extension,
with runtime dropping by 14% to 28%.

Feel free to include:

Tested-by: K Prateek Nayak <kprateek.nayak@....com>

-- 
Thanks and Regards,
Prateek