Message-ID: <20251121092841.4a2e0cf0@pumpkin>
Date: Fri, 21 Nov 2025 09:28:41 +0000
From: david laight <david.laight@...box.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Prakash Sangappa <prakash.sangappa@...cle.com>, LKML
 <linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, "Paul E. McKenney"
 <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet
 <corbet@....net>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, K Prateek
 Nayak <kprateek.nayak@....com>, Steven Rostedt <rostedt@...dmis.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Arnd Bergmann
 <arnd@...db.de>, "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [patch V3 07/12] rseq: Implement syscall entry work for time
 slice extensions

On Thu, 20 Nov 2025 12:31:54 +0100
Thomas Gleixner <tglx@...utronix.de> wrote:

...
> > • Due to the contentious nature of the workload these tests produce
> >   highly erratic results, but the optimization is showing improved
> >   performance across 3x tests with/without use of time slice extension.
> >
> > • Swingbench throughput with use of time slice optimization
> > 	• Run 1: 50,008.10
> > 	• Run 2: 59,160.60
> > 	• Run 3: 67,342.70
> > • Swingbench throughput without use of time slice optimization
> > 	• Run 1: 36,422.80
> > 	• Run 2: 33,186.00
> > 	• Run 3: 44,309.80
> > • The application performs 55% better on average with the optimization.  
> 
> 55% is insane.
> 
> Could you please ask your performance guys to provide numbers for the
> below configurations to see how the different parts of this work are
> affecting the overall result:
> 
>  1) Linux 6.17 (no rseq rework, no slice)
> 
>  2) Linux 6.17 + your initial attempt to enable slice extension
> 
> We already have the numbers for the full new stack above (with and
> without slice), so that should give us the full picture.
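
For reference, the 55% figure can be reproduced from the three runs
quoted above by comparing the mean throughput of each configuration.
This is only a sketch of that arithmetic; mean() and gain_percent() are
names made up for illustration, not anything from the benchmark tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Gain of mean(with) over mean(without), in percent. */
double gain_percent(const double *with, const double *without, size_t n)
{
	return (mean(with, n) / mean(without, n) - 1.0) * 100.0;
}
```

Fed with the quoted Swingbench numbers this gives just under 55%, but
note that the runs within each configuration differ by roughly 30% of
their mean, so three samples do not pin the gain down very tightly.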

It is also worth checking that you don't have a single (or limited)
thread test where the busy thread is being bounced between CPUs.

While a CPU is busy its frequency is increased; when the thread is
moved to an idle CPU it initially runs at the low frequency and only
then speeds up.

This effect doubled the execution time of a (mostly) single threaded
FPGA compile from 10 minutes to 20 minutes - all caused by one of
the mitigations that slowed down syscall entry/exit enough to cause
a load of basically idle processes that woke every 10ms to all be
active at once.
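
One way to rule the migration effect out of a benchmark is to pin the
busy thread to a single CPU and see whether the numbers change. A
minimal sketch, assuming Linux with glibc; pin_to_cpu() is a
hypothetical helper, not part of the patch series or of the test setup
being discussed:

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Pin the calling thread to one CPU so the scheduler cannot move it
 * to an idle (low frequency) CPU mid-run.
 * Returns 0 on success, -1 with errno set on failure.
 */
int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* A pid of 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling pin_to_cpu(sched_getcpu()) at the start of the busy thread
keeps it where it already is. If the pinned and unpinned runs differ a
lot, the frequency ramp is likely part of what is being measured.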

You've also got the underlying problem that you can't disable
interrupts in userspace.
If an ISR happens in your 'critical region' you just lose 'big time'.
Any threads that contend pretty much have to wait for the ISR
(and any non-threaded softints) to complete.
With heavy network traffic that can easily exceed 1ms.
Nothing you can do to the scheduler will change it.

	David
