Message-ID: <20251121092841.4a2e0cf0@pumpkin>
Date: Fri, 21 Nov 2025 09:28:41 +0000
From: david laight <david.laight@...box.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Prakash Sangappa <prakash.sangappa@...cle.com>, LKML
<linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, "Paul E. McKenney"
<paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet
<corbet@....net>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, K Prateek
Nayak <kprateek.nayak@....com>, Steven Rostedt <rostedt@...dmis.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Arnd Bergmann
<arnd@...db.de>, "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [patch V3 07/12] rseq: Implement syscall entry work for time
slice extensions
On Thu, 20 Nov 2025 12:31:54 +0100
Thomas Gleixner <tglx@...utronix.de> wrote:
...
> > • Due to the contentious nature of the workload these tests produce
> > highly erratic results, but the optimization is showing improved
> > performance across 3x tests with/without use of time slice extension.
> >
> > • Swingbench throughput with use of time slice optimization
> > • Run 1: 50,008.10
> > • Run 2: 59,160.60
> > • Run 3: 67,342.70
> > • Swingbench throughput without use of time slice optimization
> > • Run 1: 36,422.80
> > • Run 2: 33,186.00
> > • Run 3: 44,309.80
> > • The application performs 55% better on average with the optimization.
>
> 55% is insane.
>
> Could you please ask your performance guys to provide numbers for the
> below configurations to see how the different parts of this work are
> affecting the overall result:
>
> 1) Linux 6.17 (no rseq rework, no slice)
>
> 2) Linux 6.17 + your initial attempt to enable slice extension
>
> We already have the numbers for the full new stack above (with and
> without slice), so that should give us the full picture.
It is also worth checking that you don't have a single (or limited)
thread test where the busy thread is being bounced between cpus.
While busy the cpu frequency is increased; when the thread is moved
to an idle cpu it initially runs at the low frequency and only then
speeds up.
This effect doubled the execution time of a (mostly) single threaded
fpga compile from 10 minutes to 20 minutes - all caused by one of
the mitigations slowing down syscall entry/exit enough that a load
of basically idle processes that woke every 10ms were all active at
once.
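Something along the lines of the sketch below (nothing to do with the
series itself, just a diagnostic hack) pins the test thread to one cpu
so migrations and the frequency ramp-up on a previously idle core drop
out of the comparison; "taskset -c N" does the same from the shell.
The cpu number is obviously arbitrary.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to one cpu so migrations (and the low initial
 * frequency of a previously idle core) can be ruled out as a variable
 * when comparing benchmark runs. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {	/* pid 0 == this thread */
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}

int main(void)
{
	if (pin_to_cpu(2))	/* cpu 2 picked arbitrarily */
		return 1;
	/* ... run the actual benchmark loop here ... */
	return 0;
}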
You've also got the underlying problem that you can't disable
interrupts in userspace.
If an ISR happens in your 'critical region' you just lose 'big time'.
Any threads that contend pretty much have to wait for the ISR
(and any non-threaded softints) to complete.
With heavy network traffic that can easily exceed 1ms.
Nothing you can do to the scheduler will change that.
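A rough way to see how big those windows actually are (again just a
jitter probe I'd hack up, nothing from the series) is to spin reading
the clock and record the largest gap between consecutive reads -
anything in the ms range is the thread being held off by interrupts
and softirqs, which is exactly what a contending thread would also
have to wait out.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static inline uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
	uint64_t prev = now_ns(), worst = 0;

	/* tight loop: any large gap between successive reads means the
	 * thread was held off (ISR, softirq or preemption) */
	for (long i = 0; i < 100000000; i++) {
		uint64_t t = now_ns();

		if (t - prev > worst)
			worst = t - prev;
		prev = t;
	}
	printf("worst gap: %llu ns\n", (unsigned long long)worst);
	return 0;
}

Run it pinned to the cpu taking the network interrupts and the worst
case gap tells you how long a 'critical region' there can be stalled.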
David