Message-ID: <20250520181451.18994-1-cpru@amazon.com>
Date: Tue, 20 May 2025 13:14:51 -0500
From: Cristian Prundeanu <cpru@...zon.com>
To: K Prateek Nayak <kprateek.nayak@....com>
CC: Cristian Prundeanu <cpru@...zon.com>, Hazem Mohamed Abuelfotoh
<abuehaze@...zon.com>, Ali Saidi <alisaidi@...zon.com>, "Benjamin
Herrenschmidt" <benh@...nel.crashing.org>, Geoff Blake <blakgeof@...zon.com>,
Borislav Petkov <bp@...en8.de>, Csaba Csoma <csabac@...zon.com>, "Dietmar
Eggemann" <dietmar.eggemann@....com>, Bjoern Doebel <doebel@...zon.de>,
Gautham Shenoy <gautham.shenoy@....com>, Joseph Salisbury
<joseph.salisbury@...cle.com>, Chris Redpath <chris.redpath@....com>,
<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
<linux-tip-commits@...r.kernel.org>, <x86@...nel.org>, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Swapnil Sapkal
<swapnil.sapkal@....com>, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: EEVDF regression still exists
>> The only _scheduler_ change that looks relevant is commit bbce3de72be5
>> ("sched/eevdf: Fix se->slice being set to U64_MAX and resulting
>> crash"). Which does affect the slice calculation, although supposedly
>> only under special circumstances.
>>
>> Of course, it could be something else.
>
> Since it is the only !SCHED_EXT change in kernel/sched, Cristian can
> perhaps try reverting it on top of v6.15-rc4 and checking if the
> benchmark results jump back to v6.15-rc3 level to rule that single
> change out. Very likely it could be something else.
I have tested reverting this commit, and the performance indeed jumped back
to rc3 levels.
> The VU count should really be based on the SUT core count, and be at least
> 8 * SUT vCPUs to ensure a full load.
I've modified the reproducer to more accurately configure the VU count
based on the SUT's vCPU count, and use the above multiplier going forward.
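A minimal sketch of the sizing rule above, assuming the SUT's vCPU count is
already known to the load generator. The function name and constant are
illustrative and are not taken from the reproducer; the 16-vCPU input is only
inferred from 128 VUs at a multiplier of 8.

```python
# Illustrative only: names are hypothetical, not from the reproducer.
VUS_PER_VCPU = 8  # multiplier quoted above ("at least 8 * SUT vCPUs")


def vu_count(sut_vcpus: int, multiplier: int = VUS_PER_VCPU) -> int:
    """Return the number of virtual users (VUs) to configure for the SUT."""
    if sut_vcpus < 1:
        raise ValueError("sut_vcpus must be a positive integer")
    return multiplier * sut_vcpus


# Assumed 16-vCPU SUT, which matches the 128 VUs used in this retest.
print(vu_count(16))
```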
Retesting the entire kernel range with the resulting 128 VUs shows
slightly higher performance everywhere compared to the previous 256 VUs.
The regression is even more visible now, with a few notable points:
- There is a performance inversion from before (6.15-rc3 now underperforms
6.15-rc4). This may be useful data for characterizing the regression.
- Kernel 6.14.7 is about the same as 6.14.6 in default mode, but slower in
SCHED_BATCH mode (-7.1% vs -6.4%).
- Kernel 6.15-rc5 is faster than all other 6.15-rcX builds, especially in
default mode.
- Kernel 6.15-rc7 is worse than 6.15-rc6 everywhere except for the default
mode throughput.
- With either VU value, disabling PLACE_LAG and RUN_TO_PARITY no longer
improves performance significantly on up-to-date kernels (6.12 and above).
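For reference, a sketch of how the two features can be switched off at
runtime. It assumes debugfs is mounted at /sys/kernel/debug, that the kernel
exposes /sys/kernel/debug/sched/features, and that the caller is root. The
helper names are hypothetical, and this is not necessarily how the reproducer
applies the setting.

```python
from pathlib import Path

# Assumed location; requires root and a mounted debugfs.
SCHED_FEATURES = Path("/sys/kernel/debug/sched/features")


def feature_tokens(settings: dict[str, bool]) -> list[str]:
    """Map {"PLACE_LAG": False} to the token written to the file ("NO_PLACE_LAG")."""
    return [name if enabled else f"NO_{name}" for name, enabled in settings.items()]


def apply_features(settings: dict[str, bool], path: Path = SCHED_FEATURES) -> None:
    """Write one token per write, since each write toggles a single feature."""
    for token in feature_tokens(settings):
        path.write_text(token + "\n")


# Usage (as root), for the NOPL+NORTP configuration:
#   apply_features({"PLACE_LAG": False, "RUN_TO_PARITY": False})
```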
Summary below, full details in the reproducer repo [1].
* All without SCHED_BATCH:
Kernel   | Throughput | P50 latency       | NOPL+NORTP
aarch64  | (NOPM)     | (larger is worse) | (NOPM)
=========+============+===================+============
6.5.13   | baseline   | baseline          | N/A
---------+------------+-------------------+------------
6.6.91   | -5.7%      | +9.9%             | -2.6%
---------+------------+-------------------+------------
6.8.12   | -6.0%      | +10.7%            | -3.4%
---------+------------+-------------------+------------
6.12.29  | -6.8%      | +9.5%             | -8.0%
---------+------------+-------------------+------------
6.13.12  | -7.6%      | +10.5%            | -8.5%
---------+------------+-------------------+------------
6.14.7   | -7.0%      | +9.8%             | -9.8%
---------+------------+-------------------+------------
6.15-rc3 | -8.5%      | +11.7%            |
---------+------------+-------------------+------------
6.15-rc4 | -7.5%      | +10.2%            |
---------+------------+-------------------+------------
6.15-rc5 | -6.4%      | +8.6%             |
---------+------------+-------------------+------------
6.15-rc6 | -7.5%      | +10.4%            | -9.0%
---------+------------+-------------------+------------
6.15-rc7 | -7.8%      | +11.1%            | -8.5%
=========+============+===================+============
* All with SCHED_BATCH:
Kernel   | Throughput | P50 latency
aarch64  | (NOPM)     | (larger is worse)
=========+============+==================
6.5.13   | baseline   | baseline
---------+------------+------------------
6.6.91   | -5.1%      | +7.4%
---------+------------+------------------
6.8.12   | -6.0%      | +8.6%
---------+------------+------------------
6.12.29  | -6.6%      | +8.4%
---------+------------+------------------
6.13.12  | -6.9%      | +8.9%
---------+------------+------------------
6.14.7   | -7.1%      | +8.7%
---------+------------+------------------
6.15-rc3 | -9.6%      | +11.8%
---------+------------+------------------
6.15-rc4 | -7.0%      | +8.6%
---------+------------+------------------
6.15-rc5 | -6.6%      | +7.9%
---------+------------+------------------
6.15-rc6 | -6.6%      | +8.4%
---------+------------+------------------
6.15-rc7 | -7.7%      | +9.7%
=========+============+==================
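As a reading aid for both tables, a sketch of how a percentage relative to
the 6.5.13 baseline can be computed. The raw numbers below are made-up
placeholders, not measurements from this thread, and the reproducer may
aggregate runs differently.

```python
def pct_vs_baseline(value: float, baseline: float) -> float:
    """Signed percent change of value relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


def fmt(pct: float) -> str:
    """Format with an explicit sign and one decimal, as in the tables."""
    return f"{pct:+.1f}%"


# Placeholder values only (hypothetical NOPM and P50 latency).
print(fmt(pct_vs_baseline(920.0, 1000.0)))  # throughput: negative is worse
print(fmt(pct_vs_baseline(11.0, 10.0)))     # latency: larger is worse
```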
[1] https://github.com/aws/repro-collection/blob/main/repros/repro-mysql-EEVDF-regression/results/20250519/README.md
-Cristian