Message-ID: <20250429213817.65651-1-cpru@amazon.com>
Date: Tue, 29 Apr 2025 16:38:17 -0500
From: Cristian Prundeanu <cpru@...zon.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Cristian Prundeanu <cpru@...zon.com>, K Prateek Nayak
<kprateek.nayak@....com>, Hazem Mohamed Abuelfotoh <abuehaze@...zon.com>,
"Ali Saidi" <alisaidi@...zon.com>, Benjamin Herrenschmidt
<benh@...nel.crashing.org>, Geoff Blake <blakgeof@...zon.com>, Csaba Csoma
<csabac@...zon.com>, Bjoern Doebel <doebel@...zon.com>, Gautham Shenoy
<gautham.shenoy@....com>, Swapnil Sapkal <swapnil.sapkal@....com>, "Joseph
Salisbury" <joseph.salisbury@...cle.com>, Dietmar Eggemann
<dietmar.eggemann@....com>, Ingo Molnar <mingo@...hat.com>, Linus Torvalds
<torvalds@...ux-foundation.org>, Borislav Petkov <bp@...en8.de>,
<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
<linux-tip-commits@...r.kernel.org>, <x86@...nel.org>
Subject: EEVDF regression still exists
Peter,
Here are the latest results for the EEVDF impact on database workloads.
The regression introduced in kernel 6.6 still persists and doesn't look
like it is improving.
This time I've compared apples to apples - default 6.5 vs default 6.12+
and SCHED_BATCH on 6.5 vs SCHED_BATCH on 6.12+. The results are below.
Kernel   | Runtime     | Throughput | P50 latency
aarch64  | parameters  | (NOPM)     | (larger is worse)
---------+-------------+------------+------------------
6.5.13   | default     | baseline   | baseline
---------+-------------+------------+------------------
6.12.25  | default     | -5.1%      | +7.8%
---------+-------------+------------+------------------
6.14.4   | default     | -7.4%      | +9.6%
---------+-------------+------------+------------------
6.15-rc4 | default     | -7.4%      | +10.2%
=======================================================
6.5.13   | SCHED_BATCH | baseline   | baseline
---------+-------------+------------+------------------
6.12.25  | SCHED_BATCH | -8.1%      | +8.7%
---------+-------------+------------+------------------
6.14.4   | SCHED_BATCH | -7.9%      | +8.3%
---------+-------------+------------+------------------
6.15-rc4 | SCHED_BATCH | -10.6%     | +11.8%
---------+-------------+------------+------------------
The tests were run with the mysql reproducer published before (link and
instructions below), using two networked machines running hammerdb and
mysql respectively. The full test details and reports from "perf sched
stats" are posted at [1]; they are omitted here for brevity.
[1] https://github.com/aws/repro-collection/blob/main/repros/repro-mysql-EEVDF-regression/results/20250428/README.md
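As a reading aid for the table above: each percentage is understood here as a
change relative to the 6.5.13 baseline with the same runtime parameters. A
minimal Python sketch of that arithmetic follows. The helper name and the
absolute numbers are hypothetical and serve only as an illustration; the
actual measurements are in [1].

```python
def percent_change(value: float, baseline: float) -> float:
    """Relative change of `value` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hypothetical absolute numbers, chosen only to illustrate the arithmetic.
baseline_nopm, measured_nopm = 100_000.0, 94_900.0
baseline_p50_ms, measured_p50_ms = 10.0, 10.78

# Throughput: a negative change is a regression.
print(f"throughput:  {percent_change(measured_nopm, baseline_nopm):+.1f}%")
# Latency: a positive change is a regression (larger is worse).
print(f"P50 latency: {percent_change(measured_p50_ms, baseline_p50_ms):+.1f}%")
```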
At this time, we have accumulated numerous data points and many hours of
testing exhibiting this regression. The only counterarguments I've seen
rely on either synthetic test cases or unrealistically simplified tests
(e.g. SUT and loadgen on the same machine, or severely limited thread
count). It's becoming painfully obvious that EEVDF replaced CFS before it
was ready to be released; yet most of what we've been debating is whether
SCHED_BATCH is a good enough workaround.
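For reference, the SCHED_BATCH setting discussed above is a per-task
scheduling policy. The Python sketch below shows one way to request it for a
process through the standard os module. The helper name is hypothetical, and
this is not necessarily how the reproducer applies the policy.

```python
import os


def try_set_batch(pid: int = 0) -> bool:
    """Ask the kernel to run `pid` (0 = calling process) under SCHED_BATCH.

    Returns True if the policy is in effect afterwards, False if the
    platform or the environment refused the request.
    """
    try:
        # SCHED_BATCH is a non-realtime policy, so the priority must be 0.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
        return os.sched_getscheduler(pid) == os.SCHED_BATCH
    except (AttributeError, OSError):
        # AttributeError: these calls are missing on this platform.
        # OSError: the kernel or a sandbox rejected the change.
        return False


if __name__ == "__main__":
    print("SCHED_BATCH active:", try_set_batch())
```

Processes forked afterwards inherit the policy, so calling this before
launching a workload covers its children as well. From a shell, util-linux's
"chrt --batch 0 <command>" should have the same effect.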
Please, let's take a fresh look at what's happening, and find out why
the scheduler is underperforming. I'm happy to provide additional data if
it helps debug this. I've backported and forward-ported Swapnil's "perf
sched stats" command [2] so it is ready to run on any kernel from 6.5 up
to 6.15, and the reproducer already runs it automatically for convenience.
[2] https://lore.kernel.org/lkml/20250311120230.61774-1-swapnil.sapkal@amd.com/
Instructions for reproducing the above tests (same as before):
1. Code: The reproducer scenario and framework can be found here:
https://github.com/aws/repro-collection
2. Setup: I used a 16 vCPU / 32G RAM / 1TB RAID0 SSD instance as SUT,
running Ubuntu 22.04 with the latest updates. All kernels were compiled
from source, preserving the same config across versions (as much as
possible) to minimize noise - in particular, CONFIG_HZ=250 was used
everywhere.
3. Running: To run the repro, set up a SUT machine and an LDG (loadgen)
machine on the same network, clone the git repo on both, and run:
(on the SUT) ./repro.sh repro-mysql-EEVDF-regression SUT --ldg=<loadgen_IP>
(on the LDG) ./repro.sh repro-mysql-EEVDF-regression LDG --sut=<SUT_IP>
The repro will build and test multiple combinations of kernel versions and
scheduler settings, and will prompt you when to reboot the SUT and rerun
the same command to continue the process.
More instructions can be found both in the repo's README and by running
'repro.sh --help'.