linux-kernel - EEVDF regression still exists

Message-ID: <20250429213817.65651-1-cpru@amazon.com>
Date: Tue, 29 Apr 2025 16:38:17 -0500
From: Cristian Prundeanu <cpru@...zon.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Cristian Prundeanu <cpru@...zon.com>, K Prateek Nayak
	<kprateek.nayak@....com>, Hazem Mohamed Abuelfotoh <abuehaze@...zon.com>,
	"Ali Saidi" <alisaidi@...zon.com>, Benjamin Herrenschmidt
	<benh@...nel.crashing.org>, Geoff Blake <blakgeof@...zon.com>, Csaba Csoma
	<csabac@...zon.com>, Bjoern Doebel <doebel@...zon.com>, Gautham Shenoy
	<gautham.shenoy@....com>, Swapnil Sapkal <swapnil.sapkal@....com>, "Joseph
 Salisbury" <joseph.salisbury@...cle.com>, Dietmar Eggemann
	<dietmar.eggemann@....com>, Ingo Molnar <mingo@...hat.com>, Linus Torvalds
	<torvalds@...ux-foundation.org>, Borislav Petkov <bp@...en8.de>,
	<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
	<linux-tip-commits@...r.kernel.org>, <x86@...nel.org>
Subject: EEVDF regression still exists

Peter,

Here are the latest results for the EEVDF impact on database workloads. 
The regression introduced in kernel 6.6 still persists and doesn't look 
like it is improving.

This time I've compared apples to apples: default 6.5 vs. default 6.12+, 
and SCHED_BATCH on 6.5 vs. SCHED_BATCH on 6.12+. The results are below.

Kernel   | Runtime     | Throughput | P50 latency
aarch64  | parameters  | (NOPM)     | (larger is worse)
---------+-------------+------------+------------------
6.5.13   | default     |  baseline  |  baseline
---------+-------------+------------+------------------
6.12.25  | default     |  -5.1%     |  +7.8%
---------+-------------+------------+------------------
6.14.4   | default     |  -7.4%     |  +9.6%
---------+-------------+------------+------------------
6.15-rc4 | default     |  -7.4%     |  +10.2%
======================================================
6.5.13   | SCHED_BATCH |  baseline  |  baseline
---------+-------------+------------+------------------
6.12.25  | SCHED_BATCH |  -8.1%     |  +8.7%
---------+-------------+------------+------------------
6.14.4   | SCHED_BATCH |  -7.9%     |  +8.3%
---------+-------------+------------+------------------
6.15-rc4 | SCHED_BATCH |  -10.6%    |  +11.8%
---------+-------------+------------+------------------

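As a reading aid for the table: each percentage is presumably the relative change against the 6.5.13 row with the same runtime parameters, i.e. (value - baseline) / baseline. Below is a minimal Python sketch under that assumption. The helper names and all sample numbers are hypothetical and are not taken from these runs.

```python
import statistics


def pct_change(value: float, baseline: float) -> float:
    """Relative change of `value` vs. `baseline`, in percent (assumed definition)."""
    return (value - baseline) / baseline * 100.0


def p50(samples: list[float]) -> float:
    """Median (P50) of a list of latency samples."""
    return statistics.median(samples)


# Hypothetical numbers for illustration only, not results from the runs above.
baseline_nopm = 100_000.0
candidate_nopm = 94_900.0
print(f"throughput: {pct_change(candidate_nopm, baseline_nopm):+.1f}%")  # throughput: -5.1%

baseline_lat = p50([10.0, 12.0, 11.0])   # median is 11.0
candidate_lat = p50([11.5, 12.5, 12.0])  # median is 12.0
print(f"P50 latency: {pct_change(candidate_lat, baseline_lat):+.1f}%")  # P50 latency: +9.1%
```

Throughput is higher-is-better and latency is lower-is-better. A negative throughput delta and a positive latency delta therefore both indicate a regression.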
The tests were run with the previously published MySQL reproducer (link 
and instructions below), using two networked machines running HammerDB 
and MySQL, respectively. The full test details and the "perf sched stats" 
reports are posted at [1]; they are not included here for brevity.

[1] https://github.com/aws/repro-collection/blob/main/repros/repro-mysql-EEVDF-regression/results/20250428/README.md

By now, we have accumulated numerous data points and many hours of 
testing exhibiting this regression. The only counterarguments I've seen 
rely on either synthetic test cases or unrealistic, simplified tests 
(e.g. SUT and loadgen on the same machine, or a severely limited thread 
count). It's becoming painfully obvious that EEVDF replaced CFS before it 
was ready to be released; yet most of what we've been debating is whether 
SCHED_BATCH is a good enough workaround.
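Some readers may not have tried the workaround. SCHED_BATCH is selected per thread with sched_setscheduler(2) and is inherited by threads and processes created afterwards, so it is usually applied when the server is launched. Below is a minimal Python sketch using the standard os module on Linux. The helper names run_as_batch and policy_name are hypothetical, and how the reproducer itself applies the policy to mysqld is not shown here and may differ.

```python
import os


def run_as_batch(pid: int = 0) -> None:
    """Switch `pid` (0 = the calling thread) to SCHED_BATCH.

    SCHED_BATCH only accepts a static priority of 0; moving a normal
    thread from SCHED_OTHER to SCHED_BATCH does not normally need privileges.
    """
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))


def policy_name(pid: int = 0) -> str:
    """Return a readable name for the current scheduling policy of `pid`."""
    names = {os.SCHED_OTHER: "SCHED_OTHER", os.SCHED_BATCH: "SCHED_BATCH"}
    policy = os.sched_getscheduler(pid)
    return names.get(policy, str(policy))


if __name__ == "__main__":
    run_as_batch()
    # Anything forked/exec'd from here on inherits SCHED_BATCH.
    print(policy_name())  # SCHED_BATCH
```

From a shell, util-linux's 'chrt --batch 0 <command>' starts a new process under the same policy.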

Let's please take a fresh look at what's happening and find out why the 
scheduler is underperforming. I'm happy to provide additional data if it 
helps debug this. I've backported and forward-ported Swapnil's "perf 
sched stats" command [2] so that it runs on any kernel from 6.5 up to 
6.15, and the reproducer already runs it automatically for convenience.

[2] https://lore.kernel.org/lkml/20250311120230.61774-1-swapnil.sapkal@amd.com/

Instructions for reproducing the above tests (same as before):

1. Code: The reproducer scenario and framework can be found here: 
https://github.com/aws/repro-collection

2. Setup: I used a 16 vCPU / 32 GB RAM / 1 TB RAID0 SSD instance as the 
SUT, running Ubuntu 22.04 with the latest updates. All kernels were 
compiled from source, keeping the config as consistent as possible across 
versions to minimize noise; in particular, CONFIG_HZ=250 was used 
everywhere.
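Keeping configs consistent across kernel versions is easier to verify mechanically. Below is a hypothetical Python helper, written only for illustration, that compares two .config files and reports the options whose values differ. The kernel tree's own scripts/diffconfig serves the same purpose and is the more complete tool. The config snippets in the example are made up, and CONFIG_EXAMPLE_NEW is not a real option.

```python
from __future__ import annotations

import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines of a kernel .config.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map each CONFIG_ option to its value ('n' when explicitly unset)."""
    options: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if m := _SET.match(line):
            options[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            options[m.group(1)] = "n"
    return options


def diff_configs(old: str, new: str) -> dict[str, tuple[str | None, str | None]]:
    """Return options whose value differs (None = absent from that config)."""
    a, b = parse_config(old), parse_config(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Hypothetical config fragments; CONFIG_EXAMPLE_NEW is a made-up option name.
old_cfg = "CONFIG_HZ_250=y\nCONFIG_HZ=250\n# CONFIG_HZ_1000 is not set\n"
new_cfg = "CONFIG_HZ_250=y\nCONFIG_HZ=250\nCONFIG_EXAMPLE_NEW=y\n"
print(diff_configs(old_cfg, new_cfg))
# {'CONFIG_EXAMPLE_NEW': (None, 'y'), 'CONFIG_HZ_1000': ('n', None)}
```

An empty result for the options you care about (e.g. CONFIG_HZ) confirms that they match between two builds.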

3. Running: To run the repro, set up a SUT (system under test) machine and 
an LDG (loadgen) machine on the same network, clone the git repo on both, 
and run:

(on the SUT) ./repro.sh repro-mysql-EEVDF-regression SUT --ldg=<loadgen_IP> 

(on the LDG) ./repro.sh repro-mysql-EEVDF-regression LDG --sut=<SUT_IP>

The repro will build and test multiple combinations of kernel versions and 
scheduler settings, and will prompt you when to reboot the SUT and rerun 
the same command above to continue the process.

More instructions can be found both in the repo's README and by running 
'repro.sh --help'.