Message-ID: <20250128230926.11715-1-cpru@amazon.com>
Date: Tue, 28 Jan 2025 17:09:26 -0600
From: Cristian Prundeanu <cpru@...zon.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <cpru@...zon.com>, <kprateek.nayak@....com>, <abuehaze@...zon.com>,
	<alisaidi@...zon.com>, <benh@...nel.crashing.org>, <blakgeof@...zon.com>,
	<csabac@...zon.com>, <doebel@...zon.com>, <gautham.shenoy@....com>,
	<joseph.salisbury@...cle.com>, <dietmar.eggemann@....com>,
	<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
	<linux-tip-commits@...r.kernel.org>, <mingo@...hat.com>, <x86@...nel.org>,
	<torvalds@...ux-foundation.org>, <bp@...en8.de>
Subject: Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl

Peter,

Thank you for the recent scheduler rework that went into kernel 6.13.
Here are the latest test results for mysql+hammerdb, obtained with a
standalone reproducer (details and instructions below).

Kernel  | Runtime      | Throughput | P50 latency
aarch64 | parameters   | (NOPM)     | (larger is worse)
--------+--------------+------------+------------------
6.5     | default      |  baseline  |  baseline
--------+--------------+------------+------------------
6.8     | default      |  -6.9%     |  +7.9%
        | NO_PL NO_RTP |  -1%       |  +1%
        | SCHED_BATCH  |  -9%       |  +10.7%
--------+--------------+------------+------------------
6.12    | default      |  -5.5%     |  +6.2%
        | NO_PL NO_RTP |  -0.4%     |  +0.1%
        | SCHED_BATCH  |  -4.1%     |  +4.9%
--------+--------------+------------+------------------
6.13    | default      |  -4.8%     |  +5.4%
        | NO_PL NO_RTP |  -0.3%     |  +0.01%
        | SCHED_BATCH  |  -4.8%     |  +5.4%
--------+--------------+------------+------------------

Kernel 6.13 shows a noticeable improvement over 6.12 with default
settings, both in throughput (-4.8% vs. -5.5%) and in latency (+5.4% vs.
+6.2%). At the same time, SCHED_BATCH no longer has the positive effect
it had in 6.12: its 6.13 results are identical to the default ones.
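
For anyone who wants to try the SCHED_BATCH rows on their own workload, 
here is a minimal sketch of one way to apply that policy from the shell, 
assuming chrt from util-linux is installed. The helper names run_batch 
and set_batch_pid are hypothetical and are not taken from the repro; how 
the repro itself applies SCHED_BATCH is not shown here.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not taken from the repro.

# run_batch COMMAND [ARGS...]
# Start COMMAND under SCHED_BATCH. The static priority must be 0 for
# this policy.
run_batch() {
    chrt --batch 0 "$@"
}

# set_batch_pid PID
# Switch all existing threads of a running process to SCHED_BATCH.
set_batch_pid() {
    chrt --all-tasks --batch --pid 0 "$1"
}
```

As a quick check, running run_batch sh -c 'chrt --pid $$' should report 
SCHED_BATCH as the current scheduling policy of the spawned shell.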

Disabling PLACE_LAG and RUN_TO_PARITY is still as effective as before. 
For this reason, I'd like to ask once again that this patch set be 
considered for merging and for backporting to kernels 6.6+.
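
On a kernel without this patch set, the same two features can be switched 
off at runtime through the scheduler's debugfs interface. The following 
is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug, 
the kernel exposes sched/features there, and the caller is root. The 
function name and the SCHED_FEATURES override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: toggle scheduler features at runtime through debugfs.
# Assumes debugfs is mounted at /sys/kernel/debug and the caller is root.
# SCHED_FEATURES is a hypothetical override so the function can be
# pointed at another file, for example in a dry run.
SCHED_FEATURES="${SCHED_FEATURES:-/sys/kernel/debug/sched/features}"

# set_sched_features NAME...
# Writes each feature name to the features file, one write per name.
# A "NO_" prefix disables the feature; the bare name enables it.
set_sched_features() {
    for feat in "$@"; do
        printf '%s\n' "$feat" > "$SCHED_FEATURES" || return 1
    done
}

# Example (as root):
#   set_sched_features NO_PLACE_LAG NO_RUN_TO_PARITY
#   cat "$SCHED_FEATURES"
```

A setting applied this way does not persist across reboots, so it has to 
be reapplied after each boot of the SUT.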

> This patchset disables the scheduler features PLACE_LAG and RUN_TO_PARITY 
> and moves them to sysctl.
>
> Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced 
> significant performance degradation in multiple database-oriented 
> workloads. This degradation manifests in all kernel versions using EEVDF, 
> across multiple Linux distributions, hardware architectures (x86_64, 
> aarch64, amd64), and CPU generations.

When weighing the relevance of various testing approaches, please keep in 
mind that mysql is a real-life workload, while the test which prompted the 
introduction of PLACE_LAG is much closer to a synthetic benchmark.


Instructions for reproducing the above tests:

1. Code: The repro scenario that was used for this round of testing can be 
found here: https://github.com/aws/repro-collection

2. Setup: I used a 16 vCPU / 32 GB RAM / 1 TB RAID0 SSD instance as the 
system under test (SUT), running Ubuntu 22.04 with the latest updates. 
All kernels were compiled from source, preserving the same config (as 
much as possible) to minimize noise - in particular, CONFIG_HZ=250 was 
used everywhere.

3. Running: To run the repro, set up a SUT machine and an LDG (loadgen) 
machine on the same network, clone the git repo on both, and run:

(on the SUT) ./repro.sh repro-mysql-EEVDF-regression SUT --ldg=<loadgen_IP> 

(on the LDG) ./repro.sh repro-mysql-EEVDF-regression LDG --sut=<SUT_IP>

The repro will build and test multiple combinations of kernel versions and 
scheduler settings, and will prompt you when to reboot the SUT and rerun 
the same command to continue the process.

More instructions can be found both in the repo's README and by running 
'repro.sh --help'.
