

Message-ID: <20241125113535.88583-1-cpru@amazon.com>
Date: Mon, 25 Nov 2024 05:35:35 -0600
From: Cristian Prundeanu <cpru@...zon.com>
To: <cpru@...zon.com>
CC: <kprateek.nayak@....com>, <abuehaze@...zon.com>, <alisaidi@...zon.com>,
	<benh@...nel.crashing.org>, <blakgeof@...zon.com>, <csabac@...zon.com>,
	<doebel@...zon.com>, <gautham.shenoy@....com>, <joseph.salisbury@...cle.com>,
	<dietmar.eggemann@....com>, <linux-arm-kernel@...ts.infradead.org>,
	<linux-kernel@...r.kernel.org>, <linux-tip-commits@...r.kernel.org>,
	<mingo@...hat.com>, <peterz@...radead.org>, <x86@...nel.org>
Subject: Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl

Here are more results with recent 6.12 code, and also using SCHED_BATCH.
The control tests were run anew on Ubuntu 22.04 with the current pre-built
kernels 6.5 (baseline) and 6.8 (regression out of the box).

When updating mysql from 8.0.30 to 8.4.2, the regression grew even larger.
Disabling PLACE_LAG and RUN_TO_PARITY improved the results more than
using SCHED_BATCH.

Kernel   | default  | NO_PLACE_LAG and | SCHED_BATCH | mysql
         | config   | NO_RUN_TO_PARITY |             | version
---------+----------+------------------+-------------+---------
6.8      | -15.3%   |                  |             | 8.0.30
6.12-rc7 | -11.4%   | -9.2%            | -11.6%      | 8.0.30
         |          |                  |             |
6.8      | -18.1%   |                  |             | 8.4.2
6.12-rc7 | -14.0%   | -10.2%           | -12.7%      | 8.4.2
---------+----------+------------------+-------------+---------

Confidence intervals for all tests are smaller than +/- 0.5%.
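
To make the comparison between the columns explicit, here is a small worked
calculation. It is only a sketch: the numbers are copied from the 6.12-rc7
rows of the table, the helper name `recovered` and the dictionary keys are
made up for illustration, and "recovered" simply means the difference in
percentage points against the default-config column.

```python
# Values copied from the 6.12-rc7 rows of the table above (percent change).
RESULTS = {
    "8.0.30": {"default": -11.4, "no_place_lag_no_rtp": -9.2, "sched_batch": -11.6},
    "8.4.2": {"default": -14.0, "no_place_lag_no_rtp": -10.2, "sched_batch": -12.7},
}


def recovered(default, tuned):
    """Return (points recovered, share of the default regression recovered)."""
    points = tuned - default
    return round(points, 1), round(points / -default, 3)


for version, row in RESULTS.items():
    for option in ("no_place_lag_no_rtp", "sched_batch"):
        points, share = recovered(row["default"], row[option])
        print(f"mysql {version} {option}: {points:+.1f} points ({share:.1%})")
```

For mysql 8.4.2 this works out to 3.8 points recovered by the two feature
flags against 1.3 points by SCHED_BATCH; for 8.0.30, SCHED_BATCH comes out
0.2 points below the default config.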

I expect to have the repro package ready by the end of the week. Thank you
for your collective patience and efforts to confirm these results.

On 2024-11-01, Peter Zijlstra wrote:

>> (At the risk of stating the obvious, using SCHED_BATCH only to get back to 
>> the default CFS performance is still only a workaround,
>
> It is not really -- it is impossible to schedule all the various
> workloads without them telling us what they really like. The quest is to
> find interfaces that make sense and are implementable. But fundamentally
> tasks will have to start telling us what they need. We've long since ran
> out of crystal balls.

Completely agree that the best performance is obtained when the tasks are
individually tuned to the scheduler and explicitly set running parameters.
This isn't different from before.

But shouldn't our gold standard for default performance be CFS? There is a
significant regression out of the box when using EEVDF; how is seeking
additional tuning just to recover the lost performance not a workaround?

(Not to mention that this additional tuning means shifting the burden onto
many users who may not be familiar enough with scheduler functionality.
We're essentially asking everyone to spend considerable effort to maintain
the status quo from kernel 6.5.)

On 2024-11-14, Joseph Salisbury wrote:

> This is a confirmation that we are also seeing a 9% performance
> regression with the TPCC benchmark after v6.6-rc1.  We narrowed down the
> regression was caused due to commit:
> 86bfbb7ce4f6 ("sched/fair: Add lag based placement")
> 
> This regression was reported via this thread:
> https://lore.kernel.org/lkml/1c447727-92ed-416c-bca1-a7ca0974f0df@oracle.com/
> 
> Phil Auld suggested to try turning off the PLACE_LAG sched feature. We
> tested with NO_PLACE_LAG and can confirm it brought back 5% of the
> performance loss.  We do not yet know what effect NO_PLACE_LAG will have
> on other benchmarks, but it indeed helps TPCC.

Thank you for confirming the regression. I've been monitoring performance
on the v6.12-rcX tags since this thread started, and the results have been
largely constant.

I've also tested other benchmarks to verify whether (1) the regression
exists and (2) the patch proposed in this thread negatively affects them.
On postgresql and wordpress/nginx there is a regression which is improved
when applying the patch; on mongo and mariadb no regression manifested, and
the patch did not make their performance worse.
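
For anyone who wants to repeat the NO_PLACE_LAG / NO_RUN_TO_PARITY check on
another benchmark without applying the patch, the two features can be
flipped at runtime. A minimal sketch, assuming a kernel built with
CONFIG_SCHED_DEBUG, debugfs mounted at /sys/kernel/debug, and root
privileges; the function names and the `path` parameter are only for
illustration.

```python
# Assumed location of the scheduler feature flags (debugfs, needs root).
SCHED_FEATURES = "/sys/kernel/debug/sched/features"


def set_sched_features(flags, path=SCHED_FEATURES):
    """Write each flag (e.g. "NO_PLACE_LAG") to the features file."""
    for flag in flags:
        # The kernel parses one feature name per write, so open the file
        # once per flag instead of writing them all in one go.
        with open(path, "w") as f:
            f.write(flag)


def current_features(path=SCHED_FEATURES):
    """Return the feature list as reported by the kernel, one name per entry."""
    with open(path) as f:
        return f.read().split()


# Example (as root):
#   set_sched_features(["NO_PLACE_LAG", "NO_RUN_TO_PARITY"])
#   print(current_features())
```

The settings do not survive a reboot, so they need to be reapplied before
each test run.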

On 2024-11-19, Dietmar Eggemann wrote:

> #cat /etc/systemd/system/mysql.service
>
> [Service]
> CPUSchedulingPolicy=batch
> ExecStart=/usr/local/mysql/bin/mysqld_safe

This is the approach I used as well to get the results above.
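
To double-check that the unit setting took effect, the policy of the running
server can be read back. A sketch, assuming Linux and Python 3; the name
`policy_name` is made up here, and the PID would have to come from somewhere
like `pidof mysqld`.

```python
import os

# Readable names for the policy constants Python exposes on Linux.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def policy_name(pid):
    """Return the scheduling policy of `pid` (0 means the calling thread)."""
    policy = os.sched_getscheduler(pid)
    # Mask out the reset-on-fork flag, which may be OR'ed into the value.
    policy &= ~os.SCHED_RESET_ON_FORK
    return POLICY_NAMES.get(policy, f"unknown ({policy})")


# Example: print(policy_name(mysqld_pid))
```

Note that the policy is a per-thread attribute: passing the process PID
reports the main thread only.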

> My hunch is that this is due to the 'connection' threads (1 per virtual
> user) running in SCHED_BATCH. I yet have to confirm this by only
> changing the 'connection' tasks to SCHED_BATCH.

Did you have a chance to run with this scenario?
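
In case it helps with that experiment, here is a rough sketch of switching
only selected threads of a running process to SCHED_BATCH. It assumes Linux
and Python 3, and that the connection threads can be recognised by their
comm name; the "connection" prefix in the example is an assumption that
should be checked against /proc/<pid>/task/*/comm first. The function names
and the `proc` parameter are illustrative.

```python
import os


def threads_matching(pid, prefix, proc="/proc"):
    """Return the TIDs of `pid` whose thread name (comm) starts with `prefix`."""
    tids = []
    task_dir = os.path.join(proc, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as f:
                comm = f.read().strip()
        except FileNotFoundError:
            continue  # thread exited while we were scanning
        if comm.startswith(prefix):
            tids.append(int(tid))
    return sorted(tids)


def set_batch(tids):
    """Switch each thread to SCHED_BATCH (static priority must be 0)."""
    for tid in tids:
        os.sched_setscheduler(tid, os.SCHED_BATCH, os.sched_param(0))


# Example (as root or the owner of the process; comm prefix is assumed):
#   set_batch(threads_matching(mysqld_pid, "connection"))
```

Threads created after the call are not covered unless they are spawned by a
thread that is already SCHED_BATCH, so the scan would need to be repeated
once all virtual users are connected.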