
Message-ID: <70D6B66E-B4BC-4A92-9A23-0DADE9B8C3FE@amazon.com>
Date: Thu, 17 Oct 2024 18:19:00 +0000
From: "Prundeanu, Cristian" <cpru@...zon.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: "linux-tip-commits@...r.kernel.org" <linux-tip-commits@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Ingo Molnar
	<mingo@...hat.com>, "x86@...nel.org" <x86@...nel.org>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "Doebel, Bjoern" <doebel@...zon.de>,
	"Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>, "Blake, Geoff"
	<blakgeof@...zon.com>, "Saidi, Ali" <alisaidi@...zon.com>, "Csoma, Csaba"
	<csabac@...zon.com>, "gautham.shenoy@....com" <gautham.shenoy@....com>
Subject: Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY
 and move them to sysctl

On 2024-10-17, 04:11, "Peter Zijlstra" <peterz@...radead.org> wrote:

>> For example, running mysql+hammerdb results in a 12-17% throughput
> Gautham, is this a benchmark you're running?

Most of my testing for this investigation is on mysql+hammerdb because it
simplifies differentiating statistically meaningful results, but
performance impact (and improvement from disabling the two features) also
shows on workloads based on postgresql and on wordpress+nginx.

> How does using SCHED_BATCH compare?

I haven't tested with SCHED_BATCH yet; I will update the thread with
results as they accumulate (each variation of the test takes multiple
hours, not counting result processing and evaluation).

Looking at "man 7 sched" for SCHED_BATCH: "the scheduler will apply a small
scheduling penalty with respect to wakeup behavior, so that this thread is
mildly disfavored in scheduling decisions". Would this correctly translate
to "the thread will run more deterministically, but be scheduled less
frequently than other threads", i.e. an expected drop in performance in
exchange for less variability?
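
For anyone who wants to try the SCHED_BATCH comparison, here is a minimal
sketch of switching a process to that policy from Python on Linux. It uses
only the standard os module; the helper names (policy_name, switch_to_batch)
are made up for illustration and are not part of any test harness mentioned
in this thread.

```python
import os

# Readable names for the scheduling policies exposed by Python's os module (Linux only).
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def policy_name(policy: int) -> str:
    """Return a readable name for a scheduling policy constant."""
    return POLICY_NAMES.get(policy, f"unknown({policy})")


def switch_to_batch(pid: int = 0) -> str:
    """Move `pid` (0 means the calling process) to SCHED_BATCH.

    SCHED_BATCH is a non-real-time policy, so the static priority must be 0.
    Returns the name of the policy in effect afterwards.
    """
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    return policy_name(os.sched_getscheduler(pid))


if __name__ == "__main__":
    print("before:", policy_name(os.sched_getscheduler(0)))
    print("after: ", switch_to_batch())
```

Child processes started afterwards inherit the policy, so a launcher script
could call switch_to_batch() and then exec the workload. Whether that is the
right way to apply it to a database server with many threads is an open
question here, not something this sketch settles.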

> So disabling them by default will undoubtedly affect a ton of other
> workloads.

That's very likely either way, as the testing space is near infinite, but 
it seems more practical to first address the issue we already know about.

At this time, I don't have any data points indicating a negative impact
from disabling them for popular production workloads (whereas leaving them
enabled does show one). More testing is in progress, covering the major
areas (workloads heavy on CPU, RAM, disk, and networking); so far, the
results show no downside.

> And sysctl is arguably more of an ABI than debugfs, which
> doesn't really sound suitable for workaround.
>
> And I don't see how adding a line to /etc/rc.local is harder than adding
> a line to /etc/sysctl.conf

Adding a line is equally difficult both ways, you're right. But aren't
most distros better equipped to manage (persist, modify, automate) sysctl
parameters in a standardized manner? By contrast, rc.local seems more
oriented toward individual needs and edge cases. For instance, changes are
made by editing the file, which is poorly scriptable (unlike the sysctl
command, which is a unified interface that reconciles changes); it is also
typically run late in the boot sequence, making it a poor place for
settings that affect system processes.
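
To make the comparison concrete, here is a rough sketch of what each
approach would have to persist. The debugfs path and the NO_ prefix
convention are the existing scheduler features interface; the sysctl key
used in the usage example is a hypothetical placeholder, not a name
confirmed anywhere in this message.

```python
# Scheduler feature flags live in debugfs; writing needs root and a mounted debugfs.
FEATURES_PATH = "/sys/kernel/debug/sched/features"


def feature_token(name: str, enabled: bool) -> str:
    """Token to write to the features file: NAME enables, NO_NAME disables."""
    return name if enabled else f"NO_{name}"


def parse_features(text: str) -> dict[str, bool]:
    """Parse the space-separated contents of the features file into {name: enabled}."""
    state = {}
    for token in text.split():
        if token.startswith("NO_"):
            state[token[3:]] = False
        else:
            state[token] = True
    return state


def rc_local_line(name: str, enabled: bool) -> str:
    """Shell line an rc.local based workaround would need to carry."""
    return f"echo {feature_token(name, enabled)} > {FEATURES_PATH}"


def sysctl_conf_line(key: str, enabled: bool) -> str:
    """Equivalent sysctl.conf entry; `key` is whatever name a sysctl would expose."""
    return f"{key} = {int(enabled)}"


if __name__ == "__main__":
    print(rc_local_line("PLACE_LAG", False))
    print(rc_local_line("RUN_TO_PARITY", False))
    # Hypothetical key name, for illustration only.
    print(sysctl_conf_line("kernel.sched_place_lag_enabled", False))
```

The rc.local variant is a shell command that has to be appended to a script
and re-run at boot, while the sysctl variant is a key/value pair that
existing tooling can already set, query, and override per drop-in file.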