linux-kernel - Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO


Message-ID: <ZyifxfSV8k5vC0iG@BLRRASHENOY1.amd.com>
Date: Mon, 4 Nov 2024 15:49:49 +0530
From: "Gautham R. Shenoy" <gautham.shenoy@....com>
To: Cristian Prundeanu <cpru@...zon.com>
Cc: linux-tip-commits@...r.kernel.org, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
	linux-arm-kernel@...ts.infradead.org,
	Bjoern Doebel <doebel@...zon.com>,
	Hazem Mohamed Abuelfotoh <abuehaze@...zon.com>,
	Geoff Blake <blakgeof@...zon.com>, Ali Saidi <alisaidi@...zon.com>,
	Csaba Csoma <csabac@...zon.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and
 RUN_TO_PARITY and move them to sysctl

On Mon, Oct 28, 2024 at 11:57:49PM -0500, Cristian Prundeanu wrote:
> Hi Gautham,
> 
> On 2024-10-25, 09:44, "Gautham R. Shenoy" <gautham.shenoy@....com <mailto:gautham.shenoy@....com>> wrote:
> 
> > On Thu, Oct 24, 2024 at 07:12:49PM +1100, Benjamin Herrenschmidt wrote:
> > > On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote:
> > > > 
> > > > The hammerdb test is a bit more complex than sysbench. It uses two
> > > > independent physical machines to perform a TPC-C derived test [1], aiming
> > > > to simulate a real-world database workload. The machines are allocated as
> > > > an AWS EC2 instance pair on the same cluster placement group [2], to avoid
> > > > measuring network bottlenecks instead of server performance. The SUT
> > > > instance runs mysql configured to use 2 worker threads per vCPU (32
> > > > total); the load generator instance runs hammerdb configured with 64
> > > > virtual users and 24 warehouses [3]. Each test consists of multiple
> > > > 20-minute rounds, run consecutively on multiple independent instance
> > > > pairs.
> > > 
> > > Would it be possible to produce something that Prateek and Gautham
> > > (Hi Gautham btw !) can easily consume to reproduce ?
> > > 
> > > Maybe a container image or a pair of container images hammering each
> > > other ? (the simpler the better).
> > 
> > Yes, that would be useful. Please share your recipe. We will try and
> > reproduce it at our end. In our testing from a few months ago (some of
> > which was presented at OSPM 2024), most of the database related
> > regressions that we observed with EEVDF went away after running
> > the server threads under SCHED_BATCH.
> 
> I am working on a repro package that is self contained and as simple to 
> share as possible.

Sorry for the delay in response. I was away for the Diwali festival.
Thank you for working on the repro package.


> 
> My testing with SCHED_BATCH is meanwhile concluded. It did reduce the 
> regression to less than half - but only with WAKEUP_PREEMPTION enabled. 
> When using NO_WAKEUP_PREEMPTION, there was no performance change compared 
> to SCHED_OTHER.
> 
> (At the risk of stating the obvious, using SCHED_BATCH only to get back to 
> the default CFS performance is still only a workaround, just as disabling 
> PLACE_LAG+RUN_TO_PARITY is; these give us more room to investigate the 
> root cause in EEVDF, but shouldn't be seen as viable alternate solutions.)
> 
> Do you have more detail on the database regressions you saw a few months 
> ago? What was the magnitude, and which workloads did it manifest on?


Three variants of sysbench + MySQL showed regressions with EEVDF:

1. 1 Table, 10M Rows, read-only queries.
2. 3 Tables, 10M Rows each, read-only queries.
3. 1 Segmented Table, 10M Rows, read-only queries.

These saw regressions in the range of 9-12%.

The other database workload that showed a regression was MongoDB + YCSB
workload c, where the magnitude of the regression was around 17%.

As mentioned by Dietmar, we observed these regressions go away with
the original "EEVDF complete" patches, which had a feature called
RESPECT_SLICE. That feature allowed a running task to run until the end
of its slice without being preempted by a newly woken task.
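
To check which scheduler features a test kernel has enabled, the
following is a minimal sketch, not part of the original testing. It
assumes debugfs is mounted at /sys/kernel/debug and that the kernel
exposes the space-separated feature list at sched/features, where
disabled features carry a "NO_" prefix. The helper names are
hypothetical, and reading the file normally requires root.

```python
from pathlib import Path

# Assumed location; requires debugfs mounted and a kernel exposing this file.
FEATURES_PATH = Path("/sys/kernel/debug/sched/features")


def parse_sched_features(text):
    """Map feature name -> enabled flag from the space-separated list."""
    state = {}
    for token in text.split():
        if token.startswith("NO_"):
            state[token[3:]] = False  # "NO_FOO" means feature FOO is off
        else:
            state[token] = True
    return state


def read_sched_features(path=FEATURES_PATH):
    """Read and parse the features file (usually needs root)."""
    return parse_sched_features(Path(path).read_text())


if __name__ == "__main__":
    try:
        feats = read_sched_features()
    except OSError as exc:
        print(f"cannot read {FEATURES_PATH}: {exc}")
    else:
        for name in ("PLACE_LAG", "RUN_TO_PARITY", "WAKEUP_PREEMPTION"):
            print(name, feats.get(name, "not present"))
```

A feature name that is missing from the running kernel is reported as
"not present" rather than as disabled.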

However, Peter suggested exploring SCHED_BATCH, which fixed the
regression even without the EEVDF complete patchset.
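
For anyone wanting to try the same thing, here is a minimal sketch of
moving an already running server to SCHED_BATCH. It is not the script
used in our testing. It assumes Linux with /proc mounted, and the
function name set_batch_policy is hypothetical. The policy is applied
per thread, so the sketch walks /proc/<pid>/task instead of touching
only the main thread.

```python
import os


def set_batch_policy(pid):
    """Switch every current thread of `pid` to SCHED_BATCH.

    Returns the list of thread IDs that were changed. Threads created
    later by an already-switched thread inherit the policy; a thread
    that exits mid-walk raises ProcessLookupError.
    """
    param = os.sched_param(0)  # non-realtime policies use static priority 0
    changed = []
    for tid in sorted(int(name) for name in os.listdir(f"/proc/{pid}/task")):
        os.sched_setscheduler(tid, os.SCHED_BATCH, param)
        changed.append(tid)
    return changed


if __name__ == "__main__":
    # Demonstrate on the current process; a real run would pass the
    # database server's PID and typically needs matching privileges.
    try:
        print(set_batch_policy(os.getpid()))
    except OSError as exc:
        print(f"could not switch to SCHED_BATCH: {exc}")
```

Changing another user's process requires the appropriate privileges.
Applying the policy before the server spawns its worker threads avoids
the need to walk the thread list at all.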

> 
> -Cristian

--
Thanks and Regards
gautham.