Message-ID: <20241029045749.37257-1-cpru@amazon.com>
Date: Mon, 28 Oct 2024 23:57:49 -0500
From: Cristian Prundeanu <cpru@...zon.com>
To: "Gautham R. Shenoy" <gautham.shenoy@....com>
CC: <linux-tip-commits@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
	<x86@...nel.org>, <linux-arm-kernel@...ts.infradead.org>,
	Bjoern Doebel <doebel@...zon.com>,
	Hazem Mohamed Abuelfotoh <abuehaze@...zon.com>,
	Geoff Blake <blakgeof@...zon.com>, Ali Saidi <alisaidi@...zon.com>,
	Csaba Csoma <csabac@...zon.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl

Hi Gautham,

On 2024-10-25, 09:44, "Gautham R. Shenoy" <gautham.shenoy@....com> wrote:

> On Thu, Oct 24, 2024 at 07:12:49PM +1100, Benjamin Herrenschmidt wrote:
> > On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote:
> > > 
> > > The hammerdb test is a bit more complex than sysbench. It uses two
> > > independent physical machines to perform a TPC-C derived test [1], aiming
> > > to simulate a real-world database workload. The machines are allocated as
> > > an AWS EC2 instance pair on the same cluster placement group [2], to avoid
> > > measuring network bottlenecks instead of server performance. The SUT
> > > instance runs mysql configured to use 2 worker threads per vCPU (32
> > > total); the load generator instance runs hammerdb configured with 64
> > > virtual users and 24 warehouses [3]. Each test consists of multiple
> > > 20-minute rounds, run consecutively on multiple independent instance
> > > pairs.
> > 
> > Would it be possible to produce something that Prateek and Gautham
> > (Hi Gautham btw !) can easily consume to reproduce ?
> > 
> > Maybe a container image or a pair of container images hammering each
> > other ? (the simpler the better).
> 
> Yes, that would be useful. Please share your recipe. We will try and
> reproduce it at our end. In our testing from a few months ago (some of
> which was presented at OSPM 2024), most of the database related
> regressions that we observed with EEVDF went away after running the
> server threads under SCHED_BATCH.

I am working on a repro package that is self-contained and as simple to 
share as possible.

My testing with SCHED_BATCH has meanwhile concluded. It did reduce the 
regression to less than half - but only with WAKEUP_PREEMPTION enabled. 
When using NO_WAKEUP_PREEMPTION, there was no performance change compared 
to SCHED_OTHER.
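
In case it helps with reproducing the SCHED_BATCH runs: the policy switch 
itself is just a chrt invocation (a sketch - `sleep` stands in here for 
the server process; in the actual tests it is the mysql worker threads 
that get moved):

```shell
# Launch a workload under SCHED_BATCH; the static priority is always 0
# for non-realtime policies. chrt is part of util-linux.
chrt --batch 0 sleep 2 &
srv=$!

# Confirm the policy took effect; chrt prints the policy name.
chrt --pid "$srv"

# An already-running task can be switched the same way; to cover
# individual threads, pass TIDs from /proc/<pid>/task/ instead of the PID.
chrt --batch --pid 0 "$srv"
wait "$srv"
```

Equivalently, sched_setscheduler(2) with SCHED_BATCH from inside the 
server process achieves the same without a wrapper.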

(At the risk of stating the obvious, using SCHED_BATCH only to get back to 
the default CFS performance is still only a workaround, just as disabling 
PLACE_LAG+RUN_TO_PARITY is; these give us more room to investigate the 
root cause in EEVDF, but shouldn't be seen as viable alternate solutions.)
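
For completeness, this is how the feature toggles above are flipped on my 
setup (a config sketch; requires root and a kernel built with 
CONFIG_SCHED_DEBUG, and on kernels before v5.13 the file is 
/sys/kernel/debug/sched_features instead):

```shell
# Mount debugfs if it isn't already.
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# Inspect the current state; enabled features print as-is, disabled
# ones with a NO_ prefix.
cat /sys/kernel/debug/sched/features

# Writing NO_<FEATURE> disables a feature; writing <FEATURE> re-enables it.
echo NO_PLACE_LAG         > /sys/kernel/debug/sched/features
echo NO_RUN_TO_PARITY     > /sys/kernel/debug/sched/features
echo NO_WAKEUP_PREEMPTION > /sys/kernel/debug/sched/features
```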

Do you have more detail on the database regressions you saw a few months 
ago? What was the magnitude, and which workloads did it manifest on?

-Cristian
