Message-ID: <d875adc0-744e-4b1f-a1bf-7e051298a0ae@amd.com>
Date: Fri, 2 May 2025 11:26:00 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: "Prundeanu, Cristian" <cpru@...zon.com>, Peter Zijlstra
<peterz@...radead.org>
CC: "Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>, "Saidi, Ali"
<alisaidi@...zon.com>, Benjamin Herrenschmidt <benh@...nel.crashing.org>,
"Blake, Geoff" <blakgeof@...zon.com>, "Csoma, Csaba" <csabac@...zon.com>,
"Doebel, Bjoern" <doebel@...zon.de>, Gautham Shenoy <gautham.shenoy@....com>,
Swapnil Sapkal <swapnil.sapkal@....com>, Joseph Salisbury
<joseph.salisbury@...cle.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: EEVDF regression still exists

Hello Cristian,

On 5/1/2025 9:46 PM, Prundeanu, Cristian wrote:
> Hi Prateek,
>
> On 2025-04-29, 22:33, "K Prateek Nayak" <kprateek.nayak@....com> wrote:
>
>>>>> Here are the latest results for the EEVDF impact on database workloads.
>>>>> The regression introduced in kernel 6.6 still persists and doesn't look
>>>>> like it is improving.
>>>>
>>>> Well, I was under the impression it had actually been solved :-(
>>>>
>>>> My understanding from the last round was that Prateek and co had it
>>>> sorted -- with the caveat being that you had to stick SCHED_BATCH in at
>>>> the right place in MySQL start scripts or somesuch.
>>>
>>> The statement in the previous thread [1] was that using SCHED_BATCH improves
>>> performance over default. While that still holds true, it is also equally true
>>> about using SCHED_BATCH on kernel 6.5.
>>>
>>> So, when we compare 6.5 with recent kernels, both using SCHED_BATCH, the
>>> regression is still visible. (Previously, we only compared SCHED_BATCH with
>>> 6.5 default, leading to the wrong conclusion that it's a fix).
>>
>> P.S. Are the numbers for v6.15-rc4 + SCHED_BATCH comparable to v6.5
>> default?
>
> SCHED_BATCH does improve the performance both on 6.5 and on 6.12+; in my
> testing, 6.12-SCHED_BATCH does not quite reach the 6.5-default (without
> SCHED_BATCH) performance. Best case (6.15-rc3-SCHED_BATCH) is -3.6%, and
> worst case (6.15-rc4-SCHED_BATCH) is -7.0% when compared to 6.5.13-default.
>
> (Please keep in mind that the target isn't to get SCHED_BATCH to the same
> level as 6.5-default; it's to resolve the regression from 6.5-default to
> 6.6+ default, and from 6.5-SCHED_BATCH to 6.6+ SCHED_BATCH).

Ack! I was just curious if all of the performance drop can be
attributed to aggressive wakeup preemption or not.

>
>> One more curious question: Does changing the base slice to a larger
>> value (say 6ms) in conjunction with setting SCHED_BATCH on v6.15-rc4
>> affect the benchmark result in any way?
>
> I reran 6.15-rc4, with both 3ms (default) and 6ms. The larger base slice
> slightly improves performance, more for SCHED_BATCH than for default.
>
> 6ms compared to 3ms same kernel (not compared to 6.5):
>
> Kernel               | Throughput | Latency
> ---------------------+------------+---------
> 6.15-rc4 default     | +1.1%      | -1.3%
> 6.15-rc4 SCHED_BATCH | +2.9%      | -2.7%
>
> Full details, reports and data:
> https://github.com/aws/repro-collection/blob/main/repros/repro-mysql-EEVDF-regression/results/20250430/README.md
> (These perf files all have the same schedstat version, hopefully "perf
> sched stats diff" worked better this time).

Thank you for the information. Ravi and Swapnil are working to
get perf sched stats diff to behave well when comparing different
versions. It should be fixed in subsequent versions.

P.S. I'm still setting up the system and have got my SUT pretty
close to what you have described. I couldn't quite reproduce the
regression on bare metal with my previous configuration on v6.15-rc4.
Could you also provide some information on your LDG machine - its
configuration and the kernel it is running (although this shouldn't
really matter as long as it is the same across runs)?
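
For anyone reproducing this, here is a minimal sketch of how the two
knobs discussed above (SCHED_BATCH and a larger base slice) might be
applied. It assumes Linux, debugfs mounted at /sys/kernel/debug, the
base slice exposed there as sched/base_slice_ns, and root privileges
for the write. The helper names are hypothetical, and this is not the
harness used in the repro.

```python
import os

# Assumed debugfs location of the EEVDF base slice (needs root to write).
BASE_SLICE_PATH = "/sys/kernel/debug/sched/base_slice_ns"


def ms_to_ns(ms):
    """Convert milliseconds to nanoseconds, the unit the knob is assumed to take."""
    return int(ms * 1_000_000)


def set_batch(pid=0):
    """Switch a task to SCHED_BATCH (pid 0 means the calling process).

    The policy is inherited across fork/exec, so calling this in a
    launcher before starting the database covers its threads too.
    """
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))


def set_base_slice(ms, path=BASE_SLICE_PATH):
    """Write a new base slice, e.g. 6 for the 6ms case versus the 3ms default."""
    with open(path, "w") as f:
        f.write(str(ms_to_ns(ms)))
```

Typical use would be to call set_base_slice(6) once as root, then
set_batch() in the launcher before exec'ing the workload.
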
>
> -Cristian
>
--
Thanks and Regards,
Prateek