Message-ID: <B27ECDA1-632D-44CD-AB99-B7A9C27393E4@amazon.com>
Date: Fri, 2 May 2025 17:25:14 +0000
From: "Prundeanu, Cristian" <cpru@...zon.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: K Prateek Nayak <kprateek.nayak@....com>, "Mohamed Abuelfotoh, Hazem"
<abuehaze@...zon.com>, "Saidi, Ali" <alisaidi@...zon.com>, "Benjamin
Herrenschmidt" <benh@...nel.crashing.org>, "Blake, Geoff"
<blakgeof@...zon.com>, "Csoma, Csaba" <csabac@...zon.com>, "Doebel, Bjoern"
<doebel@...zon.de>, Gautham Shenoy <gautham.shenoy@....com>, Swapnil Sapkal
<swapnil.sapkal@....com>, Joseph Salisbury <joseph.salisbury@...cle.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Ingo Molnar <mingo@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, Borislav Petkov
<bp@...en8.de>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: EEVDF regression still exists

On 2025-04-30, 05:03, "Peter Zijlstra" <peterz@...radead.org> wrote:
> Anyway, looking at the two individual reports side by side:
>
> - schedule() left the processor idle -- is up
>
> vs.
>
> - pull_task() count on cpu newly idle -- is down
> - load_balance() success count on cpu newly idle -- is down
>
> Which seem related and would suggest we look at newidle balance. One of
> the things we've seen before is that newidle was affected by the shorter
> slice of EEVDF. But it is also quite possible something changed in the
> load-balancer here.
>
> Also of note is that .15 seems to have a lower number of 'ttwu() was
> called to wake up on the local cpu' -- which I'm not quite sure how to
> rhyme with the previous observation. The newidle thing seems to suggest
> not enough migrations, while this would suggest too many migrations.

A 2x longer slice on 6.15 does improve performance somewhat, but not by much.

I went back to look at my previous tests: back in September I tried
multiple slice values (1.5ms, 3ms, 6ms, 12ms) on 6.5 and 6.6. The response
was noisy (much less so on CFS, however) and not linear, peaking at 3ms.
Does the lack of linearity match your expectations? Would there be a reason
for it to change in more recent kernels?
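
For anyone wanting to repeat a slice sweep like the one described above, here
is a minimal sketch of how the values could be applied. It is not the script
used for these results. It assumes the slice is tuned through debugfs, using
base_slice_ns on EEVDF kernels (6.6 and later) and min_granularity_ns as the
closest equivalent on CFS kernels (such as 6.5); the function names are
hypothetical, and writing to debugfs requires root.

```python
from pathlib import Path

# Assumption: scheduler tunables are exposed under debugfs at this location.
SCHED_DEBUGFS = Path("/sys/kernel/debug/sched")

# Slice values from the sweep, in milliseconds.
SLICES_MS = [1.5, 3, 6, 12]


def ms_to_ns(ms: float) -> int:
    """Convert milliseconds to the nanosecond value the knob expects."""
    return int(round(ms * 1_000_000))


def find_slice_knob(base: Path = SCHED_DEBUGFS) -> Path:
    """Return the slice knob for the running kernel.

    Assumption: base_slice_ns on EEVDF kernels, min_granularity_ns on CFS.
    """
    for name in ("base_slice_ns", "min_granularity_ns"):
        knob = base / name
        if knob.exists():
            return knob
    raise FileNotFoundError(f"no slice knob found under {base}")


def set_slice(ms: float, base: Path = SCHED_DEBUGFS) -> int:
    """Write the slice (given in ms) to the knob; return the value in ns."""
    ns = ms_to_ns(ms)
    find_slice_knob(base).write_text(f"{ns}\n")
    return ns
```

A sweep would call set_slice(ms) for each entry in SLICES_MS and rerun the
benchmark after each change. Note that the two knobs do not have identical
semantics, so values are not directly comparable between CFS and EEVDF.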

Another, more recent observation is that 6.15-rc4 performs worse than
rc3 and earlier kernels. Maybe that can help narrow down the cause?

I've added the perf reports for rc3 and rc2 in the same location as before:
https://github.com/aws/repro-collection/blob/main/repros/repro-mysql-EEVDF-regression/results/20250428/README.md#raw-data