Message-ID: <6a83c7fb-dbfa-49df-be8b-f1257ad1a47a@amd.com>
Date: Sat, 3 May 2025 09:04:28 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Linus Torvalds <torvalds@...ux-foundation.org>, "Prundeanu, Cristian"
<cpru@...zon.com>
CC: Peter Zijlstra <peterz@...radead.org>, "Mohamed Abuelfotoh, Hazem"
<abuehaze@...zon.com>, "Saidi, Ali" <alisaidi@...zon.com>, "Benjamin
Herrenschmidt" <benh@...nel.crashing.org>, "Blake, Geoff"
<blakgeof@...zon.com>, "Csoma, Csaba" <csabac@...zon.com>, "Doebel, Bjoern"
<doebel@...zon.de>, Gautham Shenoy <gautham.shenoy@....com>, Swapnil Sapkal
<swapnil.sapkal@....com>, Joseph Salisbury <joseph.salisbury@...cle.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: EEVDF regression still exists
Hello Linus,
On 5/2/2025 11:22 PM, Linus Torvalds wrote:
> On Fri, 2 May 2025 at 10:25, Prundeanu, Cristian <cpru@...zon.com> wrote:
>>
>> Another, more recent observation is that 6.15-rc4 has worse performance than
>> rc3 and earlier kernels. Maybe that can help narrow down the cause?
>> I've added the perf reports for rc3 and rc2 in the same location as before.
>
> The only _scheduler_ change that looks relevant is commit bbce3de72be5
> ("sched/eevdf: Fix se->slice being set to U64_MAX and resulting
> crash"). Which does affect the slice calculation, although supposedly
> only under special circumstances.
>
> Of course, it could be something else.
Since it is the only !SCHED_EXT change in kernel/sched, Cristian can
perhaps try reverting it on top of v6.15-rc4 and checking whether the
benchmark results jump back to v6.15-rc3 levels, to rule that single
change out. It could very likely be something else.
>
> For example, we have an AMD performance regression in general due to
> _another_ CPU leak mitigation issue, but that predates rc3 (happened
> during the merge window), so that one isn't relevant, but maybe
> something else is..
>
> Although honestly, that slice calculation still looks just plain odd.
> It defaults the slice to zero, so if none of the 'break' conditions in
> the first loop happens, it will reset the slice to that zero value and
I believe setting slice to U64_MAX was the actual problem. Previously,
the slice was initialized as:

    cfs_rq = group_cfs_rq(se);
    slice = cfs_rq_min_slice(cfs_rq);

If the "se" was delayed, it basically means that group_cfs_rq() had no
tasks on it, so cfs_rq_min_slice() would return "~0ULL", which would
get propagated up and could lead to bad math.
> then the
>
> slice = cfs_rq_min_slice(cfs_rq);
>
> in that second loop looks like it might just pick up that zero value again.
If the first loop does not break, even for "if (cfs_rq->load.weight)",
it basically means that there are no tasks / delayed entities queued
all the way up to the root cfs_rq, so the slices shouldn't matter.
Enqueue of the next task will correct the slices for the whole queued
hierarchy.
>
> I clearly don't understand the code.
>
> Linus
--
Thanks and Regards,
Prateek