Message-ID: <30f13deb-7dda-4a17-ab88-f386377bc30b@arm.com>
Date: Wed, 27 Nov 2024 09:34:12 +0000
From: Luis Machado <luis.machado@....com>
To: K Prateek Nayak <kprateek.nayak@....com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Vincent Guittot <vincent.guittot@...aro.org>, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org
Subject: Re: [tip: sched/core] sched/eevdf: More PELT vs DELAYED_DEQUEUE

Hi,

On 11/27/24 04:17, K Prateek Nayak wrote:
> Hello Peter,
>
> On 9/10/2024 1:39 PM, tip-bot2 for Peter Zijlstra wrote:
>> The following commit has been merged into the sched/core branch of tip:
>>
>> Commit-ID: 2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>> Gitweb: https://git.kernel.org/tip/2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>> Author: Peter Zijlstra <peterz@...radead.org>
>> AuthorDate: Fri, 06 Sep 2024 12:45:25 +02:00
>> Committer: Peter Zijlstra <peterz@...radead.org>
>> CommitterDate: Tue, 10 Sep 2024 09:51:15 +02:00
>>
>> sched/eevdf: More PELT vs DELAYED_DEQUEUE
>>
>> Vincent and Dietmar noted that while commit fc1892becd56 fixes the
>> entity runnable stats, it does not adjust the cfs_rq runnable stats,
>> which are based off of h_nr_running.
>>
>> Track h_nr_delayed such that we can discount those and adjust the
>> signal.
>>
>> Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE")
>> Reported-by: Dietmar Eggemann <dietmar.eggemann@....com>
>> Reported-by: Vincent Guittot <vincent.guittot@...aro.org>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>> Link: https://lkml.kernel.org/r/20240906104525.GG4928@noisy.programming.kicks-ass.net
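
For context, my reading of the discounting described above is that the
runnable signal should be fed h_nr_running minus the delayed count,
roughly along the lines of the sketch below (the helper name is mine,
not from the patch):

static inline unsigned int cfs_rq_h_nr_runnable(struct cfs_rq *cfs_rq)
{
	/*
	 * Delayed-dequeue entities still sit on the runqueue but are
	 * not really runnable, so discount them from h_nr_running
	 * when deriving the runnable signal.
	 */
	return cfs_rq->h_nr_running - cfs_rq->h_nr_delayed;
}
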
>
> I've been testing this fix for a while now to see if it helps with the
> regressions reported around the EEVDF complete series. The issue with
> negative "h_nr_delayed" reported by Luis previously seems to have been
> fixed as a result of commit 75b6499024a6 ("sched/fair: Properly
> deactivate sched_delayed task upon class change").

I recall having 75b6499024a6 in my testing tree and still noticing
unbalanced accounting for h_nr_delayed, where the counter would
eventually be decremented twice, leading to negative values.
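
If it helps to catch this early, a guard at the accounting site along
these lines should fire on the first imbalance (a sketch only; the
helper name and call site are mine, not taken from the patch):

static inline void cfs_rq_dec_h_nr_delayed(struct cfs_rq *cfs_rq)
{
	/* Catch a second decrement before the counter underflows. */
	SCHED_WARN_ON(!cfs_rq->h_nr_delayed);
	cfs_rq->h_nr_delayed--;
}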

I might have to give it another go if we're considering including the
change as-is, just to make sure. Since our setups differ slightly, we
could be exercising somewhat different paths.

Did this patch help with the regressions you noticed, though?
>
> I've been running stress-ng for a while and haven't seen any cases of
> negative "h_nr_delayed". In some of my previous experiments I'd also
> added the following WARN_ON() to check whether any delayed tasks remain
> on the cfs_rq before switching to idle, and I did not see any splat
> during my benchmark runs.
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 621696269584..c19a31fa46c9 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -457,6 +457,9 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
>
> static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
> {
> + /* All delayed tasks must be picked off before switching to idle */
> + SCHED_WARN_ON(rq->cfs.h_nr_delayed);
> +
> update_idle_core(rq);
> scx_update_idle(rq, true);
> schedstat_inc(rq->sched_goidle);
> --
>
> If you are including this back, feel free to add:
>
> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>
>> [..snip..]
>