Message-ID: <30f13deb-7dda-4a17-ab88-f386377bc30b@arm.com>
Date: Wed, 27 Nov 2024 09:34:12 +0000
From: Luis Machado <luis.machado@....com>
To: K Prateek Nayak <kprateek.nayak@....com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Vincent Guittot <vincent.guittot@...aro.org>, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org
Subject: Re: [tip: sched/core] sched/eevdf: More PELT vs DELAYED_DEQUEUE
Hi,
On 11/27/24 04:17, K Prateek Nayak wrote:
> Hello Peter,
>
> On 9/10/2024 1:39 PM, tip-bot2 for Peter Zijlstra wrote:
>> The following commit has been merged into the sched/core branch of tip:
>>
>> Commit-ID: 2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>> Gitweb: https://git.kernel.org/tip/2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>> Author: Peter Zijlstra <peterz@...radead.org>
>> AuthorDate: Fri, 06 Sep 2024 12:45:25 +02:00
>> Committer: Peter Zijlstra <peterz@...radead.org>
>> CommitterDate: Tue, 10 Sep 2024 09:51:15 +02:00
>>
>> sched/eevdf: More PELT vs DELAYED_DEQUEUE
>>
>> Vincent and Dietmar noted that while commit fc1892becd56 fixes the
>> entity runnable stats, it does not adjust the cfs_rq runnable stats,
>> which are based off of h_nr_running.
>>
>> Track h_nr_delayed such that we can discount those and adjust the
>> signal.
>>
>> Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE")
>> Reported-by: Dietmar Eggemann <dietmar.eggemann@....com>
>> Reported-by: Vincent Guittot <vincent.guittot@...aro.org>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>> Link: https://lkml.kernel.org/r/20240906104525.GG4928@noisy.programming.kicks-ass.net
>
> I've been testing this fix for a while now to see if it helps with the
> regressions reported around EEVDF complete. The issue with negative
> "h_nr_delayed" that Luis reported previously seems to have been fixed
> as a result of commit 75b6499024a6 ("sched/fair: Properly deactivate
> sched_delayed task upon class change").
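Just to restate the idea from the commit message for anyone following
along: the cfs_rq tracks how many of its queued entities are in the
delayed-dequeue state, and the runnable signal discounts those, since
such entities are still enqueued but no longer runnable. A contrived
userspace sketch of the accounting (the field names are borrowed from
the patch, everything else is made up for illustration):

#include <stdio.h>

/* Simplified stand-in for the two cfs_rq fields involved. */
struct cfs_rq_sketch {
	unsigned int h_nr_running;	/* all queued hierarchy entities */
	unsigned int h_nr_delayed;	/* subset in delayed-dequeue state */
};

/*
 * Delayed entities are enqueued but not runnable, so they must not
 * contribute to the runnable signal.
 */
static unsigned int runnable_count(const struct cfs_rq_sketch *cfs_rq)
{
	return cfs_rq->h_nr_running - cfs_rq->h_nr_delayed;
}

int main(void)
{
	struct cfs_rq_sketch cfs_rq = { .h_nr_running = 3, .h_nr_delayed = 1 };

	/* 3 queued entities, 1 delayed => runnable signal of 2. */
	printf("runnable = %u\n", runnable_count(&cfs_rq));
	return 0;
}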
I recall having 75b6499024a6 in my testing tree and somehow still noticing
unbalanced accounting for h_nr_delayed, where the counter would eventually
be decremented twice, leading to negative values.
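Roughly, the failure mode looked like the following contrived userspace
sketch (not the actual kernel paths; a signed counter so the underflow
is visible, and the names are just for illustration):

#include <stdio.h>

/* Toy stand-in for the per-cfs_rq delayed-task counter. */
struct delayed_acct {
	long h_nr_delayed;
};

static void delayed_dequeue(struct delayed_acct *acct)
{
	acct->h_nr_delayed--;
	if (acct->h_nr_delayed < 0)	/* the negative values I was seeing */
		printf("WARN: h_nr_delayed = %ld\n", acct->h_nr_delayed);
}

int main(void)
{
	struct delayed_acct acct = { .h_nr_delayed = 1 };

	delayed_dequeue(&acct);	/* delayed dequeue accounts the task once */
	delayed_dequeue(&acct);	/* a second path accounts it again -> -1 */
	return 0;
}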
I might have to give it another go if we're considering including the change
as-is, just to make sure. Since our setups are slightly different, we could
be exercising different code paths.
Did this patch help with the regressions you noticed, though?
>
> I've been running stress-ng for a while and haven't seen any cases of
> negative "h_nr_delayed". In some of my previous experiments I'd also
> added the following WARN_ON() to check whether any delayed tasks remain
> on the cfs_rq before switching to idle, and I did not see any splats
> during my benchmark runs.
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 621696269584..c19a31fa46c9 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -457,6 +457,9 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
>
> static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
> {
> + /* All delayed tasks must be picked off before switching to idle */
> + SCHED_WARN_ON(rq->cfs.h_nr_delayed);
> +
> update_idle_core(rq);
> scx_update_idle(rq, true);
> schedstat_inc(rq->sched_goidle);
> --
>
> If you are including this back, feel free to add:
>
> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>
>> [..snip..]
>