lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <531b4ea1-b2e2-4b48-a7cc-4ca63107aff6@amd.com>
Date: Thu, 28 Nov 2024 12:05:53 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Luis Machado <luis.machado@....com>, "Peter Zijlstra (Intel)"
	<peterz@...radead.org>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Vincent Guittot
	<vincent.guittot@...aro.org>, <x86@...nel.org>,
	<linux-kernel@...r.kernel.org>, <linux-tip-commits@...r.kernel.org>, Saravana
 Kannan <saravanak@...gle.com>, Samuel Wu <wusamuel@...gle.com>, Android
 Kernel Team <kernel-team@...roid.com>
Subject: Re: [tip: sched/core] sched/eevdf: More PELT vs DELAYED_DEQUEUE

(+ Saravana, Samuel)

Hello Luis,

On 11/27/2024 3:04 PM, Luis Machado wrote:
> Hi,
> 
> On 11/27/24 04:17, K Prateek Nayak wrote:
>> Hello Peter,
>>
>> On 9/10/2024 1:39 PM, tip-bot2 for Peter Zijlstra wrote:
>>> The following commit has been merged into the sched/core branch of tip:
>>>
>>> Commit-ID:     2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>>> Gitweb:        https://git.kernel.org/tip/2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>>> Author:        Peter Zijlstra <peterz@...radead.org>
>>> AuthorDate:    Fri, 06 Sep 2024 12:45:25 +02:00
>>> Committer:     Peter Zijlstra <peterz@...radead.org>
>>> CommitterDate: Tue, 10 Sep 2024 09:51:15 +02:00
>>>
>>> sched/eevdf: More PELT vs DELAYED_DEQUEUE
>>>
>>> Vincent and Dietmar noted that while commit fc1892becd56 fixes the
>>> entity runnable stats, it does not adjust the cfs_rq runnable stats,
>>> which are based off of h_nr_running.
>>>
>>> Track h_nr_delayed such that we can discount those and adjust the
>>> signal.
>>>
>>> Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE")
>>> Reported-by: Dietmar Eggemann <dietmar.eggemann@....com>
>>> Reported-by: Vincent Guittot <vincent.guittot@...aro.org>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>> Link: https://lkml.kernel.org/r/20240906104525.GG4928@noisy.programming.kicks-ass.net
>>
>> I've been testing this fix for a while now to see if it helps the
>> regressions reported around EEVDF complete. The issue with negative
>> "h_nr_delayed" reported by Luis previously seem to have been fixed as a
>> result of commit 75b6499024a6 ("sched/fair: Properly deactivate
>> sched_delayed task upon class change")
> 
> I recall having 75b6499024a6 in my testing tree and somehow still noticing
> unbalanced accounting for h_nr_delayed, where it would be decremented
> twice eventually, leading to negative numbers.
> 
> I might have to give it another go if we're considering including the change
> as-is, just to make sure. Since our setups are slightly different, we could
> be exercising some slightly different paths.

That would be great! Thank you :)

Now that I see, you did have Valentine's patches in your tree during
testing
https://lore.kernel.org/lkml/6df12fde-1e0d-445f-8f8a-736d11f9ee41@arm.com/
Perhaps it could be the fixup commit 98442f0ccd82 ("sched: Fix
delayed_dequeue vs switched_from_fair()") or the fact that my benchmark
didn't stress this path enough to break you as you mentioned. I would
have still expected it to hit that SCHED_WARN_ON() I had added in
set_next_task_idle() if something went sideways.

> 
> Did this patch help with the regressions you noticed though?

I believe it was Saravana who was seeing anomalies in PELT ramp-up with
DELAY_DEQUEUE. My test setup is currently not equipped to catch it but
Saravana was interested in these fixes being backported to v6.12 LTS in
https://lore.kernel.org/lkml/CAGETcx_1pZCtWiBbDmUcxEw3abF5dr=XdFCkH9zXWK75g7457w@mail.gmail.com/

I believe tracking h_nr_delayed and disregarding delayed tasks in
certain places is a necessary fix. None of the benchmarks in my test
setup seem to mind running without it but I'm also doing most of my
testing with performance governor and the PELT anomalies seem to affect
more from a PM perspective and not load balancing perspective.

> 
>>
>> I've been running stress-ng for a while and haven't seen any cases of
>> negative "h_nr_delayed". I'd also added the following WARN_ON() to see
>> if there are any delayed tasks on the cfs_rq before switching to idle in
>> some of my previous experiments and I did not see any splat during my
>> benchmark runs.
>>
>> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
>> index 621696269584..c19a31fa46c9 100644
>> --- a/kernel/sched/idle.c
>> +++ b/kernel/sched/idle.c
>> @@ -457,6 +457,9 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
>>   
>>   static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
>>   {
>> +    /* All delayed tasks must be picked off before switching to idle */
>> +    SCHED_WARN_ON(rq->cfs.h_nr_delayed);
>> +
>>       update_idle_core(rq);
>>       scx_update_idle(rq, true);
>>       schedstat_inc(rq->sched_goidle);
>> -- 
>>
>> If you are including this back, feel free to add:
>>
>> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>>
>>> [..snip..]
>>
> 

-- 
Thanks and Regards,
Prateek


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ