Message-ID: <f55e536e-80e2-40f2-bf90-9c148ef63a4d@amd.com>
Date: Mon, 20 Jan 2025 14:42:16 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
CC: <bsegall@...gle.com>, <dietmar.eggemann@....com>,
<gautham.shenoy@....com>, <juri.lelli@...hat.com>,
<linux-kernel@...r.kernel.org>, <mgorman@...e.de>, <mingo@...hat.com>,
<peterz@...radead.org>, <rostedt@...dmis.org>, <swapnil.sapkal@....com>,
<vincent.guittot@...aro.org>, <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Fix inaccurate h_nr_runnable accounting with
delayed dequeue
Hello Madadi Vineeth Reddy,
Thank you for the review and test.
On 1/20/2025 10:36 AM, Madadi Vineeth Reddy wrote:
> Hi Prateek,
>
>> A SCHED_WARN_ON() to inspect h_nr_runnable post its update in
>> dequeue_entities() like below:
>>
>> cfs_rq->h_nr_runnable -= h_nr_runnable;
>> SCHED_WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
>>
>> is consistently tripped when running wakeup intensive workloads like
>> hackbench in a cgroup.
>
> I observed that the WARN_ON is triggered during the boot process without
> the patch, and the patch resolves the issue.
>
> However, I was unable to trigger the WARN_ON by running hackbench in a
> cgroup without the patch. Could you please share the specific test
> scenario or configuration you used to reproduce it?
Can you try converting the SCHED_WARN_ON() to a plain WARN_ON() and running the test again? I can hit it consistently, to the point that it floods my console. With autogroup enabled on Ubuntu, I believe it is trivial to hit this issue.
Following is the exact diff I'm using on top of tip:sched/core that
floods my console:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 98ac49ce78ea..7bc2c57601b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7160,6 +7160,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	cfs_rq->h_nr_runnable -= h_nr_runnable;
 	cfs_rq->h_nr_queued -= h_nr_queued;
+	WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
 	cfs_rq->h_nr_idle -= h_nr_idle;
 
 	if (cfs_rq_is_idle(cfs_rq))
@@ -7199,6 +7200,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	cfs_rq->h_nr_runnable -= h_nr_runnable;
 	cfs_rq->h_nr_queued -= h_nr_queued;
+	WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
 	cfs_rq->h_nr_idle -= h_nr_idle;
 
 	if (cfs_rq_is_idle(cfs_rq))
--
I tested this on a 32 vCPU VM and got similar floods.
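In case it helps, below is a rough sketch of one way to run a wakeup-heavy hackbench-style load inside a cgroup. The cgroup v2 path, group count, and loop count here are illustrative assumptions, not necessarily the exact setup referenced above:

    # Illustrative only: cgroup path and parameters are assumptions.
    mkdir /sys/fs/cgroup/hackbench
    # Move the current shell into the new cgroup so children inherit it.
    echo $$ > /sys/fs/cgroup/hackbench/cgroup.procs
    # Wakeup-heavy hackbench-equivalent run (perf's sched messaging benchmark).
    perf bench sched messaging -g 16 -l 100000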
>
> For the boot process scenario:
> Tested-by: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Thanks a ton for testing!
>
> Thanks,
> Madadi Vineeth Reddy
--
Thanks and Regards,
Prateek