linux-kernel - Re: [PATCH] sched/fair: Fix inaccurate h_nr_runnable accounting with delayed dequeue

Message-ID: <f55e536e-80e2-40f2-bf90-9c148ef63a4d@amd.com>
Date: Mon, 20 Jan 2025 14:42:16 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
CC: <bsegall@...gle.com>, <dietmar.eggemann@....com>,
	<gautham.shenoy@....com>, <juri.lelli@...hat.com>,
	<linux-kernel@...r.kernel.org>, <mgorman@...e.de>, <mingo@...hat.com>,
	<peterz@...radead.org>, <rostedt@...dmis.org>, <swapnil.sapkal@....com>,
	<vincent.guittot@...aro.org>, <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Fix inaccurate h_nr_runnable accounting with
 delayed dequeue

Hello Madadi Vineeth Reddy,

Thank you for the review and test.

On 1/20/2025 10:36 AM, Madadi Vineeth Reddy wrote:
> Hi Prateek,
> 
>> A SCHED_WARN_ON() to inspect h_nr_runnable post its update in
>> dequeue_entities() like below:
>>
>>     cfs_rq->h_nr_runnable -= h_nr_runnable;
>>     SCHED_WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
>>
>> is consistently tripped when running wakeup intensive workloads like
>> hackbench in a cgroup.
> 
> I observed that the WARN_ON is triggered during the boot process without
> the patch, and the patch resolves the issue.
> 
> However, I was unable to trigger the WARN_ON by running hackbench in a
> cgroup without the patch. Could you please share the specific test
> scenario or configuration you used to reproduce it?

Can you try converting the SCHED_WARN_ON() to a WARN_ON() and try again?
I can consistently hit it, to the point that it floods my console. With
autogroup enabled on Ubuntu, I believe it is trivial to hit this issue.

Following is the exact diff I'm using on top of tip:sched/core that
floods my console:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 98ac49ce78ea..7bc2c57601b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7160,6 +7160,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
 		cfs_rq->h_nr_queued -= h_nr_queued;
+		WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
 		cfs_rq->h_nr_idle -= h_nr_idle;
 
 		if (cfs_rq_is_idle(cfs_rq))
@@ -7199,6 +7200,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
 		cfs_rq->h_nr_queued -= h_nr_queued;
+		WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
 		cfs_rq->h_nr_idle -= h_nr_idle;
 
 		if (cfs_rq_is_idle(cfs_rq))
--

I tested this on a 32 vCPU VM and I get similar floods.
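
For anyone following along, here is a rough userspace sketch of why the
check casts to int before comparing. It assumes the counter is an
unsigned int, as the cast in the diff suggests; struct toy_cfs_rq and
toy_dequeue() are made-up names for illustration and are not kernel code:

```c
#include <limits.h>

/* Hypothetical stand-in for cfs_rq; only the one counter is modelled. */
struct toy_cfs_rq {
	unsigned int h_nr_runnable;	/* assumed unsigned, as the (int) cast implies */
};

/*
 * Subtract from the counter the way dequeue_entities() does and report
 * whether the result went "negative". Comparing the unsigned value
 * directly against 0 could never be true, hence the cast. Converting an
 * out-of-range unsigned value to int is implementation-defined in C;
 * GCC and Clang wrap it modulo 2^32, which this sketch relies on.
 */
static int toy_dequeue(struct toy_cfs_rq *cfs_rq, unsigned int h_nr_runnable)
{
	cfs_rq->h_nr_runnable -= h_nr_runnable;
	return ((int) cfs_rq->h_nr_runnable) < 0;
}
```

Starting from a count of 1, the first toy_dequeue(&rq, 1) leaves 0 and
returns 0. A second, unbalanced one wraps the counter to UINT_MAX and
returns 1, which is the condition the WARN_ON() above is catching.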

> 
> For the boot process scenario:
> Tested-by: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>

Thanks a ton for testing!

> 
> Thanks,
> Madadi Vineeth Reddy

-- 
Thanks and Regards,
Prateek