Message-ID: <55a2acefffb8c99e4234bd18656a75625447c2d0.camel@gmx.de>
Date: Tue, 01 Oct 2024 10:30:26 +0200
From: Mike Galbraith <efault@....de>
To: Vishal Chourasia <vishalc@...ux.ibm.com>, Peter Zijlstra
	 <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>, Vincent
 Guittot <vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
 <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
 <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 luis.machado@....com
Subject: Re: sched/fair: Kernel panics in pick_next_entity

On Tue, 2024-10-01 at 00:45 +0530, Vishal Chourasia wrote:
> >
> for sanity, I ran the workload (kernel compilation) on the base commit
> where the kernel panic was initially observed, which resulted in a
> kernel panic. Along with it, a couple of warnings were also printed on
> the console, plus a circular locking dependency warning.
>
> Kernel 6.11.0-kp-base-10547-g684a64bf32b6 on an ppc64le
>
> ------------[ cut here ]------------
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.11.0-kp-base-10547-g684a64bf32b6 #69 Not tainted
> ------------------------------------------------------

...

> --- interrupt: 900
> se->sched_delayed
> WARNING: CPU: 1 PID: 27867 at kernel/sched/fair.c:6062 unthrottle_cfs_rq+0x644/0x660

...that warning also spells eventual doom for the box, here at least:
running LTP's cfs_bandwidth01 testcase and hackbench together, the box
grinds to a halt in pretty short order.

With the patchlet below (submitted), I can beat on the box to my heart's
content without meeting throttle/unthrottle woes.

sched: Fix sched_delayed vs cfs_bandwidth

Meeting an unfinished DELAY_DEQUEUE treated entity in unthrottle_cfs_rq()
leads to a couple terminal scenarios.  Finish it first, so ENQUEUE_WAKEUP
can proceed as it would have sans DELAY_DEQUEUE treatment.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Venkat Rao Bagalkote <venkat88@...ux.vnet.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@...ux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <efault@....de>
---
 kernel/sched/fair.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6058,10 +6058,13 @@ void unthrottle_cfs_rq(struct cfs_rq *cf
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);

-		if (se->on_rq) {
-			SCHED_WARN_ON(se->sched_delayed);
+		/* Handle any unfinished DELAY_DEQUEUE business first. */
+		if (se->sched_delayed) {
+			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
+
+			dequeue_entity(qcfs_rq, se, flags);
+		} else if (se->on_rq)
 			break;
-		}
 		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);

 		if (cfs_rq_is_idle(group_cfs_rq(se)))
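
For readers who want to see the control flow of the changed loop in isolation, here is a minimal userspace sketch. It is a toy model, not kernel code: struct toy_se, toy_finish_delayed_dequeue(), toy_enqueue() and toy_unthrottle_walk() are hypothetical names, and the assumption that finishing a delayed dequeue clears both flags is a simplification of what dequeue_entity() does with DEQUEUE_SLEEP | DEQUEUE_DELAYED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct sched_entity; NOT the kernel structure. */
struct toy_se {
	bool on_rq;		/* entity is queued on its cfs_rq */
	bool sched_delayed;	/* dequeue was deferred (DELAY_DEQUEUE) */
	struct toy_se *parent;	/* next level up, NULL at the top */
};

/* Assumed effect of finishing a deferred dequeue: entity leaves the queue. */
static void toy_finish_delayed_dequeue(struct toy_se *se)
{
	se->sched_delayed = false;
	se->on_rq = false;
}

/* Enqueue must only ever see an entity that is fully off the queue. */
static void toy_enqueue(struct toy_se *se)
{
	assert(!se->on_rq && !se->sched_delayed);
	se->on_rq = true;
}

/*
 * Mirrors the shape of the patched loop: finish unfinished delayed
 * dequeues first, stop at the first entity that is properly queued,
 * enqueue everything else. Returns the number of entities enqueued.
 */
static int toy_unthrottle_walk(struct toy_se *se)
{
	int enqueued = 0;

	for (; se; se = se->parent) {
		if (se->sched_delayed)
			toy_finish_delayed_dequeue(se);
		else if (se->on_rq)
			break;
		toy_enqueue(se);
		enqueued++;
	}
	return enqueued;
}
```

With the pre-patch logic, a delayed entity (on_rq and sched_delayed both set) would have hit the warning and stopped the walk at that level; in this sketch it is dequeued and then enqueued like any other off-queue entity.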

