Message-ID: <55a2acefffb8c99e4234bd18656a75625447c2d0.camel@gmx.de>
Date: Tue, 01 Oct 2024 10:30:26 +0200
From: Mike Galbraith <efault@....de>
To: Vishal Chourasia <vishalc@...ux.ibm.com>, Peter Zijlstra
<peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>, Vincent
Guittot <vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
luis.machado@....com
Subject: Re: sched/fair: Kernel panics in pick_next_entity
On Tue, 2024-10-01 at 00:45 +0530, Vishal Chourasia wrote:
> >
> For sanity, I ran the workload (kernel compilation) on the base commit
> where the kernel panic was initially observed. It panicked again, and a
> couple of warnings were also printed on the console, along with a
> circular locking dependency warning.
>
> Kernel 6.11.0-kp-base-10547-g684a64bf32b6 on an ppc64le
>
> ------------[ cut here ]------------
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.11.0-kp-base-10547-g684a64bf32b6 #69 Not tainted
> ------------------------------------------------------
...
> --- interrupt: 900
> se->sched_delayed
> WARNING: CPU: 1 PID: 27867 at kernel/sched/fair.c:6062 unthrottle_cfs_rq+0x644/0x660
...that warning also spells eventual doom for the box. Here it does
anyway: running LTP's cfs_bandwidth01 testcase and hackbench together,
the box grinds to a halt in pretty short order.

With the patchlet below (submitted), I can beat on the box to my heart's
content without meeting throttle/unthrottle woes.
sched: Fix sched_delayed vs cfs_bandwidth
Meeting an unfinished DELAY_DEQUEUE treated entity in unthrottle_cfs_rq()
leads to a couple of terminal scenarios.  Finish it first, so ENQUEUE_WAKEUP
can proceed as it would have sans DELAY_DEQUEUE treatment.
Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Venkat Rao Bagalkote <venkat88@...ux.vnet.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@...ux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <efault@....de>
---
kernel/sched/fair.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6058,10 +6058,13 @@ void unthrottle_cfs_rq(struct cfs_rq *cf
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);

-		if (se->on_rq) {
-			SCHED_WARN_ON(se->sched_delayed);
+		/* Handle any unfinished DELAY_DEQUEUE business first. */
+		if (se->sched_delayed) {
+			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
+
+			dequeue_entity(qcfs_rq, se, flags);
+		} else if (se->on_rq)
 			break;
-		}
 		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);

 		if (cfs_rq_is_idle(group_cfs_rq(se)))
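
For anyone who wants to poke at the control flow outside the kernel, here
is a minimal userspace sketch of what the loop does with the patch applied.
It is a toy model, not kernel code: struct toy_se,
toy_finish_delayed_dequeue(), toy_enqueue() and toy_unthrottle_walk() are
hypothetical names made up for illustration, and all the real bookkeeping
is reduced to two flags, on the assumption that a delayed entity still has
on_rq set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct sched_entity: only the state the loop inspects. */
struct toy_se {
	bool on_rq;		/* entity is queued on its runqueue */
	bool sched_delayed;	/* its dequeue was deferred (DELAY_DEQUEUE) */
	struct toy_se *parent;	/* next level up the hierarchy, or NULL */
};

/* Hypothetical helper: complete a dequeue that had been deferred. */
static void toy_finish_delayed_dequeue(struct toy_se *se)
{
	/* Assumption: a delayed entity is still marked on_rq. */
	assert(se->on_rq && se->sched_delayed);
	se->sched_delayed = false;
	se->on_rq = false;
}

/* Hypothetical helper: enqueue an entity that is not currently queued. */
static void toy_enqueue(struct toy_se *se)
{
	assert(!se->on_rq);
	se->on_rq = true;
}

/*
 * Walk up from se the way the patched loop does: finish any deferred
 * dequeue first, stop at the first entity that is genuinely queued, and
 * enqueue everything else. Returns the number of entities enqueued.
 */
static int toy_unthrottle_walk(struct toy_se *se)
{
	int enqueued = 0;

	for (; se; se = se->parent) {
		if (se->sched_delayed)
			toy_finish_delayed_dequeue(se);
		else if (se->on_rq)
			break;
		toy_enqueue(se);
		enqueued++;
	}
	return enqueued;
}
```

Walking from an unqueued leaf through a delayed parent to an already
queued grandparent enqueues two entities and stops at the third, with no
sched_delayed flag left set on the way.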