Message-ID: <20241209125601.GQ35539@noisy.programming.kicks-ass.net>
Date: Mon, 9 Dec 2024 13:56:01 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Vineeth Remanan Pillai <vineeth@...byteword.org>
Cc: Joel Fernandes <joel@...lfernandes.org>,
	Ilya Maximets <i.maximets@....org>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>, vineethrp@...gle.com,
	shraash@...gle.com, marcel.ziswiler@...ethink.co.uk
Subject: Re: [v6.12] WARNING: at kernel/sched/deadline.c:1995
	enqueue_dl_entity (task blocked for more than 28262 seconds)

On Mon, Dec 09, 2024 at 07:29:52AM -0500, Vineeth Remanan Pillai wrote:
> On Mon, Dec 9, 2024 at 5:55 AM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Fri, Dec 06, 2024 at 11:57:30AM -0500, Vineeth Remanan Pillai wrote:
> >
> > > I was able to reproduce this WARN_ON a couple of days back with
> > > syzkaller. The dlserver's dl_se gets enqueued during an update_curr while
> > > the dlserver is stopped, and a subsequent dlserver start will cause a
> > > double enqueue.
> >
> > Right, I spotted that hole late last week. There is this thread:
> >
> > https://lore.kernel.org/all/20241209094941.GF21636@noisy.programming.kicks-ass.net/T/#u
> >
> > Where I just added this thunk:
> >
> > @@ -1674,6 +1679,12 @@ void dl_server_start(struct sched_dl_entity *dl_se)
> >
> >  void dl_server_stop(struct sched_dl_entity *dl_se)
> >  {
> > +	if (current->dl_server == dl_se) {
> > +		struct rq *rq = rq_of_dl_se(dl_se);
> > +		trace_printk("stop fair server %d\n", cpu_of(rq));
> > +		current->dl_server = NULL;
> > +	}
> > +
> >  	if (!dl_se->dl_runtime)
> >  		return;
> >
> > Which was my attempt at plugging said hole. But since I do not have
> > means of reproduction, I'm not at all sure it is sufficient :/
> >
> I think I was able to get to the root cause last week. So the issue
> seems to be that the dlserver is stopped in the pick_task path of the
> dlserver itself when sched_delayed is set:
>   __pick_next_task
>     => pick_task_dl -> server_pick_task
>       => pick_task_fair
>         => pick_next_entity (if (sched_delayed))
>           => dequeue_entities
>             => dl_server_stop
Ooh, that's where it happens.
So the scenario I had in mind was that we were doing something like:
  current->state = TASK_INTERRUPTIBLE;
  schedule();
    deactivate_task()
      dl_server_stop();
    pick_next_task()
      pick_next_task_fair()
        sched_balance_newidle()
          rq_unlock(this_rq)
at which point another CPU can take our RQ-lock and do:
  try_to_wake_up()
    ttwu_queue()
      rq_lock()
      ...
      activate_task()
        dl_server_start()
      wakeup_preempt() := check_preempt_wakeup_fair()
        update_curr()
          update_curr_task()
            if (current->dl_server)
              dl_server_update()
                enqueue_dl_entity()
Which then also goes *bang*. The above can't happen if we clear
current->dl_server in dl_server_stop().
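
Concretely, folding that clear into dl_server_stop() would look roughly
like the thunk quoted above, minus the debug trace_printk(). This is a
minimal sketch only, assuming the rest of the v6.12 dl_server_stop() body
stays as-is:

void dl_server_stop(struct sched_dl_entity *dl_se)
{
        /*
         * Drop the current->dl_server link before the server goes away.
         * Otherwise a remote CPU that grabs our rq lock while we sit in
         * sched_balance_newidle() can still see the stale pointer and
         * charge runtime to a server we just stopped.
         */
        if (current->dl_server == dl_se)
                current->dl_server = NULL;

        if (!dl_se->dl_runtime)
                return;

        dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
        hrtimer_try_to_cancel(&dl_se->dl_timer);
        dl_se->dl_defer_armed = 0;
        dl_se->dl_throttled = 0;
}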
I was worried that might not be it, because Marcel had bisected it to
the delayed stuff, but I'd not managed to reach the pick site yet :/
> Now server_pick_task returns NULL and then we set dl_yielded and call
> update_curr_dl_se. But dl_se is already dequeued, so the code gets
> confused and does all sorts of things, including setting a timer to
> enqueue it back. This ultimately leads to a double enqueue when the
> dlserver is started the next time (depending on the timing of
> dl_server_start).
>
> I think we should not call update_curr_dl_se when the dlserver is
> dequeued. Based on this I have a small patch and it seems to solve the
> issue:
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index d9d5a702f1a6..a9f3f020e421 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2419,12 +2419,18 @@ static struct task_struct *__pick_task_dl(struct rq *rq)
>
>  	if (dl_server(dl_se)) {
>  		p = dl_se->server_pick_task(dl_se);
> -		if (!p) {
> +		if (p) {
> +			rq->dl_server = dl_se;
> +		} else if (WARN_ON_ONCE(on_dl_rq(dl_se))) {
> +			/*
> +			 * If server_pick_task returns NULL and dlserver is
> +			 * enqueued, we have a problem. Let's yield and do a
> +			 * pick again.
> +			 */
>  			dl_se->dl_yielded = 1;
>  			update_curr_dl_se(rq, dl_se, 0);
>  			goto again;
>  		}
> -		rq->dl_server = dl_se;
>  	} else {
>  		p = dl_task_of(dl_se);
>  	}
Hmm.. so fundamentally that yield() makes sense, but yeah, it's lost
track of the fact that we've stopped the server and it should not
continue.
Does something like the below make sense?
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d380bffee2ef..abebeb67de4e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -664,6 +664,7 @@ struct sched_dl_entity {
 	unsigned int			dl_non_contending : 1;
 	unsigned int			dl_overrun        : 1;
 	unsigned int			dl_server         : 1;
+	unsigned int			dl_server_active  : 1;
 	unsigned int			dl_defer          : 1;
 	unsigned int			dl_defer_armed    : 1;
 	unsigned int			dl_defer_running  : 1;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d9d5a702f1a6..e2b542f684db 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1647,6 +1647,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	if (!dl_se->dl_runtime)
 		return;
 
+	dl_se->dl_server_active = 1;
 	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
 	if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
 		resched_curr(dl_se->rq);
@@ -1661,6 +1662,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
 	hrtimer_try_to_cancel(&dl_se->dl_timer);
 	dl_se->dl_defer_armed = 0;
 	dl_se->dl_throttled = 0;
+	dl_se->dl_server_active = 0;
 }
 
 void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
@@ -2420,8 +2422,10 @@ static struct task_struct *__pick_task_dl(struct rq *rq)
 	if (dl_server(dl_se)) {
 		p = dl_se->server_pick_task(dl_se);
 		if (!p) {
-			dl_se->dl_yielded = 1;
-			update_curr_dl_se(rq, dl_se, 0);
+			if (dl_se->dl_server_active) {
+				dl_se->dl_yielded = 1;
+				update_curr_dl_se(rq, dl_se, 0);
+			}
 			goto again;
 		}
 		rq->dl_server = dl_se;
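
With the hunk above applied, the __pick_task_dl() branch would read
roughly like this; a sketch of the merged result, not a literal tree
excerpt, with a comment spelling out why the yield is now conditional:

	if (dl_server(dl_se)) {
		p = dl_se->server_pick_task(dl_se);
		if (!p) {
			/*
			 * If server_pick_task() ended up stopping the server
			 * (the sched_delayed dequeue path above), skip the
			 * yield/update so we don't re-arm a dequeued dl_se;
			 * just pick again.
			 */
			if (dl_se->dl_server_active) {
				dl_se->dl_yielded = 1;
				update_curr_dl_se(rq, dl_se, 0);
			}
			goto again;
		}
		rq->dl_server = dl_se;
	} else {
		p = dl_task_of(dl_se);
	}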