linux-kernel - Re: [v6.12] WARNING: at kernel/sched/deadline.c:1995 enqueue_dl_entity (task blocked for more than 28262 seconds)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAO7JXPj6_VF+T1ykwQsCmBjVhHQdpX0wJssPPRYOazJcciCCyA@mail.gmail.com>
Date: Mon, 9 Dec 2024 08:56:43 -0500
From: Vineeth Remanan Pillai <vineeth@...byteword.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Joel Fernandes <joel@...lfernandes.org>, Ilya Maximets <i.maximets@....org>, 
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, vineethrp@...gle.com, 
	shraash@...gle.com, marcel.ziswiler@...ethink.co.uk
Subject: Re: [v6.12] WARNING: at kernel/sched/deadline.c:1995
 enqueue_dl_entity (task blocked for more than 28262 seconds)

On Mon, Dec 9, 2024 at 7:56 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Dec 09, 2024 at 07:29:52AM -0500, Vineeth Remanan Pillai wrote:
> > On Mon, Dec 9, 2024 at 5:55 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Fri, Dec 06, 2024 at 11:57:30AM -0500, Vineeth Remanan Pillai wrote:
> > >
> > > > I was able to reproduce this WARN_ON couple of days back with
> > > > syzkaller. dlserver's dl_se gets enqueued during a update_curr while
> > > > the dlserver is stopped. And subsequent dlserver start will cause a
> > > > double enqueue.
> > >
> > > Right, I spotted that hole late last week. There is this thread:
> > >
> > >   https://lore.kernel.org/all/20241209094941.GF21636@noisy.programming.kicks-ass.net/T/#u
> > >
> > > Where I just added this thunk:
> > >
> > >   @@ -1674,6 +1679,12 @@ void dl_server_start(struct sched_dl_entity *dl_se)
> > >
> > >  void dl_server_stop(struct sched_dl_entity *dl_se)
> > >  {
> > > +       if (current->dl_server == dl_se) {
> > > +               struct rq *rq = rq_of_dl_se(dl_se);
> > > +               trace_printk("stop fair server %d\n", cpu_of(rq));
> > > +               current->dl_server = NULL;
> > > +       }
> > > +
> > >         if (!dl_se->dl_runtime)
> > >                 return;
> > >
> > > Which was my attempt at plugging said hole. But since I do not have
> > > means of reproduction, I'm not at all sure it is sufficient :/
> > >
> > I think I was able to get to the root cause last week. So the issue
> > seems to be that dlserver is stopped in the pick_task path of dlserver
> > itself when the sched_delayed is set:
> > __pick_next_task
> > => pick_task_dl -> server_pick_task
> >      => pick_task_fair
> >           => pick_next_entity (if (sched_delayed))
> >                => dequeue_entities
> >                      => dl_server_stop
>
> Ooh, that's where it happens.
>
> So the scenario I had in mind was that we were doing something like:
>
>         current->state = TASK_INTERRUPTIBLE();
>         schedule();
>           deactivate_task()
>             dl_stop_server();
>           pick_next_task()
>             pick_next_task_fair()
>               sched_balance_newidle()
>                 rq_unlock(this_rq)
>
> at which point another CPU can take our RQ-lock and do:
>
>         try_to_wake_up()
>           ttwu_queue()
>             rq_lock()
>             ...
>             activate_task()
>               dl_server_start()
>             wakeup_preempt() := check_preempt_wakeup_fair()
>               update_curr()
>                 update_curr_task()
>                   if (current->dl_server)
>                     dl_server_update()
>                       enqueue_dl_entity()
>
>
> Which then also goes *bang*. The above can't happen if we clear
> current->dl_server in dl_stop_server().
>
I also thought this could be a possibility but the previous deactivate
for this task would have cleared the dl_server no? Soon after this in
update_curr() we again call dl_server_update if p_.dl_server !=
rq->fair_server and this is also another possibility of a double
enqueue. Thats the reason I thought we should have a flag to denote if
dl_server is active or not. I initially had a fix as you suggested
below. But it was not fully fixing the issue because the dl_yield was
confusing the server. So I split into 2 patches with dl_server active
flag as the second patch.

> I was worried that might not be it, bcause Marcel had biscected it to
> the delayed stuff, but I'd not managed to reach the pick site yet :/
>
> ...
> ....
> Hmm.. so fundamentally that yield() makes sense, but yeah, it's lost
> track of the fact that we've stopped the server and it should not
> continue.
>
> Does something like the below make sense?
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d380bffee2ef..abebeb67de4e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -664,6 +664,7 @@ struct sched_dl_entity {
>         unsigned int                    dl_non_contending : 1;
>         unsigned int                    dl_overrun        : 1;
>         unsigned int                    dl_server         : 1;
> +       unsigned int                    dl_server_active  : 1;
>         unsigned int                    dl_defer          : 1;
>         unsigned int                    dl_defer_armed    : 1;
>         unsigned int                    dl_defer_running  : 1;
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index d9d5a702f1a6..e2b542f684db 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1647,6 +1647,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
>         if (!dl_se->dl_runtime)
>                 return;
>
> +       dl_se->dl_server_active = 1;
>         enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
>         if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
>                 resched_curr(dl_se->rq);
> @@ -1661,6 +1662,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
>         hrtimer_try_to_cancel(&dl_se->dl_timer);
>         dl_se->dl_defer_armed = 0;
>         dl_se->dl_throttled = 0;
> +       dl_se->dl_server_active = 0;
>  }
>
>  void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
> @@ -2420,8 +2422,10 @@ static struct task_struct *__pick_task_dl(struct rq *rq)
>         if (dl_server(dl_se)) {
>                 p = dl_se->server_pick_task(dl_se);
>                 if (!p) {
> -                       dl_se->dl_yielded = 1;
> -                       update_curr_dl_se(rq, dl_se, 0);
> +                       if (dl_se->dl_server_active) {
> +                               dl_se->dl_yielded = 1;
> +                               update_curr_dl_se(rq, dl_se, 0);
> +                       }
>                         goto again;
>                 }
>                 rq->dl_server = dl_se;

This should work as well. I was planning to send a second patch with
the dl_server active flag as it was not strictly the root cause of
this. But the active flag serves the purpose here and this change
looks good to me :-). I will test this on my end and let you know. It
takes more than 12 hours to reproduce in my test case ;-)

I feel that p should never be NULL when dl_server is active and that
should be a bug. From going through the code, I think we should never
hit this and a WARN_ON_ONCE would be good. What do you think about it?

Thanks,
Vineeth