Message-ID: <20250921192700.GA565@redhat.com>
Date: Sun, 21 Sep 2025 21:27:01 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Matt Fleming <mfleming@...udflare.com>
Cc: Peter Zijlstra <peterz@...radead.org>, John Stultz <jstultz@...gle.com>,
kernel-team <kernel-team@...udflare.com>,
LKML <linux-kernel@...r.kernel.org>,
Chris Arges <carges@...udflare.com>
Subject: Re: Debugging lost task in wait_task_inactive() when delivering
signal (6.12)
Thanks Matt!
So I guess that this has nothing to do with coredump and wait_task_inactive()
is broken...
I am wondering if this code
	/*
	 * If task is sched_delayed, force dequeue it, to avoid always
	 * hitting the tick timeout in the queued case
	 */
	if (p->se.sched_delayed)
		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
is actually correct, but I know nothing about the sched_delayed logic.
I will leave this to scheduler experts ;) I can't really help.
Oleg.
On 09/20, Matt Fleming wrote:
>
> On Fri, 19 Sept 2025 at 17:15, Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > OK, thanks. Nothing "interesting" at first glance.
>
> Chris (Cc'd) and I managed to get a reproducer and I think I know
> what's happening now.
>
> When a task A gets the SIGKILL from whichever thread is handling the
> coredump (let's say task B) it might hit the delayed dequeue path in
> schedule() and call set_delayed(), e.g.
>
> dequeue_entity+1263
> dequeue_entities+216
> dequeue_task_fair+224
> __schedule+468
> schedule+39
> do_exit+221
> do_group_exit+48
> get_signal+2078
> arch_do_signal_or_restart+46
> irqentry_exit_to_user_mode+132
> asm_sysvec_apic_timer_interrupt+26
>
> At this point task A has ->on_rq=1, ->se.sched_delayed=1 and ->se.on_rq=1.
>
> Now when task B calls into wait_task_inactive(), it sees
> ->se.sched_delayed=1 and calls dequeue_task().
>
> At this point task A has ->on_rq=1, ->se.sched_delayed=0 and ->se.on_rq=0
>
> Unfortunately, task B still thinks that task A is scheduled because
> task_on_rq_queued(A) is true, but it's not runnable and will never run
> because it's no longer in the fair rbtree and the only task that will
> enqueue it again is task B once it leaves wait_task_inactive() and
> hits coredump_finish().
>
> > > do_exit+0xdd is here in coredump_task_wait():
> > >
> > > 	for (;;) {
> > > 		set_current_state(TASK_IDLE|TASK_FREEZABLE);
> > > 		if (!self.task) /* see coredump_finish() */
> > > 			break;
> > > 		schedule();
> > > 	}
> > >
> > > i.e. the task calls schedule() and never comes back.
> >
> > Are you sure it never comes back and doesn't loop?
>
> Yeah, positive:
>
> $ sudo perf stat -e cycles -t 1546531 -- sleep 30
>
> Performance counter stats for thread id '1546531':
>
> <not counted> cycles
>
> 30.001671072 seconds time elapsed
>
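
For readers following the thread, the state sequence Matt describes above can be replayed in a small userspace toy model. This is only a sketch: the struct and function names (toy_task, toy_delay_dequeue, toy_force_dequeue and so on) are hypothetical, nothing here is kernel code, and the transitions are hard-coded to match the values reported in the thread rather than derived from the real dequeue_task(). It does not explain why ->on_rq stays set; it only shows why the resulting combination is a problem for a waiter that polls task_on_rq_queued().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the three fields discussed in the thread. */
struct toy_se {
	int on_rq;
	int sched_delayed;
};

struct toy_task {
	int on_rq;		/* 1 plays the role of "queued" here */
	struct toy_se se;
};

/* Mirrors the check that makes task B believe A is still queued. */
static bool toy_task_on_rq_queued(const struct toy_task *p)
{
	return p->on_rq == 1;
}

/* Step 1: A blocks in schedule() and takes the delayed-dequeue path. */
static void toy_delay_dequeue(struct toy_task *p)
{
	p->se.sched_delayed = 1;	/* entity stays on the runqueue */
}

/*
 * Step 2: B forces the dequeue of a delayed task. ASSUMPTION: the
 * outcome is copied from the report (->se.* cleared, ->on_rq kept).
 */
static void toy_force_dequeue(struct toy_task *p)
{
	if (p->se.sched_delayed) {
		p->se.sched_delayed = 0;
		p->se.on_rq = 0;
	}
}

/* The inconsistent state: reported as queued, but not in the tree. */
static bool toy_looks_queued_but_cannot_run(const struct toy_task *p)
{
	return toy_task_on_rq_queued(p) && !p->se.on_rq;
}

/* Replays both steps and returns true if the waiter would be stuck. */
static bool toy_replay(void)
{
	struct toy_task a = { .on_rq = 1, .se = { .on_rq = 1, .sched_delayed = 0 } };

	toy_delay_dequeue(&a);
	assert(a.on_rq == 1 && a.se.sched_delayed == 1 && a.se.on_rq == 1);

	toy_force_dequeue(&a);
	assert(a.on_rq == 1 && a.se.sched_delayed == 0 && a.se.on_rq == 0);

	return toy_looks_queued_but_cannot_run(&a);
}
```

In this model toy_replay() returns true: the waiter keeps seeing the task as queued while nothing can pick it to run, which is the deadlock shape described in the thread.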