linux-kernel - Re: Debugging lost task in wait_task_inactive() when delivering signal (6.12)

Message-ID: <20250921192700.GA565@redhat.com>
Date: Sun, 21 Sep 2025 21:27:01 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Matt Fleming <mfleming@...udflare.com>
Cc: Peter Zijlstra <peterz@...radead.org>, John Stultz <jstultz@...gle.com>,
	kernel-team <kernel-team@...udflare.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Chris Arges <carges@...udflare.com>
Subject: Re: Debugging lost task in wait_task_inactive() when delivering
 signal (6.12)

Thanks Matt!

So I guess this has nothing to do with the coredump, and wait_task_inactive()
itself is broken...

I am wondering if this code

		/*
		 * If task is sched_delayed, force dequeue it, to avoid always
		 * hitting the tick timeout in the queued case
		 */
		if (p->se.sched_delayed)
			dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);

is actually correct, but I know nothing about the sched_delayed logic.

I will leave this to scheduler experts ;) I can't really help.

Oleg.

On 09/20, Matt Fleming wrote:
>
> On Fri, 19 Sept 2025 at 17:15, Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > OK, thanks. Nothing "interesting" at first glance.
>
> Chris (Cc'd) and I managed to get a reproducer and I think I know
> what's happening now.
>
> When a task A gets the SIGKILL from whichever thread is handling the
> coredump (let's say task B) it might hit the delayed dequeue path in
> schedule() and call set_delayed(), e.g.
>
>         dequeue_entity+1263
>         dequeue_entities+216
>         dequeue_task_fair+224
>         __schedule+468
>         schedule+39
>         do_exit+221
>         do_group_exit+48
>         get_signal+2078
>         arch_do_signal_or_restart+46
>         irqentry_exit_to_user_mode+132
>         asm_sysvec_apic_timer_interrupt+26
>
> At this point task A has ->on_rq=1, ->se.sched_delayed=1 and ->se.on_rq=1.
>
> Now when task B calls into wait_task_inactive(), it sees
> ->se.sched_delayed=1 and calls dequeue_task().
>
> At this point task A has ->on_rq=1, ->se.sched_delayed=0 and ->se.on_rq=0.
>
> Unfortunately, task B still thinks that task A is scheduled because
> task_on_rq_queued(A) is true. But task A is not runnable and will never
> run: it is no longer in the fair rbtree, and the only task that would
> enqueue it again is task B, once it leaves wait_task_inactive() and
> hits coredump_finish().
>
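
The state transitions described above can be sketched as a small userspace
model. This is only an illustration, not kernel code: the struct and all
toy_* names are hypothetical, the three fields merely mirror ->on_rq,
->se.on_rq and ->se.sched_delayed, and the "force dequeue" step reproduces
the flag values reported in this thread rather than the real dequeue_task()
logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three task fields discussed in the thread. */
struct toy_task {
	int on_rq;         /* models p->on_rq */
	int se_on_rq;      /* models p->se.on_rq */
	int sched_delayed; /* models p->se.sched_delayed */
};

/* Assumed to correspond to the kernel's "queued" value for ->on_rq. */
#define TOY_ON_RQ_QUEUED 1

/* Models task_on_rq_queued(): it only looks at ->on_rq. */
static bool toy_task_on_rq_queued(const struct toy_task *p)
{
	return p->on_rq == TOY_ON_RQ_QUEUED;
}

/* Task A after it hit the delayed dequeue path in schedule(). */
static struct toy_task toy_after_delayed_dequeue(void)
{
	struct toy_task a = { .on_rq = 1, .se_on_rq = 1, .sched_delayed = 1 };
	return a;
}

/*
 * Models the force dequeue done from wait_task_inactive() as observed in
 * the report: the se flags are cleared, ->on_rq is left at 1.
 */
static void toy_force_dequeue(struct toy_task *p)
{
	assert(p->sched_delayed);
	p->sched_delayed = 0;
	p->se_on_rq = 0;
	/* p->on_rq is deliberately untouched, matching the observed state. */
}

/* "Lost": still reported as queued, but no longer in the fair rbtree. */
static bool toy_task_is_lost(const struct toy_task *p)
{
	return toy_task_on_rq_queued(p) && !p->se_on_rq;
}
```

Starting from toy_after_delayed_dequeue() and calling toy_force_dequeue()
leaves the model in the state where toy_task_on_rq_queued() is still true
while the entity is off the tree, which is the condition under which the
waiter in this report keeps waiting for a task that cannot run.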
> > > do_exit+0xdd is here in coredump_task_wait():
> > >
> > >                 for (;;) {
> > >                         set_current_state(TASK_IDLE|TASK_FREEZABLE);
> > >                         if (!self.task) /* see coredump_finish() */
> > >                                 break;
> > >                         schedule();
> > >                 }
> > >
> > > i.e. the task calls schedule() and never comes back.
> >
> > Are you sure it never comes back and doesn't loop?
>
> Yeah, positive:
>
> $ sudo perf stat -e cycles -t 1546531 -- sleep 30
>
>  Performance counter stats for thread id '1546531':
>
>      <not counted>      cycles
>
>       30.001671072 seconds time elapsed
>