linux-kernel - Re: Debugging lost task in wait_task_inactive() when delivering signal (6.12)


Message-ID: <CAGis_TWHJva-gktrsvO9=m5mEFf4zzcN=rNEt+5+moqz=C7AEQ@mail.gmail.com>
Date: Sat, 20 Sep 2025 23:10:07 +0100
From: Matt Fleming <mfleming@...udflare.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, John Stultz <jstultz@...gle.com>, 
	kernel-team <kernel-team@...udflare.com>, LKML <linux-kernel@...r.kernel.org>, 
	Chris Arges <carges@...udflare.com>
Subject: Re: Debugging lost task in wait_task_inactive() when delivering
 signal (6.12)

On Fri, 19 Sept 2025 at 17:15, Oleg Nesterov <oleg@...hat.com> wrote:
>
> OK, thanks. Nothing "interesting" at first glance.

Chris (Cc'd) and I managed to get a reproducer and I think I know
what's happening now.

When a task A gets the SIGKILL from whichever thread is handling the
coredump (let's say task B) it might hit the delayed dequeue path in
schedule() and call set_delayed(), e.g.

        dequeue_entity+1263
        dequeue_entities+216
        dequeue_task_fair+224
        __schedule+468
        schedule+39
        do_exit+221
        do_group_exit+48
        get_signal+2078
        arch_do_signal_or_restart+46
        irqentry_exit_to_user_mode+132
        asm_sysvec_apic_timer_interrupt+26

At this point task A has ->on_rq=1, ->se.sched_delayed=1 and ->se.on_rq=1.

Now when task B calls into wait_task_inactive(), it sees
->se.sched_delayed=1 and calls dequeue_task().

At this point task A has ->on_rq=1, ->se.sched_delayed=0 and ->se.on_rq=0.

Unfortunately, task B still thinks that task A is queued because
task_on_rq_queued(A) is true. But task A is not runnable and will never
run: it's no longer in the fair rbtree, and the only task that will
enqueue it again is task B, once it leaves wait_task_inactive() and
hits coredump_finish(). Task B keeps waiting for A to become inactive,
so neither task makes progress.
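
To make the sequence above easier to follow, here is a minimal userspace
sketch of the three flags and the two transitions described. It is a toy
model, not kernel code: the struct and every toy_* name are hypothetical,
and the dequeue step only mirrors the behaviour reported in this mail
(flags on the entity cleared, ->on_rq left set), not what the real
dequeue_task() does in general.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the fields discussed in the mail (hypothetical). */
struct toy_task {
	int on_rq;            /* models p->on_rq, 1 == queued */
	int se_on_rq;         /* models p->se.on_rq */
	int se_sched_delayed; /* models p->se.sched_delayed */
};

/* Task A blocks in schedule(), but the fair class delays the dequeue. */
static void toy_block_with_delayed_dequeue(struct toy_task *p)
{
	p->on_rq = 1;
	p->se_on_rq = 1;
	p->se_sched_delayed = 1;
}

/*
 * Task B, inside wait_task_inactive(), sees the delayed entity and
 * dequeues it. As reported above, p->on_rq is left untouched.
 */
static void toy_dequeue_delayed_as_reported(struct toy_task *p)
{
	if (p->se_sched_delayed) {
		p->se_sched_delayed = 0;
		p->se_on_rq = 0;
	}
}

/* Models the task_on_rq_queued() check that task B relies on. */
static bool toy_task_on_rq_queued(const struct toy_task *p)
{
	return p->on_rq == 1;
}

/* "Lost": still reported as queued, yet no longer in the fair rbtree. */
static bool toy_is_lost(const struct toy_task *p)
{
	return toy_task_on_rq_queued(p) && !p->se_on_rq;
}

/* Replays the sequence from the mail and reports whether A ends up lost. */
static bool toy_replay(void)
{
	struct toy_task a = { 0, 0, 0 };

	toy_block_with_delayed_dequeue(&a);
	assert(a.on_rq == 1 && a.se_sched_delayed == 1 && a.se_on_rq == 1);

	toy_dequeue_delayed_as_reported(&a);
	assert(a.on_rq == 1 && a.se_sched_delayed == 0 && a.se_on_rq == 0);

	return toy_is_lost(&a);
}
```

Calling toy_replay() walks through both states quoted above. In this
model the final state is exactly the inconsistent one: the queued check
still succeeds for A although its entity is off the tree.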

> > do_exit+0xdd is here in coredump_task_wait():
> >
> >                 for (;;) {
> >                         set_current_state(TASK_IDLE|TASK_FREEZABLE);
> >                         if (!self.task) /* see coredump_finish() */
> >                                 break;
> >                         schedule();
> >                 }
> >
> > i.e. the task calls schedule() and never comes back.
>
> Are you sure it never comes back and doesn't loop?

Yeah, positive:

$ sudo perf stat -e cycles -t 1546531 -- sleep 30

 Performance counter stats for thread id '1546531':

     <not counted>      cycles

      30.001671072 seconds time elapsed