linux-kernel - Re: [syzbot] [kernel?] WARNING in signal_wake_up

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wgM=MmqrQC-qgXoSehW=itHaqOUiBfN8jRBGAHn1=D0tg@mail.gmail.com>
Date: Tue, 9 Jan 2024 11:05:45 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: syzbot <syzbot+c6d438f2d77f96cae7c2@...kaller.appspotmail.com>, 
	Oleg Nesterov <oleg@...hat.com>, "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: linux-kernel@...r.kernel.org, luto@...nel.org, michael.christie@...cle.com, 
	mst@...hat.com, peterz@...radead.org, syzkaller-bugs@...glegroups.com, 
	tglx@...utronix.de
Subject: Re: [syzbot] [kernel?] WARNING in signal_wake_up_state

Oleg/Eric, can you make any sense of this?

On Tue, 9 Jan 2024 at 10:18, syzbot
<syzbot+c6d438f2d77f96cae7c2@...kaller.appspotmail.com> wrote:
>
> The issue was bisected to:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080

Hmm. This smells more like a "that triggers the problem" than a cause.

Because the warning itself is

> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771

That's

        lockdep_assert_held(&t->sighand->siglock);

at the top of the function, with the call trace being

>  signal_wake_up include/linux/sched/signal.h:448 [inline]

just a wrapper setting 'state'.

>  zap_process fs/coredump.c:373 [inline]

That's zap_process() that does a

        for_each_thread(start, t) {

and then does a

                        signal_wake_up(t, 1);

on each thread.

>  zap_threads fs/coredump.c:392 [inline]

And this is zap_threads(), which does

        spin_lock_irq(&tsk->sighand->siglock);
        ...
                nr = zap_process(tsk, exit_code);

Strange. The sighand->siglock is definitely taken.

The for_each_thread() must be hitting a thread with a different
sighand, but it's basically a

        list_for_each_entry_rcu(..)

walking over the tsk->signal->thread_head list.

But if CLONE_THREAD is set (so that we share that 'tsk->signal', then
we always require that CLONE_SIGHAND is also set:

        if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
                return ERR_PTR(-EINVAL);

so we most definitely should have the same ->sighand if we have the
same ->signal. And that's true very much for that vhost_task_create()
case too.

So as far as I can see, that bisected commit does add a new case of
threaded signal handling, but in no way explains the problem.

Is there some odd exit race? The thread is removed with

        list_del_rcu(&p->thread_node);

in __exit_signal -> __unhash_process(), and despite the RCU
annotations, all these parts seem to hold the right locks too (ie
sighand->siglock is held by __exit_signal too), so I don't even see
any delayed de-allocation issue or anything like that.

Thus bringing in Eric/Oleg to see if they see something I miss.

Original email at

    https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/

for your pleasure.

            Linus