Message-ID: <ffc2e727-8fa7-4ead-aa3f-c0a9f01a1c7c@oracle.com>
Date: Thu, 11 Jan 2024 11:20:01 -0600
From: Mike Christie <michael.christie@...cle.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc: syzbot <syzbot+c6d438f2d77f96cae7c2@...kaller.appspotmail.com>,
        Oleg Nesterov <oleg@...hat.com>, linux-kernel@...r.kernel.org,
        luto@...nel.org, mst@...hat.com, peterz@...radead.org,
        syzkaller-bugs@...glegroups.com, tglx@...utronix.de
Subject: Re: [syzbot] [kernel?] WARNING in signal_wake_up_state

On 1/10/24 10:11 AM, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@...ux-foundation.org> writes:
> 
>> Oleg/Eric, can you make any sense of this?
>>
>> On Tue, 9 Jan 2024 at 10:18, syzbot
>> <syzbot+c6d438f2d77f96cae7c2@...kaller.appspotmail.com> wrote:
>>>
>>> The issue was bisected to:
>>>
>>> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>>
>> Hmm. This smells more like a "this triggers the problem" commit than a cause.
>>
>> Because the warning itself is
>>
>>> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>>
>> That's
>>
>>         lockdep_assert_held(&t->sighand->siglock);
>>
>> at the top of the function, with the call trace being
>>
>>>  signal_wake_up include/linux/sched/signal.h:448 [inline]
>>
>> just a wrapper setting 'state'.
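
For reference, the wrapper is roughly the following in kernels of this
vintage (paraphrased; the jobctl handling varies by version):

        static inline void signal_wake_up(struct task_struct *t, bool fatal)
        {
                unsigned int state = 0;

                if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
                        t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
                        state = TASK_WAKEKILL | __TASK_TRACED;
                }
                signal_wake_up_state(t, state);
        }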
>>
>>>  zap_process fs/coredump.c:373 [inline]
>>
>> That's zap_process() that does a
>>
>>         for_each_thread(start, t) {
>>
>> and then does a
>>
>>                         signal_wake_up(t, 1);
>>
>> on each thread.
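
Condensed, zap_process() in fs/coredump.c is roughly (paraphrased from
the v6.x sources; error paths elided):

        static int zap_process(struct task_struct *start, int exit_code)
        {
                struct task_struct *t;
                int nr = 0;

                start->signal->flags = SIGNAL_GROUP_EXIT;
                start->signal->group_exit_code = exit_code;
                start->signal->group_stop_count = 0;

                /* Walks every thread on start->signal->thread_head */
                for_each_thread(start, t) {
                        task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
                        if (t != current && !(t->flags & PF_POSTCOREDUMP)) {
                                sigaddset(&t->pending.signal, SIGKILL);
                                signal_wake_up(t, 1);
                                nr++;
                        }
                }

                return nr;
        }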
>>
>>>  zap_threads fs/coredump.c:392 [inline]
>>
>> And this is zap_threads(), which does
>>
>>         spin_lock_irq(&tsk->sighand->siglock);
>>         ...
>>                 nr = zap_process(tsk, exit_code);
>>
>> Strange. The sighand->siglock is definitely taken.
>>
>> The for_each_thread() must be hitting a thread with a different
>> sighand, but it's basically a
>>
>>         list_for_each_entry_rcu(..)
>>
>> walking over the tsk->signal->thread_head list.
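
(The macro is just

        #define __for_each_thread(signal, t) \
                list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)

        #define for_each_thread(p, t) \
                __for_each_thread((p)->signal, t)

modulo lockdep annotations, so every thread it visits hangs off the one
shared signal_struct.)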
>>
>> But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
>> we always require that CLONE_SIGHAND is also set:
>>
>>         if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>>                 return ERR_PTR(-EINVAL);
>>
>> so we most definitely should have the same ->sighand if we have the
>> same ->signal. And that's true very much for that vhost_task_create()
>> case too.
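
vhost_task_create() in kernel/vhost_task.c passes roughly these clone
args (condensed; the exact field list differs across versions):

        struct kernel_clone_args args = {
                .flags          = CLONE_FS | CLONE_UNTRACED | CLONE_VM |
                                  CLONE_THREAD | CLONE_SIGHAND,
                .fn             = vhost_task_fn,
                .user_worker    = 1,
                .no_files       = 1,
                /* ... */
        };

so the vhost task does share both ->signal and ->sighand with its owner.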
>>
>> So as far as I can see, that bisected commit does add a new case of
>> threaded signal handling, but in no way explains the problem.
>>
>> Is there some odd exit race? The thread is removed with
>>
>>         list_del_rcu(&p->thread_node);
>>
>> in __exit_signal -> __unhash_process(), and despite the RCU
>> annotations, all these parts seem to hold the right locks too (ie
>> sighand->siglock is held by __exit_signal too), so I don't even see
>> any delayed de-allocation issue or anything like that.
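
For reference, the exit side looks roughly like this (kernel/exit.c,
condensed):

        static void __exit_signal(struct task_struct *tsk)
        {
                struct sighand_struct *sighand;
                /* ... */
                sighand = rcu_dereference_check(tsk->sighand,
                                lockdep_tasklist_lock_is_held());
                spin_lock(&sighand->siglock);
                /* ... */
                __unhash_process(tsk, group_dead);  /* list_del_rcu() here */
                /* ... */
                spin_unlock(&sighand->siglock);
        }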
>>
>> Thus bringing in Eric/Oleg to see if they see something I miss.
> 
> I expect this would take going through the rest of the reproducer
> to see what is going on.
> 
> Hmm.  The reproducer is in something other than C:
> 
>> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
>> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
>> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
>> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
>> ioctl$int_in(r0, 0x40000000af01, 0x0)
>> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
>> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
>> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)
> 
> If I read that correctly it is intending to put an elf32 executable into
> a memfd and then execute it.
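
In plain C that would be something like this (hypothetical rendering; the
device path, the VHOST_SET_OWNER guess for ioctl 0xaf01, and the ELF
bytes are my reading of it, not taken from the report):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        int main(void)
        {
                /* openat$vnet: open /dev/vhost-net and issue what looks
                 * like VHOST_SET_OWNER, which creates the vhost worker
                 * task sharing this process's signal_struct. */
                int vn = open("/dev/vhost-net", O_RDWR);
                ioctl(vn, 0xaf01, 0);

                /* Stuff a bogus 32-bit ELF header into a memfd. */
                int fd = memfd_create("x", 0);
                char elf[0xd8] = "\x7f" "ELF";  /* junk header elided */
                write(fd, elf, sizeof(elf));

                /* Exec the memfd via AT_EMPTY_PATH (0x1000). */
                syscall(SYS_execveat, fd, "", NULL, NULL, AT_EMPTY_PATH);
                return 0;
        }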
> 
> Exec will possibly unshare the sighand struct if there is still a
> reference to it from somewhere else, to ensure the new process has a
> clean one.
> 
> But all of the other threads should get shut down by de_thread before any
> of that happens.  And de_thread should take care of the weird non-leader
> execve case as well.  So after that point the process really should
> be single threaded.  Which is why de_thread is the point of no return.
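
The unshare path Eric mentions is roughly (fs/exec.c, condensed):

        static int unshare_sighand(struct task_struct *me)
        {
                struct sighand_struct *oldsighand = me->sighand;

                if (refcount_read(&oldsighand->count) != 1) {
                        struct sighand_struct *newsighand;

                        /* Shared with a CLONE_SIGHAND but not
                         * CLONE_THREAD task: switch to a private copy. */
                        newsighand = kmem_cache_alloc(sighand_cachep,
                                                      GFP_KERNEL);
                        if (!newsighand)
                                return -ENOMEM;

                        refcount_set(&newsighand->count, 1);

                        write_lock_irq(&tasklist_lock);
                        spin_lock(&oldsighand->siglock);
                        memcpy(newsighand->action, oldsighand->action,
                               sizeof(newsighand->action));
                        rcu_assign_pointer(me->sighand, newsighand);
                        spin_unlock(&oldsighand->siglock);
                        write_unlock_irq(&tasklist_lock);

                        __cleanup_sighand(oldsighand);
                }
                return 0;
        }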
> 
> That whole "an interrupt comes in, and a fatal signal is processed"
> scenario is confusing.
> 
> Hmm.  That weird vnet ioctl at the beginning must be what is starting
> the vhost logic.  So I guess it makes sense if the signal is received
> by the magic vhost thread.
> 
> Perhaps there is some weird vhost logic where the thread lingers.
> 
> Ugh.  I seem to remember a whole conversation about the vhost logic
> (immediately after it was merged) and how it had a bug where it exited
> differently from everything else.  I remember people figuring it was

The vhost code just wanted to wait for running IO from the vhost thread's
own context before exiting, instead of working like io_uring, which waits
from a secondary thread.
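
The worker loop at the time of this report looks roughly like this
(condensed from kernel/vhost_task.c; details and memory barriers elided):

        static int vhost_task_fn(void *data)
        {
                struct vhost_task *vtsk = data;
                bool dead = false;

                for (;;) {
                        bool did_work;

                        if (!dead && signal_pending(current)) {
                                struct ksignal ksig;

                                /* On a fatal signal remember we are dying,
                                 * but keep running so pending IO can drain
                                 * from this thread's context. */
                                dead = get_signal(&ksig);
                                if (dead)
                                        clear_thread_flag(TIF_SIGPENDING);
                        }

                        set_current_state(TASK_INTERRUPTIBLE);
                        if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
                                __set_current_state(TASK_RUNNING);
                                break;
                        }

                        did_work = vtsk->fn(vtsk->data);
                        if (did_work)
                                __set_current_state(TASK_RUNNING);
                        else
                                schedule();
                }

                complete(&vtsk->exited);
                do_exit(0);
        }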

> ok immediately after the code was merged, because everything had to
> be performed as root and no one actually uses the vhost logic like
> that.  It has been long enough that I thought it would have been sorted
> out by now.
> 
> Looking back to refresh my memory at the original conversation:
> https://lore.kernel.org/all/20230601183232.8384-1-michael.christie@oracle.com/
> 
> The bisect is 100% correct, and this was a known issue with that code at
> the time it was merged.
> 

I have patches for this that I've been testing. Will post to the virt list
for the next feature window.


