linux-kernel - Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ab7d07ba-5dc3-95c0-aa7c-c2575d03f429@oracle.com>
Date:   Thu, 18 May 2023 10:21:39 -0500
From:   Mike Christie <michael.christie@...cle.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     oleg@...hat.com, linux@...mhuis.info, nicolas.dichtel@...nd.com,
        axboe@...nel.dk, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, mst@...hat.com,
        sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
        brauner@...nel.org
Subject: Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if
 SIGNAL_GROUP_EXIT/group_exec_task is set

On 5/17/23 10:49 PM, Eric W. Biederman wrote:
> 
> Long story short.
> 
> In the patch below the first hunk is a noop.
> 
> The code you are bypassing was added to ensure that process termination
> (aka SIGKILL) is processed before any other signals.  Other than signal
> processing order there are not any substantive differences in the two
> code paths.  With all signals except SIGSTOP == 19 and SIGKILL == 9
> blocked SIGKILL should always be processed before SIGSTOP.
> 
> Can you try patch with just the last hunk that does
> s/PF_IO_WORKER/PF_USER_WORKER/ and see if that is enough?
> 

If I just have the last hunk and then we get SIGKILL what happens is
in code like:

vhost_worker()

	schedule()
	if (has IO)
		handle_IO()

The schedule() calls will hit the signal_pending_state check for
signal_pending or __fatal_signal_pending and so instead of waiting
for whatever wake_up call we normally waited for we tend to just
return immediately. If you just run Qemu (the parent of the vhost_task)
and send SIGKILL then sometimes the vhost_task just spins and it
would look like the task has taken over the CPU (this is what I hit
when I tested Linus's patch).

With the first hunk of the patch, we will end up dequeuing the SIGKILL
and clearing TIF_SIGPENDING, so the vhost_task can still do some work
before it exits.

In the other patches we do:

if (get_signal(ksig))
	start_exit_cleanup_by_stopping_newIO()
	flush running IO()
	exit()

But to do the flush running IO() part of this I need to wait for it so
that's why I wanted to be able to dequeue the SIGKILL and clear the
TIF_SIGPENDING bit.

Or I don't need this specifically. In patch 0/8 I said I knew you guys
would not like it :) If I just have a:

if (fatal_signal())
	clear_fatal_signal()

then it would work for me.