[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab7d07ba-5dc3-95c0-aa7c-c2575d03f429@oracle.com>
Date: Thu, 18 May 2023 10:21:39 -0500
From: Mike Christie <michael.christie@...cle.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: oleg@...hat.com, linux@...mhuis.info, nicolas.dichtel@...nd.com,
axboe@...nel.dk, torvalds@...ux-foundation.org,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, mst@...hat.com,
sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
brauner@...nel.org
Subject: Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if
SIGNAL_GROUP_EXIT/group_exec_task is set
On 5/17/23 10:49 PM, Eric W. Biederman wrote:
>
> Long story short.
>
> In the patch below the first hunk is a noop.
>
> The code you are bypassing was added to ensure that process termination
> (aka SIGKILL) is processed before any other signals. Other than signal
> processing order there are not any substantive differences in the two
> code paths. With all signals except SIGSTOP == 19 and SIGKILL == 9
> blocked SIGKILL should always be processed before SIGSTOP.
>
> Can you try patch with just the last hunk that does
> s/PF_IO_WORKER/PF_USER_WORKER/ and see if that is enough?
>
If I just have the last hunk and then we get SIGKILL what happens is
in code like:
vhost_worker()
schedule()
if (has IO)
handle_IO()
The schedule() calls will hit the signal_pending_state check for
signal_pending or __fatal_signal_pending and so instead of waiting
for whatever wake_up call we normally waited for we tend to just
return immediately. If you just run Qemu (the parent of the vhost_task)
and send SIGKILL then sometimes the vhost_task just spins and it
would look like the task has taken over the CPU (this is what I hit
when I tested Linus's patch).
With the first hunk of the patch, we will end up dequeuing the SIGKILL
and clearing TIF_SIGPENDING, so the vhost_task can still do some work
before it exits.
In the other patches we do:
if (get_signal(ksig))
start_exit_cleanup_by_stopping_newIO()
flush running IO()
exit()
But to do the flush running IO() part of this I need to wait for it so
that's why I wanted to be able to dequeue the SIGKILL and clear the
TIF_SIGPENDING bit.
Or I don't need this specifically. In patch 0/8 I said I knew you guys
would not like it :) If I just have a:
if (fatal_signal())
clear_fatal_signal()
then it would work for me.
Powered by blists - more mailing lists