[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <04f2853d-3bdb-8893-91c8-074893310d1d@oracle.com>
Date: Fri, 19 May 2023 18:24:34 -0500
From: Mike Christie <michael.christie@...cle.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Oleg Nesterov <oleg@...hat.com>, linux@...mhuis.info,
nicolas.dichtel@...nd.com, axboe@...nel.dk,
torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, mst@...hat.com,
sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
brauner@...nel.org
Subject: Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if
SIGNAL_GROUP_EXIT/group_exec_task is set
On 5/18/23 11:16 PM, Eric W. Biederman wrote:
> Mike Christie <michael.christie@...cle.com> writes:
>
>> On 5/18/23 1:28 PM, Eric W. Biederman wrote:
>>> Still the big issue seems to be the way get_signal is connected into
>>> these threads so that it keeps getting called. Calling get_signal after
>>> a fatal signal has been returned happens nowhere else and even if we fix
>>> it today it is likely to lead to bugs in the future because whoever is
>>> testing and updating the code is unlikely they have a vhost test case
>>> the care about.
>>>
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index 8f6330f0e9ca..4d54718cad36 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -181,7 +181,9 @@ void recalc_sigpending_and_wake(struct task_struct *t)
>>>
>>> void recalc_sigpending(void)
>>> {
>>> - if (!recalc_sigpending_tsk(current) && !freezing(current))
>>> + if ((!recalc_sigpending_tsk(current) && !freezing(current)) ||
>>> + ((current->signal->flags & SIGNAL_GROUP_EXIT) &&
>>> + !__fatal_signal_pending(current)))
>>> clear_thread_flag(TIF_SIGPENDING);
>>>
>>> }
>>> @@ -1043,6 +1045,13 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>>> * This signal will be fatal to the whole group.
>>> */
>>> if (!sig_kernel_coredump(sig)) {
>>> + /*
>>> + * The signal is being short circuit delivered
>>> + * don't it pending.
>>> + */
>>> + if (type != PIDTYPE_PID) {
>>> + sigdelset(&t->signal->shared_pending, sig);
>>> +
>>> /*
>>> * Start a group exit and wake everybody up.
>>> * This way we don't have other threads
>>>
>>
>> If I change up your patch so the last part is moved down a bit to when we set t
>> like this:
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 0ac48c96ab04..c976a80650db 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -181,9 +181,10 @@ void recalc_sigpending_and_wake(struct task_struct *t)
>>
>> void recalc_sigpending(void)
>> {
>> - if (!recalc_sigpending_tsk(current) && !freezing(current))
>> + if ((!recalc_sigpending_tsk(current) && !freezing(current)) ||
>> + ((current->signal->flags & SIGNAL_GROUP_EXIT) &&
>> + !__fatal_signal_pending(current)))
>> clear_thread_flag(TIF_SIGPENDING);
>> -
> Can we get rid of this suggestion to recalc_sigpending. The more I look
> at it the more I am convinced it is not safe. In particular I believe
> it is incompatible with dump_interrupted() in fs/coredump.c
With your clear_thread_flag call in vhost_worker suggestion I don't need
the above chunk.
>
> The code in fs/coredump.c is the closest code we have to what you are
> trying to do with vhost_worker after the session is killed. It also
> struggles with TIF_SIGPENDING getting set.
>> }
>> EXPORT_SYMBOL(recalc_sigpending);
>>
>> @@ -1053,6 +1054,17 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>> signal->group_exit_code = sig;
>> signal->group_stop_count = 0;
>> t = p;
>> + /*
>> + * The signal is being short circuit delivered
>> + * don't it pending.
>> + */
>> + if (type != PIDTYPE_PID) {
>> + struct sigpending *pending;
>> +
>> + pending = &t->signal->shared_pending;
>> + sigdelset(&pending->signal, sig);
>> + }
>> +
>> do {
>> task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
>> sigaddset(&t->pending.signal, SIGKILL);
>>
>>
>> Then get_signal() works like how Oleg mentioned it should earlier.
>
> I am puzzled it makes a difference as t->signal and p->signal should
> point to the same thing, and in fact the code would more clearly read
> sigdelset(&signal->shared_pending, sig);
Yeah either should work. The original patch had used t before it was
set so my patch just moved it down to after we set it. I just used signal
like you wrote and it works fine.
>
> But all of that seems minor.
>
>> For vhost I just need the code below which is just Linus's patch plus a call
>> to get_signal() in vhost_worker() and the PF_IO_WORKER->PF_USER_WORKER change.
>>
>> Note that when we get SIGKILL, the vhost file_operations->release function is called via
>>
>> do_exit -> exit_files -> put_files_struct -> close_files
>>
>> and so the vhost release function starts to flush IO and stop the worker/vhost
>> task. In vhost_worker() then we just handle those last completions for already
>> running IO. When the vhost release function detects they are done it does
>> vhost_task _stop() and vhost_worker() returns and then vhost_task_fn() does do_exit().
>> So we don't return immediately when get_signal() returns non-zero.
>>
>> So it works, but it sounds like you don't like vhost relying on the behavior,
>> and it's non standard to use get_signal() like we are. So I'm not sure how we
>> want to proceed.
>
> Let me clarify my concern.
>
> Your code modifies get_signal as:
> /*
> - * PF_IO_WORKER threads will catch and exit on fatal signals
> + * PF_USER_WORKER threads will catch and exit on fatal signals
> * themselves. They have cleanup that must be performed, so
> * we cannot call do_exit() on their behalf.
> */
> - if (current->flags & PF_IO_WORKER)
> + if (current->flags & PF_USER_WORKER)
> goto out;
> /*
> * Death signals, no core dump.
> */
> do_group_exit(ksig->info.si_signo);
> /* NOTREACHED */
>
> Which means by modifying get_signal you are logically deleting the
> do_group_exit from get_signal. As far as that goes that is a perfectly
> reasonable change. The problem is you wind up calling get_signal again
> after that. That does not make sense.
>
> I would suggest doing something like:
I see. I've run some tests today and what you suggested for vhost_worker
and your signal change and it works for SIGKILL/STOP/CONT and freeze.
>
> What is the diff below? It does not appear to a revert diff.
It was just the most simple patch that was needed with your signal changes
(and the PF_IO_WORKER -> PF_USER_WORKER signal change) to fix the 2
regressions reported. I wanted to give the vhost devs an idea of what was
needed with your signal changes.
Let me do some more testing over the weekend and I'll post a RFC with your
signal change and the minimal changes needed to vhost to handle the 2
regressions that were reported. The vhost developers can get a better idea
of what needs to be done and they can better decide what they want to do to
proceed.
Powered by blists - more mailing lists