[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cfca7764-d210-6df9-e182-2c093101c6cf@oracle.com>
Date: Mon, 22 May 2023 12:00:39 -0500
From: Mike Christie <michael.christie@...cle.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: linux@...mhuis.info, nicolas.dichtel@...nd.com, axboe@...nel.dk,
ebiederm@...ssion.com, torvalds@...ux-foundation.org,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, mst@...hat.com,
sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
regression
On 5/22/23 7:30 AM, Oleg Nesterov wrote:
> Confused, please help...
>
> On 05/21, Mike Christie wrote:
>>
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -338,6 +338,7 @@ static int vhost_worker(void *data)
>> struct vhost_worker *worker = data;
>> struct vhost_work *work, *work_next;
>> struct llist_node *node;
>> + bool dead = false;
>>
>> for (;;) {
>> /* mb paired w/ kthread_stop */
>> @@ -349,8 +350,22 @@ static int vhost_worker(void *data)
>> }
>>
>> node = llist_del_all(&worker->work_list);
>> - if (!node)
>> + if (!node) {
>> schedule();
>> + /*
>> + * When we get a SIGKILL our release function will
>> + * be called. That will stop new IOs from being queued
>> + * and check for outstanding cmd responses. It will then
>> + * call vhost_task_stop to tell us to return and exit.
>> + */
>
> But who will call the release function / vhost_task_stop() and when this
> will happen after this thread gets SIGKILL ?
When we get a SIGKILL, the thread that owns the device/vhost_task will
also exit since it's the same thread group and it does:
do_exit -> exit_files -> put_files_struct -> close_files -> fput
which eventually (the actual put is done via the delayed work path
in there) runs the file_operation->release.
So the release function is being called in parallel more or less as the
code above.
>
>> + if (!dead && signal_pending(current)) {
>> + struct ksignal ksig;
>> +
>> + dead = get_signal(&ksig);
>> + if (dead)
>> + clear_thread_flag(TIF_SIGPENDING);
>
> If you do clear_thread_flag(TIF_SIGPENDING), then why do we need 1/3 ?
You're right. I don't need it. I thought __fatal_signal_pending checked
the shared pending as well but it only checks pending.
> Also. Suppose that vhost_worker() dequeues SIGKILL and clears TIF_SIGPENDING.
>
> SIGSTOP, PTRACE_INTERRUPT, freezer can come and set TIF_SIGPENDING again.
> In this case the main for (;;) loop will spin without sleeping until
> vhost_task_should_stop() becomes true?
I see. So I either have to be able to call get_signal after SIGKILL or
at this time work like a kthread and ignore signals like a
if (dead && signal_pending())
flush_signals()
?
Powered by blists - more mailing lists