linux-kernel - Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c6e9a5db-798f-fa40-5ae2-a41f2d8ebab5@oracle.com>
Date:   Thu, 25 May 2023 11:15:59 -0500
From:   Mike Christie <michael.christie@...cle.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     linux@...mhuis.info, nicolas.dichtel@...nd.com, axboe@...nel.dk,
        ebiederm@...ssion.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, mst@...hat.com,
        sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
        brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
 regression

On 5/23/23 7:15 AM, Oleg Nesterov wrote:
> 
> Now the main question. Whatever we do, SIGKILL/SIGSTOP/etc can come right
> before we call work->fn(). Is it "safe" to run this callback with
> signal_pending() or fatal_signal_pending() ?

The questions before this one I'll leave for the core vhost devs since
they know best.

For this question and the one below, when we get SIGKILL we think it's ok
to fail the operation as in it's ok to not execute it like normal (send
it down to the net/target/scsi/block layers for execution). However, we
need to do some processing of the work to release mem, refcounts, etc.

For example we use the vhost_task to handle completions from the layers
we interact with. So when we get a SIGKILL, we could have N commands in
the target/block/scsi/nvme layers below the vhost layer. When those
complete, the current code assumes we have the vhost_task to handle the
responses. So for pending works on that work_list, we can pretty easily
kill them. However, we don't have a way to kill those outstanding
requests to some other layer, so we have to wait and handle them
somewhere.

If you are saying that once we get SIGKILL then we just can't use the
task anymore and we have to drop down to workqueue/kthread or change up
the completion paths so they can execute in the context they are completed
from by lower levels, then I can code this. On the vhost side, it's just
really ugly vs the code we have now that used to use kthreads (or in
6.4-rc looks like a process) so we couldn't get signals and just had the
one code path that removes us.

>From the vhost point of view signals are not useful to us and it's just
adding complexity for something that I don't think is useful to users.
However after discussing this with guys for a week and going over the
signal code, I can see your point of views where you guys are thinking its
ugly for the signal code to try and support what we want :)

> 
> 
> Finally. I never looked into drivers/vhost/ before so I don't understand
> this code at all, but let me ask anyway... Can we change vhost_dev_flush()
> to run the pending callbacks rather than wait for vhost_worker() ?
> I guess we can't, ->mm won't be correct, but can you confirm?
> 
> Oleg.
>