linux-kernel - Re: [CFT][PATCH v3] fork, vhost: Use CLONE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87o7llycau.fsf@email.froward.int.ebiederm.org>
Date:   Sun, 11 Jun 2023 15:27:37 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Mike Christie <michael.christie@...cle.com>, linux@...mhuis.info,
        nicolas.dichtel@...nd.com, axboe@...nel.dk,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, mst@...hat.com,
        sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
        brauner@...nel.org
Subject: Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
 regression

Oleg Nesterov <oleg@...hat.com> writes:

> On 06/06, Mike Christie wrote:
>>
>> On 6/6/23 7:16 AM, Oleg Nesterov wrote:
>> > On 06/05, Mike Christie wrote:
>> >
>> >> So it works like if we were using a kthread still:
>> >>
>> >> 1. Userapce thread0 opens /dev/vhost-$something.
>> >> 2. thread0 does VHOST_SET_OWNER ioctl. This calls vhost_task_create() to
>> >> create the task_struct which runs the vhost_worker() function which handles
>> >> the work->fns.
>> >> 3. If userspace now does a SIGKILL or just exits without doing a close() on
>> >> /dev/vhost-$something, then when thread0 does exit_files() that will do the
>> >> fput that does vhost-$something's file_operations->release.
>> >
>> > So, at least in this simple case vhost_worker() can just exit after SIGKILL,
>> > and thread0 can flush the outstanding commands when it calls vhost_dev_flush()
>> > rather than wait for vhost_worker().
>> >
>> > Right?
>>
>> With the current code, the answer is no. We would hang like I mentioned here:
>>
>> https://lore.kernel.org/lkml/ae250076-7d55-c407-1066-86b37014c69c@oracle.com/
>
> If only I could fully understand this email ;)
>
> Could you spell to explain why this can't work (again, in this simple case) ?
>
> My current (and I know, very poor) understanding is that .release() should
> roughly do the following:
>
> 	1. Ensure that vhost_work_queue() can't add the new callbacks
>
> 	2. Call vhost_dev_flush() to ensure that worker->work_list is empty
>
> 	3. Call vhost_task_stop()


At least in the case of exec by the time the final fput happens
from close_on_exec the task has already changed it's mm.  So the
conditions are wrong to run the work queue items.

For close(2) and SIGKILL perhaps, but definitely not in the case of
exec.


Eric