Message-ID: <a171238e-d731-1c22-af72-0f7faf7f4bea@oracle.com>
Date:   Fri, 17 Dec 2021 16:08:14 -0600
From:   michael.christie@...cle.com
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     geert@...ux-m68k.org, vverma@...italocean.com, hdanton@...a.com,
        hch@...radead.org, stefanha@...hat.com, jasowang@...hat.com,
        mst@...hat.com, sgarzare@...hat.com,
        virtualization@...ts.linux-foundation.org,
        christian.brauner@...ntu.com, axboe@...nel.dk,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V6 01/10] Use copy_process in vhost layer

On 12/17/21 1:26 PM, Eric W. Biederman wrote:
> Mike Christie <michael.christie@...cle.com> writes:
> 
>> The following patches, made against Linus's tree, allow the vhost layer to
>> do a copy_process on the thread that does the VHOST_SET_OWNER ioctl, like
>> how io_uring does a copy_process against its userspace app. This allows the
>> vhost layer's worker threads to inherit cgroups, namespaces, address
>> space, etc., and this worker thread will also be accounted for against that
>> owner/parent process's RLIMIT_NPROC limit.
>>
>> If you are not familiar with qemu and vhost here is more detailed
>> problem description:
>>
>> Qemu will create vhost devices in the kernel which perform network, SCSI,
>> etc IO and management operations from worker threads created by the
>> kthread API. Because the kthread API does a copy_process on the kthreadd
>> thread, the vhost layer has to use kthread_use_mm to access the Qemu
>> thread's memory and cgroup_attach_task_all to add itself to the Qemu
>> thread's cgroups.
>>
>> The problem with this approach is that we then have to add new functions/
>> args/functionality for everything we want to inherit. I started doing
>> that here:
>>
>> https://lkml.org/lkml/2021/6/23/1233
>>
>> for the RLIMIT_NPROC check, but it seems it might be easier to just
>> inherit everything from the beginning, because I'd need to do something
>> like that patch several times.
> 
> I read through the code and I don't see why you want to make these
> almost threads of a process not actually threads of that process
> (like the io_uring threads are).
> 
> As a separate process there are many things that will continue to be
> disjoint.  If the thread changes cgroups, for example, your new process
> won't follow.
> 
> If you want them to be actual processes with a lifetime independent
> of the creating process, I expect you want to reparent them to the local
> init process, just so they don't confuse the process tree.  Plus init
> processes know how to handle unexpected children.
> 
> What are the semantics you are aiming for?
> 

Hi Eric,

Right now, for vhost we need the thread being created:

1. to be added to the caller's cgroup.
2. to share the mm struct with the caller.
3. to be accounted for under the caller's nproc rlimit value.

For 1 and 2, we have cgroup_attach_task_all and get_task_mm
already.

This patchset started with me just trying to handle #3 by modifying kthreads
like here:

https://lkml.org/lkml/2021/6/23/1234

So, instead of this patchset, we could keep using kthreads and the existing
helpers and add either:

A. a ucounts version of the above patches in the link,

or

B. a helper that does a check like copy_process's use of
is_ucounts_overlimit, which vhost would call.


Before we even get to the next section below, do you consider items 1 - 3
something we need a copy_process-based API for?

Do you think I should just do A or B above, or do you have another idea? If
so can we get agreement on that from everyone?

I thought my patches in that link were a little hacky in how they passed
around the user/creds info. I thought maybe it shouldn't be passed around like
that, so I switched to the copy_process based approach, which did everything for
me. And I thought io_uring needed something similar to what we need, so I made it generic.

I don't have a preference. You and Christian are the experts, so I'll leave it
to you guys.


> I can see sense in generalizing some of the pieces of create_io_thread
> but I think generalizing create_io_thread itself is premature.  The code
> lives in kernel/fork.c because it is a very special thing that we want
> to keep our eyes on.
> 
> Some of your generalization makes it much more difficult to tell what
> your code is going to use because you remove hard codes that are there
> to simplify the analysis of the situation.
> 
> My gut says adding a new create_vhost_worker and putting that in
> kernel/fork.c is a lot safer and will allow much better code analysis.
> 
> If there really are commonalities between creating a userspace process
> that runs completely in the kernel and creating an additional userspace
> thread, we refactor the code and simplify things.
> 
> I am especially nervous about generalizing the io_uring code, as its
> signal handling just barely works, and any generalization will cause it
> to break.  So you are in the process of generalizing code that simply
> can not handle the general case.  That scares me.
