[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wg4Wm4x9GoUk6M8BhLsrhLj4+n8jA2Kg8XUQF=kxgNL9g@mail.gmail.com>
Date: Sat, 25 Jan 2025 10:12:16 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Paolo Bonzini <pbonzini@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>,
Christian Brauner <brauner@...nel.org>, "Eric W. Biederman" <ebiederm@...ssion.com>,
Oleg Nesterov <oleg@...hat.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [GIT PULL] KVM changes for Linux 6.14
Let's bring some thread setup people in on this..
The kvm people obviously already solved their particular issue, but I
get the feeling that the kvm solution is kind of a hack that works
around a user space oddity.
For newly added people: see commit 931656b9e2ff ("kvm: defer huge page
recovery vhost task to later") and the explanation below, and this
thread on the mailing lists:
https://lore.kernel.org/all/Z2RYyagu3phDFIac@kbusch-mbp.dhcp.thefacebook.com/
Arguably the user space oddity is just strange and Paolo even calls it
a bug, but at the same time, I do think user space can and should
reasonably expect that it only has children that it created
explicitly, and the automatic reclamation thread most definitely is a
bit too implicit.
On Fri, 24 Jan 2025 at 08:38, Paolo Bonzini <pbonzini@...hat.com> wrote:
>
> * The recently introduced conversion of the NX-page reclamation kthread to
> vhost_task moved the task under the main process. The task is created as
> soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that
> didn't expect to see any child task of the VM process until it started
> creating its own userspace threads. In particular crosvm refuses to fork()
> if procfs shows any child task, so unbreak it by creating the task lazily.
> This is arguably a userspace bug, as there can be other kinds of legitimate
> worker tasks and they wouldn't impede fork(); but it's not like userspace
> has a way to distinguish kernel worker tasks right now. Should they show
> as "Kthread: 1" in proc/.../status?
So first off, let me just say that I still absolutely think that the
current "vhost workers are children of the starter" is the right
model, even if it has caused some issues because of various legacy
expectations.
But in this case I do wonder if we should hide the implicit kernel
threads from user space somehow.
Keith pinpointed the user space logic to fork_remap():
https://github.com/google/minijail/blob/main/rust/minijail/src/lib.rs#L987
and honestly, I do think it makes sense for user space to ask "am I
single-threaded" (which is presumably the thing that breaks), and the
code for that is pretty simple:
fn is_single_threaded() -> io::Result<bool> {
match count_dir_entries("/proc/self/task") {
Ok(1) => Ok(true),
Ok(_) => Ok(false),
Err(e) => Err(e),
}
}
and I really don't think user space is "wrong".
So the fact that a kernel helper thread that runs async in the
background and does random background infrastructure things that do
not really affect user space should probably simply not break this
kind of simple (and admittedly simplistic) user space logic.
Should we just add some flag to say "don't show this thread in this
context"? We obviously still want to see it for management purposes,
so it's not like the thing should be entirely invisible, but maybe
Christian / Eric / Oleg have some opinions on how to do this cleanly
in "copy_process()" or similar?
Linus
Powered by blists - more mailing lists