[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250205-bauhof-fraktionslos-b1bedfe50db2@brauner>
Date: Wed, 5 Feb 2025 12:49:30 +0100
From: Christian Brauner <brauner@...nel.org>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Oleg Nesterov <oleg@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, "Michael S. Tsirkin" <mst@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>, linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [GIT PULL] KVM changes for Linux 6.14
On Tue, Feb 04, 2025 at 05:05:06PM +0100, Paolo Bonzini wrote:
> On Tue, Feb 4, 2025 at 3:19 PM Christian Brauner <brauner@...nel.org> wrote:
> >
> > On Mon, Jan 27, 2025 at 04:15:01PM +0100, Paolo Bonzini wrote:
> > > On Mon, Jan 27, 2025 at 3:10 PM Oleg Nesterov <oleg@...hat.com> wrote:
> > > > On 01/26, Linus Torvalds wrote:
> > > > > On Sun, 26 Jan 2025 at 10:54, Oleg Nesterov <oleg@...hat.com> wrote:
> > > > > >
> > > > > > I don't think we even need to detect the /proc/self/ or /proc/self-thread/
> > > > > > case, next_tid() can just check same_thread_group,
> > > > >
> > > > > That was my thinking yes.
> > > > >
> > > > > If we exclude them from /proc/*/task entirely, I'd worry that it would
> > > > > hide it from some management tool and be used for nefarious purposes
> > > >
> > > > Agreed,
> > > >
> > > > > (even if they then show up elsewhere that the tool wouldn't look at).
> > > >
> > > > Even if we move them from /proc/*/task to /proc ?
> > >
> > > Indeed---as long as they show up somewhere, it's not worse than it
> > > used to be. The reason why I'd prefer them to stay in /proc/*/task is
> > > that moving them away at least partly negates the benefits of the
> > > "workers are children of the starter" model. For example it
> > > complicates measuring their cost within the process that runs the VM.
> > > Maybe it's more of a romantic thing than a real practical issue,
> > > because in the real world resource accounting for VMs is done via
> > > cgroups. But unlike the lazy creation in KVM, which is overall pretty
> > > self-contained, I am afraid the ugliness in procfs would be much worse
> > > compared to the benefit, if there's a benefit at all.
> > >
> > > > Perhaps, I honestly do not know what will/can confuse userspace more.
> > >
> > > At the very least, marking workers as "Kthread: 1" makes sense and
You mean in /proc/<pid>/status? Yeah, we can do that. This expands the
definition of Kthread a bit. It would now mean anything that the kernel
spawned for userspace. But that is probably fine.
But it won't help with the problem of just checking /proc/<pid>/task/ to
figure out whether the caller is single-threaded or not. If the caller
has more than 1 entry in there they need to walk through all of them and
parse through /proc/<pid>/status to discount them if they're kernel
threads.
> > > should not cause too much confusion. I wouldn't go beyond that unless
> > > we get more reports of similar issues, and I'm not even sure how
> > > common it is for userspace libraries to check for single-threadedness.
> >
> > Sorry, just saw this thread now.
> >
> > What if we did what Linus suggests and hide (odd) user workers from
> > /proc/<pid>/task/* but also added /proc/<pid>/workers/*. The latter
> > would only list the workers that got spawned by the kernel for that
> > particular task? This would acknowledge their somewhat special status
> > and allow userspace to still detect them as "Hey, I didn't actually
> > spawn those, they got shoved into my workload by the kernel for me.".
>
> Wouldn't the workers then disappear completely from ps, top or other
> tools that look at /proc/$PID/task? That seems a bit too underhanded
> towards userspace...
So maybe, but then there's also the possibility to do:
- Have /proc/<pid>/status list all tasks.
- Have /proc/<pid>/worker only list user workers spawned by the kernel for userspace.
count(/proc/<pid>/status) - count(/proc/<pid>/workers) == 1 => (userspace) single threaded
My wider point is that I would prefer we add something that is
consistent and doesn't have to give the caller a different view than a
foreign task. I think that will just create confusion in the long run.
Btw, checking whether single-threaded this way:
fn is_single_threaded() -> io::Result<bool> {
match count_dir_entries("/proc/self/task") {
Ok(1) => Ok(true),
Ok(_) => Ok(false),
Err(e) => Err(e),
}
}
can be simplified. It should be sufficient to do:
stat("/proc/self/task", &st);
if ((st->st_nlink - 2) == 1)
// single threaded
Since procfs adds the number of tasks to st_nlink (Which is a bit weird
given that /proc/<pid>/fd puts the number of file descriptors in
st->st_size.).
Powered by blists - more mailing lists