[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wj4DS=2F5mW+K2P7cVqrsuGd3rKE_2k2BqnnPeeYhUCvg@mail.gmail.com>
Date: Thu, 25 May 2023 09:20:19 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Oleg Nesterov <oleg@...hat.com>,
Mike Christie <michael.christie@...cle.com>,
linux@...mhuis.info, nicolas.dichtel@...nd.com, axboe@...nel.dk,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, mst@...hat.com,
sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
On Thu, May 25, 2023 at 8:30 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> Basically with no patches that is where Linus's kernel is.
>
> User complained about the new thread showing up in ps.
Well, not only that, but it actively broke existing workflows for
managing things. Showing up in 'ps' wasn't just some purely cosmetic
issue, but had semantic meaning.
And honestly, I think the core issue is that we should just make this
work. Kernel threads randomly switching to user memory threads was
wrong, so CLONE_VM is absolutely the right thing to do.
But while "CLONE_VM without real threading" is a very traditional
thing in Linux - it was the original model for clone(), after all - I
don't believe it is the *correct* model. There was a very real reason
clone() has grown CLONE_THREAD and friends.
So honestly, I really think we want to complete the vhost move to
CLONE_THREAD (and thus CLONE_SIGNAL).
Not because the old kthread model didn't _work_, but because it's
really really wrong to try to randomly take on user-space attributes
at run-time.
And once you do the "user threads in kernel space" model, at that
point you really do want to act like a proper thread. Both because of
that 'ps' issue (which is really just "show the world what your
relationship is), but simply because that is the modern threading
model that we use for everything else, and special cases are bad.
So I'd really like to finish this. Even if we end up with a hack or
two in signal handling that we can hopefully fix up later by having
vhost fix up some of its current assumptions.
It has worked wonderfully well for io_uring - but we *did* have quite
a bit of conversion patches over some time as people found issues.
Which is why I don't expect the vhost conevrsion to be entirely
pain-free either, and I don't think we necessarily have to get to a
"perfect and clean" state immediately, just a "working and
conceptually in the right direction" state.
Linus
Powered by blists - more mailing lists