linux-kernel - Re: [GIT PULL] KVM changes for Linux 6.14

Message-ID: <87plk8kj16.fsf@email.froward.int.ebiederm.org>
Date: Sun, 26 Jan 2025 21:55:17 -0600
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Paolo Bonzini <pbonzini@...hat.com>,  "Michael S. Tsirkin"
 <mst@...hat.com>,  Christian Brauner <brauner@...nel.org>,  Oleg Nesterov
 <oleg@...hat.com>,  linux-kernel@...r.kernel.org,  kvm@...r.kernel.org
Subject: Re: [GIT PULL] KVM changes for Linux 6.14

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Sat, 25 Jan 2025 at 10:12, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> Arguably the user space oddity is just strange and Paolo even calls it
>> a bug, but at the same time, I do think user space can and should
>> reasonably expect that it only has children that it created
>> explicitly [..]
>
> Note that I think that doing things like "io_uring" and getting IO
> helper threads that way would very much count as "explicit children",
> so I don't argue that all kernel helper threads would fall under this
> category.
>
> And I suspect that the normal vhost workers fall under that same kind
> of "it's like io_uring". If you use VHOST_NEW_WORKER to create a
> worker thread, then that's a pretty explicit "I have a child process".
>
> So it's really just that hugepage recovery thread that seems to be a
> bit "too" much of an implicit kernel helper thread that user space
> kind of gets accidentally and implicitly just because of a kernel
> implementation detail.
>
> I'm sure the kvm hack to just start it later (at KVM_RUN time?) is
> sufficient in practice, but it still feels conceptually iffy to me.

I don't think implicit vs explicit is the right question.  Rather, we
should be asking: can userspace care?

If I read the context from the commit correctly, what userspace
is asking is:  Am I single-threaded, so that I know nothing funny
will happen in the forked process?
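
To make that question concrete, here is a minimal sketch of how a process
might test for it on Linux.  It assumes the check is done by counting the
entries in /proc/self/task, which lists every thread in the thread group,
including kernel-created helper threads that share it.  The commit context
does not say how userspace actually performs the check, and the function
names below are hypothetical.

```python
import os


def thread_count():
    """Return the number of threads in this process's thread group.

    Linux-only: every thread, including helper threads the kernel
    created on the process's behalf, has an entry in /proc/self/task.
    """
    return len(os.listdir("/proc/self/task"))


def is_single_threaded():
    """True if the calling thread is the only thread in the process."""
    return thread_count() == 1


if __name__ == "__main__":
    print("threads:", thread_count())
    print("safe to fork without surprises:", is_single_threaded())
```

A check like this cannot tell a thread the program started itself from one
that appeared as a side effect of some other request, which is why an
unexpected helper thread makes it fail.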

The most common funny thing I am aware of for forked multi-threaded
processes is that if they fork while another thread holds a lock, the
forked process might hang forever on the lock, because the lock will
never be released.
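
The hazard can be reproduced in a few lines.  The sketch below is only an
illustration under stated assumptions: it uses a Python threading.Lock as
a stand-in for whatever lock a real program would hold, the function name
is hypothetical, and the child uses a timeout so that the demonstration
terminates instead of hanging.

```python
import os
import threading


def child_can_take_lock_after_fork(timeout=0.5):
    """Fork while another thread holds a lock.

    Returns True if the child managed to acquire the lock within
    `timeout` seconds, False otherwise.
    """
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        # Take the lock and keep it until the parent says otherwise.
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # the lock is now definitely held by the other thread

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied.  The holder thread
        # does not exist here, but the lock is still marked as held.
        got = lock.acquire(timeout=timeout)
        os._exit(0 if got else 1)

    # Parent: collect the child's verdict, then let the holder finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("child acquired lock:", child_can_take_lock_after_fork())
```

Without the timeout the child would block forever, because the only thread
that could release the lock was never copied into it.  Recent Python
versions also emit a DeprecationWarning when fork() is called from a
multi-threaded process, for this same reason.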

The most interesting part of the hugepage reaper appears to be
kvm_mmu_commit_zap_page, where a page is freed after being flushed from
the TLB.

I would argue that if kvm_mmu_commit_zap_page and friends change the
page tables in a way that userspace can see after a fork, and that in
turn could affect how the forked process will execute, then userspace is
doing something sensible in testing for it.

On the flip side, if this isn't something userspace can observe in its
own process, I would argue that the proper solution is to use a regular
kthread.

In summary, the conceptually clean approach is to only have threads that,
when running, can affect the process they are a part of in a
userspace-visible way.  Assuming the hugepage reaper can affect the
process it is a part of, the only problem I see is the hugepage reaper
existing when it had nothing it could possibly do.

I don't think hiding threads is a useful solution, because the threads
will affect the process they are a part of.  If the threads aren't
affecting the process they are a part of, we have other solutions besides
threads.

Eric