linux-kernel - Re: KVM exit to userspace on WFI

Message-ID: <CANi1PHiXFKz6VqBys=Xw=rgR_gJKPO661iW-TMrF20j5yS4unQ@mail.gmail.com>
Date:   Tue, 31 Oct 2023 20:21:16 +0100
From:   Jan Henrik Weinstock <jan@....re>
To:     Marc Zyngier <maz@...nel.org>
Cc:     oliver.upton@...ux.dev, james.morse@....com,
        suzuki.poulose@....com, yuzenghui@...wei.com,
        catalin.marinas@....com, will@...nel.org,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
        linux-kernel@...r.kernel.org,
        Lukas Jünger <lukas@....re>
Subject: Re: KVM exit to userspace on WFI

On Mon, 30 Oct 2023 at 13:36, Marc Zyngier <maz@...nel.org> wrote:
>
> [please make an effort not to top-post]
>
> On Fri, 27 Oct 2023 18:41:44 +0100,
> Jan Henrik Weinstock <jan@....re> wrote:
> >
> > Hi Marc,
> >
> > the basic idea behind this is to have a (single-threaded) execution loop,
> > something like this:
> >
> > vcpu-thread:    vcpu-run | process-io-devices | vcpu-run | process-io...
> >                          ^
> >                   WFX or timeout
> >
> > We switch to simulating IO devices whenever the vcpu is idle (wfi) or exceeds
> > a certain budget of instructions (counted via pmu). Our fallback currently is
> > to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
> > of course, if the cpu is stuck at a wfi, we are wasting a lot of time.
> >
> > I understand that the proposed behavior is not desirable for most use cases,
> > which is why I suggest locking it behind a flag, e.g.
> > KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
>
> But how do you reconcile the fact that exposing this to userspace
> breaks fundamental expectations that the guest has, such as getting
> its timer interrupts and directly injected LPIs? Implementing WFI in
> userspace breaks it. What about the case where we don't trap WFx and
> let the *guest* wait for an interrupt?

Timer interrupts etc. will be injected into the vcpu during the
io-phases. When there are no interrupts present and the guest performs
a WFI, we can just skip forward to the next timer event.
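
As a rough illustration of that skip-forward step, here is a minimal
sketch in plain C. It only models the bookkeeping around one loop
iteration and does not call into KVM; every name in it (struct sim,
process_io, after_vcpu_run, the exit reasons) is hypothetical and is
not taken from the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the userspace loop; none of this is KVM API. */

enum exit_reason { EXIT_WFI, EXIT_BUDGET };

struct sim {
    uint64_t now_ns;        /* current simulated time */
    uint64_t next_timer_ns; /* deadline of the earliest timer event */
    bool     irq_pending;   /* an interrupt is waiting to be injected */
};

/* I/O phase: raise the timer interrupt once its deadline is reached. */
static void process_io(struct sim *s)
{
    if (s->now_ns >= s->next_timer_ns)
        s->irq_pending = true;
}

/*
 * Called after the vcpu returned to userspace for reason `why`,
 * having consumed `ran_ns` of simulated time.
 */
static void after_vcpu_run(struct sim *s, enum exit_reason why,
                           uint64_t ran_ns)
{
    s->now_ns += ran_ns;
    process_io(s);

    /* Idle guest with nothing pending: jump to the next timer event. */
    if (why == EXIT_WFI && !s->irq_pending) {
        assert(s->next_timer_ns >= s->now_ns);
        s->now_ns = s->next_timer_ns;
        process_io(s);
    }
}
```

With a timer due at 1000 ns, an exit on WFI at 150 ns moves simulated
time straight to 1000 ns and marks the interrupt as pending, so the
next vcpu run can have it injected without any host-side sleep.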

> Honestly, what you are describing seems to be a use model that doesn't
> fit KVM, which is a general purpose hypervisor, but more a simulation
> environment. Yes, the primitives are the same, but the plumbing is
> wildly different.

Agreed.

> *If* that's the stuff you're looking at, then I'm afraid you'll have
> to do it in a different way, because what you are suggesting is
> fundamentally incompatible with the guarantees that KVM gives to guest
> and userspace. Because your KVM_ARCH_FLAG_WFX_EXIT_TO_USER is really a
> lie. It should really be named something more along the lines of
> KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN
> (probably with additional clauses related to breaking things).

I have attached a reworked version of the patch as a reference (based
on my 5.15 kernel). It puts the modified behavior behind a new
capability so that it does not interfere with the current expectations
around WFI/WFE handling.
I think it should now trap all blocking WFx calls on the vcpu and
reliably return to userspace. If I have missed something that
would cause the vcpu not to trap on a WFI, kindly let me know.

> Overall, you are still asking for something that is not guaranteed at
> the architecture level, even less in KVM, and I'm not going to add
> support for something that can only work "sometime".

I am not quite sure what you mean by "sometime". Are you referring
to WFIs as NOPs? Or WFIs that do not yield because of pending
interrupts?

The point of my patch is not to accurately count every single WFI. The
point is to prevent the host cpu from sleeping just because my vcpu
executed a WFI somewhere in the guest software. If the guest executes
a WFI and that does not cause my vcpu thread to block (in
other words: the vcpu continues executing instructions beyond the WFI),
then it also should not exit to userspace. So instead of
"KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN" it
is really "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_WHENEVER_YOU_WOULD_OTHERWISE_YIELD_AND_I_CANNOT_GET_MY_THREAD_BACK".
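
To spell out the intended semantics, here is a minimal sketch of the
decision in plain C. It is a model of the behavior described above
under my own assumptions, not the code from the attached patch, and
the names (enum wfi_action, on_wfi, exit_cap_enabled) are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a guest WFI; not KVM identifiers. */
enum wfi_action {
    WFI_CONTINUE,     /* interrupt already pending: guest keeps running */
    WFI_BLOCK,        /* default behavior: vcpu thread sleeps in the kernel */
    WFI_EXIT_TO_USER  /* capability enabled: return to userspace instead */
};

/*
 * Decide what happens on a WFI. An exit to userspace only replaces
 * the case where the vcpu thread would otherwise block.
 */
static enum wfi_action on_wfi(bool irq_pending, bool exit_cap_enabled)
{
    if (irq_pending)
        return WFI_CONTINUE;
    return exit_cap_enabled ? WFI_EXIT_TO_USER : WFI_BLOCK;
}
```

The point of the sketch is that a WFI which does not block never
reaches userspace, whether or not the capability is enabled.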

>         M.
>
> --
> Without deviation from the norm, progress is not possible.

-- 
Dr.-Ing. Jan Henrik Weinstock
Managing Director

MachineWare GmbH | www.machineware.de
Hühnermarkt 19, 52062 Aachen, Germany
Amtsgericht Aachen HRB25734

Managing Directors
Lukas Jünger
Dr.-Ing. Jan Henrik Weinstock

[Attachment: "kvm.patch", type "text/x-patch" (3360 bytes)]