linux-kernel - Re: [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZuSL_FCfvVywCPxm@google.com>
Date: Fri, 13 Sep 2024 12:01:16 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Nicolas Saenz Julienne <nsaenz@...zon.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org, pbonzini@...hat.com, 
	vkuznets@...hat.com, linux-doc@...r.kernel.org, linux-hyperv@...r.kernel.org, 
	linux-arch@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	graf@...zon.de, dwmw2@...radead.org, paul@...zon.com, mlevitsk@...hat.com, 
	jgowans@...zon.com, corbet@....net, decui@...rosoft.com, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, 
	amoorthy@...gle.com
Subject: Re: [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL

On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> Model inactive VTL vCPUs' behaviour with a new MP state.
> 
> Inactive VTLs are in an artificial halt state. They enter into this
> state in response to invoking HvCallVtlCall, HvCallVtlReturn.
> User-space, which is VTL aware, can processes the hypercall, and set the
> vCPU in MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state it'll
> block until a wakeup event is received. The rules of what constitutes an
> event are analogous to halt's except that VTL's ignore RFLAGS.IF.
>
> When a wakeup event is registered, KVM will exit to user-space with a
> KVM_SYSTEM_EVENT exit, and KVM_SYSTEM_EVENT_WAKEUP event type.
> User-space is responsible of deciding whether the event has precedence
> over the active VTL and will switch the vCPU to KVM_MP_STATE_RUNNABLE
> before resuming execution on it.
> 
> Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will
> return immediately to user-space.
> 
> Note that by re-using the readily available halt infrastructure in
> KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables)
> virtualisation features like the VMX preemption timer or APICv before
> blocking.

IIUC, this is a convoluted and roundabout way to let userspace check if a vCPU
has a wake event, correct?  Even by the end of the series, KVM never sets
MP_STATE_HV_INACTIVE_VTL, i.e. the only use for this is to combine it as:

  KVM_SET_MP_STATE => KVM_RUN => KVM_SET_MP_STATE => KVM_RUN

The upside to this approach is that it requires minimal uAPI and very few KVM
changes, but that's about it AFAICT.  On the other hand, making this so painfully
specific feels like a missed opportunity, and unnecessarily bleeds VTL details
into KVM.

Bringing halt-polling into the picture (by going down kvm_vcpu_halt()) is also
rather bizarre since quite a bit of time has already elapsed since the vCPU first
did HvCallVtlCall/HvCallVtlReturn.  But that doesn't really have anything to do
with MP_STATE_HV_INACTIVE_VTL, e.g. it'd be just as easy to go to kvm_vcpu_block().

Why not add an ioctl() to very explicitly block until a wake event is ready?
Or probably better, a generic "wait" ioctl() that takes the wait type as an
argument.

Kinda like your idea of supporting .poll() on the vCPU FD[*], except it's very
specifically restricted to a single caller (takes vcpu->mutex).  We could probably
actually implement it via .poll(), but I suspect that would be more confusing than
helpful.

E.g. extract the guts of vcpu_block() to a separate helper, and then wire that
up to an ioctl().

As for the RFLAGS.IF quirk, maybe handle that via a kvm_run flag?  That way,
userspace doesn't need to do a round-trip just to set a single bit.  E.g. I think
we should be able to squeeze it into "struct kvm_hyperv_exit".

Actually, speaking of kvm_hyperv_exit, is there a reason we can't simply wire up
HVCALL_VTL_CALL and/or HVCALL_VTL_RETURN to a dedicated complete_userspace_io()
callback that blocks if some flag is set?  That would make it _much_ cleaner to
scope the RFLAGS.IF check to kvm_hyperv_exit, and would require little to no new
uAPI.

> @@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>         if (!gif_set(svm))
>                 return true;
>
> +       /*
> +        * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
> +        * whether to block interrupts targeted at inactive VTLs.
> +        */
>         if (is_guest_mode(vcpu)) {
>                 /* As long as interrupts are being delivered...  */
>                 if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> @@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>                 if (nested_exit_on_intr(svm))
>                         return false;
>         } else {
> -               if (!svm_get_if_flag(vcpu))
> +               if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(vcpu))

Speaking of RFLAGS.IF, I think it makes sense to add a common x86 helper to handle
the RFLAGS.IF vs. idle VTL logic.  Naming will be annoying, but that's about it.

E.g. kvm_is_irq_blocked_by_rflags_if() or so.

[*] https://lore.kernel.org/lkml/20231001111313.77586-1-nsaenz@amazon.com