linux-kernel - Re: [PATCH v7 8/8] KVM: VMX: enable IPI virtualization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b8f753b0-1b57-3e42-3516-27cc0359c584@intel.com>
Date:   Sat, 9 Apr 2022 00:41:03 +0800
From:   Zeng Guang <guang.zeng@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Kim Phillips <kim.phillips@....com>,
        Jarkko Sakkinen <jarkko@...nel.org>,
        Jethro Beekman <jethro@...tanix.com>,
        "Huang, Kai" <kai.huang@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Hu, Robert" <robert.hu@...el.com>,
        "Gao, Chao" <chao.gao@...el.com>
Subject: Re: [PATCH v7 8/8] KVM: VMX: enable IPI virtualization


On 4/5/2022 1:57 AM, Sean Christopherson wrote:
> On Sun, Apr 03, 2022, Zeng Guang wrote:
>> On 4/1/2022 10:37 AM, Sean Christopherson wrote:
>>>> @@ -4219,14 +4226,21 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>>>>    	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
>>>>    	if (cpu_has_secondary_exec_ctrls()) {
>>>> -		if (kvm_vcpu_apicv_active(vcpu))
>>>> +		if (kvm_vcpu_apicv_active(vcpu)) {
>>>>    			secondary_exec_controls_setbit(vmx,
>>>>    				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>    				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>>>> -		else
>>>> +			if (enable_ipiv)
>>>> +				tertiary_exec_controls_setbit(vmx,
>>>> +						TERTIARY_EXEC_IPI_VIRT);
>>>> +		} else {
>>>>    			secondary_exec_controls_clearbit(vmx,
>>>>    					SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>    					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>>>> +			if (enable_ipiv)
>>>> +				tertiary_exec_controls_clearbit(vmx,
>>>> +						TERTIARY_EXEC_IPI_VIRT);
>>> Oof.  The existing code is kludgy.  We should never reach this point without
>>> enable_apicv=true, and enable_apicv should be forced off if APICv isn't supported,
>>> let alone seconary exec being support.
>>>
>>> Unless I'm missing something, throw a prep patch earlier in the series to drop
>>> the cpu_has_secondary_exec_ctrls() check, that will clean this code up a smidge.
>> cpu_has_secondary_exec_ctrls() check can avoid wrong vmcs write in case mistaken
>> invocation.
> KVM has far bigger problems on buggy invocation, and in that case the resulting
> printk + WARN from the failed VMWRITE is a good thing.


SDM doesn't define VMWRITE failure for such case. But it says the logical
processor operates as if all the secondary processor-based VM-execution
controls were 0 if "activate secondary controls" primary processor-based
VM-execution control is 0. So we may add WARN() to detect this kind of
buggy invocation instead.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 61e075e16c19..6c370b507b45 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4200,22 +4200,22 @@ static void vmx_refresh_apicv_exec_ctrl(struct 
kvm_vcpu *vcpu)
         struct vcpu_vmx *vmx = to_vmx(vcpu);

         pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
-       if (cpu_has_secondary_exec_ctrls()) {
-               if (kvm_vcpu_apicv_active(vcpu)) {
-                       secondary_exec_controls_setbit(vmx,
- SECONDARY_EXEC_APIC_REGISTER_VIRT |
- SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-                       if (enable_ipiv)
-                               tertiary_exec_controls_setbit(vmx,
- TERTIARY_EXEC_IPI_VIRT);
-               } else {
-                       secondary_exec_controls_clearbit(vmx,
- SECONDARY_EXEC_APIC_REGISTER_VIRT |
- SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-                       if (enable_ipiv)
- tertiary_exec_controls_clearbit(vmx,
- TERTIARY_EXEC_IPI_VIRT);
-               }
+
+       WARN(!cpu_has_secondary_exec_ctrls(),
+                    "VMX: unexpected vmwrite with inactive secondary 
exec controls");
+
+       if (kvm_vcpu_apicv_active(vcpu)) {
+               secondary_exec_controls_setbit(vmx,
+                             SECONDARY_EXEC_APIC_REGISTER_VIRT |
+ SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+               if (enable_ipiv)
+                       tertiary_exec_controls_setbit(vmx, 
TERTIARY_EXEC_IPI_VIRT);
+       } else {
+               secondary_exec_controls_clearbit(vmx,
+                             SECONDARY_EXEC_APIC_REGISTER_VIRT |
+ SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+               if (enable_ipiv)
+                       tertiary_exec_controls_clearbit(vmx, 
TERTIARY_EXEC_IPI_VIRT);
         }

         vmx_update_msr_bitmap_x2apic(vcpu);

>
>>>> +	 */
>>>> +	if (vmx_can_use_ipiv(vcpu->kvm)) {
>>>> +		struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
>>>> +
>>>> +		mutex_lock(&vcpu->kvm->lock);
>>>> +		err = vmx_alloc_pid_table(kvm_vmx);
>>>> +		mutex_unlock(&vcpu->kvm->lock);
>>> This belongs in vmx_vm_init(), doing it in vCPU creation is a remnant of the
>>> dynamic resize approach that's no longer needed.
>> We cannot allocate pid table in vmx_vm_init() as userspace has no chance to
>> set max_vcpu_ids at this stage. That's the reason we do it in vCPU creation
>> instead.
> Ah, right.  Hrm.  And that's going to be a recurring problem if we try to use the
> dynamic kvm->max_vcpu_ids to reduce other kernel allocations.
>
> Argh, and even kvm_arch_vcpu_precreate() isn't protected by kvm->lock.
>
> Taking kvm->lock isn't problematic per se, I just hate doing it so deep in a
> per-vCPU flow like this.
>
> A really gross hack/idea would be to make this 64-bit only and steal the upper
> 32 bits of @type in kvm_create_vm() for the max ID.
>
> I think my first choice would be to move kvm_arch_vcpu_precreate() under kvm->lock.
> None of the architectures that have a non-nop implemenation (s390, arm64 and x86)
> do significant work, so holding kvm->lock shouldn't harm performance.  s390 has to
> acquire kvm->lock in its implementation, so we could drop that.  And looking at
> arm64, I believe its logic should also be done under kvm->lock.
>
> It'll mean adding yet another kvm_x86_ops, but I like that more than burying the
> code deep in vCPU creation.
>
> Paolo, any thoughts on this?

Sounds reasonable. I will prepare patch to refactor the 
kvm_arch_vcpu_precreate()
and make pid table allocation done there.

Thanks.