linux-kernel - Re: [PATCH v2 0/5] KVM: x86: hyper-v: Conditionally allow SynIC with APICv/AVIC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <874keo7ew5.fsf@vitty.brq.redhat.com>
Date:   Thu, 27 May 2021 10:35:38 +0200
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     Maxim Levitsky <mlevitsk@...hat.com>, kvm@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Kechen Lu <kechenl@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/5] KVM: x86: hyper-v: Conditionally allow SynIC
 with APICv/AVIC

Maxim Levitsky <mlevitsk@...hat.com> writes:

> On Tue, 2021-05-18 at 16:43 +0200, Vitaly Kuznetsov wrote:
>> Changes since v1 (Sean):
>> - Use common 'enable_apicv' variable for both APICv and AVIC instead of 
>>  adding a new hook to 'struct kvm_x86_ops'.
>> - Drop unneded CONFIG_X86_LOCAL_APIC checks from VMX/SVM code along the
>>  way.
>> 
>> Original description:
>> 
>> APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon
>> SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is,
>> however, possible to track whether the feature was actually used by the
>> guest and only inhibit APICv/AVIC when needed.
>> 
>> The feature can be tested with QEMU's 'hv-passthrough' debug mode.
>> 
>> Note, 'avic' kvm-amd module parameter is '0' by default and thus needs to
>> be explicitly enabled.
>> 
>> Vitaly Kuznetsov (5):
>>   KVM: SVM: Drop unneeded CONFIG_X86_LOCAL_APIC check for AVIC
>>   KVM: VMX: Drop unneeded CONFIG_X86_LOCAL_APIC check from
>>     cpu_has_vmx_posted_intr()
>>   KVM: x86: Use common 'enable_apicv' variable for both APICv and AVIC
>>   KVM: x86: Invert APICv/AVIC enablement check
>>   KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in
>>     use
>> 
>>  arch/x86/include/asm/kvm_host.h |  5 ++++-
>>  arch/x86/kvm/hyperv.c           | 27 +++++++++++++++++++++------
>>  arch/x86/kvm/svm/avic.c         | 16 +++++-----------
>>  arch/x86/kvm/svm/svm.c          | 24 +++++++++++++-----------
>>  arch/x86/kvm/svm/svm.h          |  2 --
>>  arch/x86/kvm/vmx/capabilities.h |  4 +---
>>  arch/x86/kvm/vmx/vmx.c          |  2 --
>>  arch/x86/kvm/x86.c              |  9 ++++++---
>>  8 files changed, 50 insertions(+), 39 deletions(-)
>> 
>
> I tested this patch set and this is what I found.
>
> For reference,
> First of all, indeed to make AVIC work I need to:
>  
> 1. Disable SVM - I wonder if I can make this on demand
> too when the guest actually uses a nested guest or at least
> enables nesting in IA32_FEATURE_CONTROL.
> I naturally run most of my VMs with nesting enabled,
> thus I tend to not have avic enabled due to this.
> I'll prepare a patch soon for this.
>  
> 2. Disable x2apic, naturally x2apic can't be used with avic.
> In theory we can also disable avic when the guest switches on
> the x2apic mode, but in practice the guest will likely to pick the x2apic
> when it can.
>  
> 3. (for hyperv) Disable 'hv_vapic', because otherwise hyper-v
> uses its own PV APIC msrs which AVIC doesn't support.
>
> This HV enlightment turns on in the CPUID both the 
> HV_APIC_ACCESS_AVAILABLE which isn't that bad 
> (it only tells that we have the VP assist page),
> and HV_APIC_ACCESS_RECOMMENDED which hints the guest
> to use HyperV PV APIC MSRS and use PV EOI field in 
> the APIC access page, which means that the guest 
> won't use the real apic at all.
>
> 4. and of course enable SynIC autoeoi deprecation.
>
> Otherwise indeed windows enables autoeoi.
>
> hv-passthrough indeed can't be used to test this
> as it both enables autoeoi depreciation and *hv-vapic*. 
> I had to use the patch that you posted
> in 'About the performance of hyper-v' thread.
>  
> In addition to that when I don't use the autoeoi depreciation patch,
> then the guest indeed enables autoeoi, and this triggers a deadlock.
>  

Hm, why don't I see in my testing? I'm pretty sure I'm testing both
cases...

> The reason is that kvm_request_apicv_update must not be called with
> srcu lock held vcpu->kvm->srcu (there is a warning about that
> in kvm_request_apicv_update), but guest msr writes which come
> from vcpu thread do hold it.
>  
> The other place where we disable AVIC on demand is svm_toggle_avic_for_irq_window.
> And that code has a hack to drop this lock and take 
> it back around the call to kvm_request_apicv_update.
> This hack is safe as this code is called only from the vcpu thread.
>  
> Also for reference the reason for the fact that we need to
> disable AVIC on the interrupt window request, or more correctly
> why we still need to request interrupt windows with AVIC,
> is that the local apic can act sadly as a pass-through device 
> for legacy PIC, when one of its LINTn pins is configured in ExtINT mode.
> In this mode when such pin is raised, the local apic asks the PIC for
> the interrupt vector and then delivers it to the APIC
> without touching the IRR/ISR.
>
> The later means that if guest's interrupts are disabled,
> such interrupt can't be queued via IRR to VAPIC
> but instead the regular interrupt window has to be requested, 
> but on AMD, the only way to request interrupt window
> is to queue a VIRQ, and intercept its delivery,
> a feature that is disabled when AVIC is active.
>  
> Finally for SynIC this srcu lock drop hack can be extended to this gross hack:
> It seems to work though:
>
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index bedd9b6cc26a..925b76e7b45e 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -85,7 +85,7 @@ static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
>  }
>  
>  static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
> -				int vector)
> +				int vector, bool host)
>  {
>  	struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
>  	struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
> @@ -109,6 +109,9 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
>  
>  	auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
>  
> +	if (!host)
> +		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> +
>  	/* Hyper-V SynIC auto EOI SINTs are not compatible with APICV */
>  	if (!auto_eoi_old && auto_eoi_new) {
>  		printk("Synic: inhibiting avic %d %d\n", auto_eoi_old, auto_eoi_new);
> @@ -121,6 +124,10 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
>  			kvm_request_apicv_update(vcpu->kvm, true,
>  						 APICV_INHIBIT_REASON_HYPERV);
>  	}
> +
> +	if (!host)
> +		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> +
>  }
>  
>  static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
> @@ -149,9 +156,9 @@ static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
>  
>  	atomic64_set(&synic->sint[sint], data);
>  
> -	synic_update_vector(synic, old_vector);
> +	synic_update_vector(synic, old_vector, host);
>  
> -	synic_update_vector(synic, vector);
> +	synic_update_vector(synic, vector, host);
>  
>  	/* Load SynIC vectors into EOI exit bitmap */
>  	kvm_make_request(KVM_REQ_SCAN_IOAPIC, hv_synic_to_vcpu(synic));
>
>
> Assuming that we don't want this gross hack,  

Is it dangerous or just ugly?

> I wonder if we can avoid full blown memslot 
> update when we disable avic, but rather have some 
> smaller hack like only manually patching its
> NPT mapping to have RW permissions instead 
> of reserved bits which we use for MMIO. 
>
> The AVIC spec says that NPT is only used to check that
> guest has RW permission to the page, 
> while the HVA in the NPT entry itself is ignored.

Assuming kvm_request_apicv_update() is called very rarely, I'd rather
kicked all vCPUs out (similar to KVM_REQ_MCLOCK_INPROGRESS) and
schedule_work() to make memslot update happen ourside of sRCU lock.

>
> Best regards,
> 	Maxim Levitsky
>
>
>
>
>

-- 
Vitaly