lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 03 Aug 2022 13:08:20 +0300
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Mingwei Zhang <mizhang@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Oliver Upton <oupton@...gle.com>
Subject: Re: [PATCH 1/5] KVM: x86: Get vmcs12 pages before checking pending
 interrupts

On Tue, 2022-08-02 at 23:07 +0000, Mingwei Zhang wrote:
> From: Oliver Upton <oupton@...gle.com>
> 
> vmx_guest_apic_has_interrupts implicitly depends on the virtual APIC
> page being present + mapped into the kernel address space. However, with
> demand paging we break this dependency, as the KVM_REQ_GET_VMCS12_PAGES
> event isn't assessed before entering vcpu_block.
> 
> Fix this by getting vmcs12 pages before inspecting the guest's APIC
> page. Note that upstream does not have this issue, as they will directly
> get the vmcs12 pages on vmlaunch/vmresume instead of relying on the
> event request mechanism. However, the upstream approach is problematic,
> as the vmcs12 pages will not be present if a live migration occurred
> before checking the virtual APIC page.

Since this patch is intended for upstream, I don't fully understand
the meaning of the above paragraph.


> 
> Signed-off-by: Oliver Upton <oupton@...gle.com>
> Signed-off-by: Mingwei Zhang <mizhang@...gle.com>
> ---
>  arch/x86/kvm/x86.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5366f884e9a7..1d3d8127aaea 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10599,6 +10599,23 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>  {
>  	bool hv_timer;
>  
> +	/*
> +	 * We must first get the vmcs12 pages before checking for interrupts
> +	 * that might unblock the guest if L1 is using virtual-interrupt
> +	 * delivery.
> +	 */
> +	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> +		/*
> +		 * If we have to ask user-space to post-copy a page,
> +		 * then we have to keep trying to get all of the
> +		 * VMCS12 pages until we succeed.
> +		 */
> +		if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
> +			kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> +			return 0;
> +		}
> +	}
> +
>  	if (!kvm_arch_vcpu_runnable(vcpu)) {
>  		/*
>  		 * Switch to the software timer before halt-polling/blocking as


If I understand correctly, you are saying that if apic backing page is migrated in post copy
then 'get_nested_state_pages' will return false and thus fail?

AFAIK both SVM and VMX versions of 'get_nested_state_pages' assume that this is not the case
for many things like MSR bitmaps and such - they always uses non atomic versions
of guest memory access like 'kvm_vcpu_read_guest' and 'kvm_vcpu_map' which
supposed to block if they attempt to access HVA which is not present, and then
userfaultd should take over and wake them up.

If that still fails, nested VM entry is usually failed, and/or the whole VM
is crashed with 'KVM_EXIT_INTERNAL_ERROR'.

Anything I missed? 

Best regards,
	Maxim Levitsky

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ