Date:   Wed, 5 Apr 2023 18:43:46 +0200
From:   Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Tianyu Lan <ltykernel@...il.com>,
        Michael Kelley <mikelley@...rosoft.com>
Subject: Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V

On 3/7/2023 6:36 PM, Sean Christopherson wrote:
> Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
> hyper-v: Remote TLB flush for SVM") or fix the thing properly straightaway.  KVM
> doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
> happens to get lucky and not run afoul of the underlying bugs.  The revert appears
> to be reasonably straightforward (see bottom).

Hi Sean,

I'm back, and I don't have good news. The fix for the missing Hyper-V TLB flushes has
landed in Linus' tree, and I have now had the chance to test things outside Azure, in WSL
on my AMD laptop.

There is some seriously weird interaction going on between the TDP MMU and Hyper-V, with
or without the enlightened TLB. My laptop has 16 logical CPUs, so the WSL VM also has 16 vCPUs.
I have hardcoded the kernel to disable the enlightened TLB (so we know it is not interfering).
Inside the WSL VM I'm running a Flatcar Linux VM with legacy BIOS, a single vCPU,
and 4GB of RAM.

If I run with `kvm.tdp_mmu=0`, I can boot and shut down my VM consistently in 20 seconds.

If I run with the TDP MMU enabled, the VM boot stalls for seconds at a time in various
spots (loading grub, decompressing the kernel, during kernel boot); the boot output feels
like it's happening in slow motion. The fastest I have seen it finish the same cycle is
2 minutes, I have also seen it take 4 minutes, and sometimes it does not finish at all.
Everything else is identical; the only difference is the value of `kvm.tdp_mmu`.
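
(For anyone reproducing this: tdp_mmu is a KVM module parameter, so it can be set either
on the kernel command line or via modprobe; the modprobe line below is the usual
equivalent, not something I've used in WSL itself:)

  kvm.tdp_mmu=0            # kernel command line, KVM built in
  options kvm tdp_mmu=0    # /etc/modprobe.d/kvm.conf, KVM built as a module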

So I would like to revisit disabling the TDP MMU on Hyper-V altogether for the time being,
but it should probably be done with the following condition:

  tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled && !hypervisor_is_type(X86_HYPER_MS_HYPERV)
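
That is, in kvm_configure_mmu(), at the same spot your hack below touches. Untested
sketch on my side, and mmu.c would also need <asm/hypervisor.h> for hypervisor_is_type():

  #ifdef CONFIG_X86_64
          /*
           * Sketch: keep the TDP MMU off on Hyper-V regardless of the
           * enlightened TLB, until this interaction is understood.
           */
          tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled &&
                            !hypervisor_is_type(X86_HYPER_MS_HYPERV);
  #endif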

Do you have an environment where you would be able to reproduce this? A Windows Server
machine, perhaps, or an AMD laptop?

Jeremi

> 
> And _if_ we want to hack-a-fix it, then I would strongly prefer a very isolated,
> obviously hacky fix, e.g.
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 36e4561554ca..a9ba4ae14fda 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5779,8 +5779,13 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>         tdp_root_level = tdp_forced_root_level;
>         max_tdp_level = tdp_max_root_level;
>  
> +       /*
> +        * FIXME: Remove the enlightened TLB restriction when KVM properly
> +        * handles TLB flushes for said enlightenment.
> +        */
>  #ifdef CONFIG_X86_64
> -       tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
> +       tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled &&
> +                         !(ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB);
>  #endif
>         /*
>          * max_huge_page_level reflects KVM's MMU capabilities irrespective
> 
> 
