Message-ID: <20250326193619.3714986-1-yosry.ahmed@linux.dev>
Date: Wed, 26 Mar 2025 19:35:55 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
	Jim Mattson <jmattson@...gle.com>,
	Maxim Levitsky <mlevitsk@...hat.com>,
	Vitaly Kuznetsov <vkuznets@...hat.com>,
	Rik van Riel <riel@...riel.com>,
	Tom Lendacky <thomas.lendacky@....com>,
	x86@...nel.org,
	kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Yosry Ahmed <yosry.ahmed@...ux.dev>
Subject: [RFC PATCH 00/24] KVM: SVM: Rework ASID management

This series reworks how SVM manages ASIDs by:
(a) Allocating a single static ASID for each L1 VM, instead of
    dynamically allocating ASIDs. This simplifies the logic and allows
    for more unifications between SVM and SEV, as the latter already
    uses per-VM ASIDs as required for other purposes.

    These are patches 1 to 10.

(b) Using a separate ASID for L2 VMs. Instead of using the same ASID for
    L1 and L2 guests, and doing a TLB flush and MMU sync on every nested
    transition, a separate ASID is used and TLB flushes are done
    conditionally as needed.

    These are patches 11 to 24.
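To make the difference between (a) and (b) concrete, here is a minimal C sketch. It is not code from the series: `struct vm_asids`, `asid_for_mode()` and `transition_needs_flush()` are hypothetical names. It assumes each VM holds one ASID for L1 and one for L2 (ASID 0 stays reserved for the host) and shows that a nested transition only needs a flush while both levels share an ASID, or when a flush was explicitly requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM ASID state: one ASID for L1, one for L2. */
struct vm_asids {
	uint32_t l1_asid;
	uint32_t l2_asid;
};

/*
 * Pick the ASID to program into the VMCB for the next VMRUN.
 * L1 and L2 translations are tagged differently, so switching
 * between them does not by itself mix their TLB entries.
 */
static uint32_t asid_for_mode(const struct vm_asids *a, bool is_guest_mode)
{
	assert(a->l1_asid != 0 && a->l2_asid != 0);	/* ASID 0 is the host's */
	return is_guest_mode ? a->l2_asid : a->l1_asid;
}

/*
 * Shared ASID (old scheme): every nested transition must flush so that
 * L2 never sees L1's translations and vice versa.
 * Separate ASIDs (new scheme): flush only when something asked for it.
 */
static bool transition_needs_flush(const struct vm_asids *a, bool flush_requested)
{
	return a->l1_asid == a->l2_asid || flush_requested;
}
```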

The advantages of this are:
- Simplifying the logic by dropping dynamic ASID allocations.
- Unifying some logic between SVM and SEV, as the latter already uses
  per-VM ASIDs as required for other purposes.
- Enabling INVLPGB virtualization [1].
- Improving the performance of nested guests by avoiding some TLB
  flushes.

The series was tested by running L2 and L3 Linux guests with some
simple workloads in them (mmap()/munmap() stress, netperf, etc.). I also
ran the KVM selftests in both L0 and L1.

I believe some of the patches are in a mergeable state, but this series is
still an RFC for a few reasons:
- I haven't done as much testing as I initially planned. Mainly I wanted
  to test with a Windows guest running WSL to get Linux and Windows L2
  VMs running side-by-side. I couldn't get it done due to some
  testing infrastructure hiccups.

- The SEV changes are generally untested beyond build testing, and I
  would like to get more feedback on them before moving forward. Namely,
  I think there is room for further unification. SEV should probably use
  the new kvm_tlb_tags infrastructure to allocate its ASIDs as well. The
  way I think about it is optionally having a bitmap of "pending"
  ASIDs in kvm_tlb_tags, and making unused SEV ASIDs "pending" until we
  run out of space, then doing the necessary flushes to free them.

- I want to get general feedback about the direction this is heading in,
  and things like generalizing the ASID tracking in SEV to work for SVM,
  thoughts on using an xarray for that, etc.

- Some things can/should be cleaned up, although they can be followups
  too. For example, the current logic will allocate a "normal" ASID for
  an SEV VM upon creation, then allocate an SEV-friendly ASID to it when
  SEV is initialized. The "normal" ASID remains allocated though, and
  kvm_svm->asid and kvm_svm->sev_info.asid remain different. It seems
  like we should not allocate the "normal" ASID to begin with, or free
  it if the VM uses SEV. However, I am not sure what the best way to
  do any of this is, because I am not clear on the life cycle of an SEV VM.
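The "pending" idea from the SEV point above could look roughly like this. It is an illustrative sketch under simplifying assumptions, not the kvm_tlb_tags interface: `struct tlb_tags`, `tlb_tags_alloc()` and `tlb_tags_release()` are hypothetical, tags are capped at 64 so a `uint64_t` can serve as each bitmap, and the flush is only simulated by a counter. Released tags move to the pending set, and a single flush recycles all of them once no clean tag is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tag allocator with a "pending" set.
 * Bit i in 'used':    tag i is currently allocated.
 * Bit i in 'pending': tag i was released but may still have stale TLB entries.
 */
struct tlb_tags {
	uint64_t used;
	uint64_t pending;
	unsigned int nr_tags;		/* valid tags are 1..nr_tags-1; 0 is reserved */
	unsigned int nr_flushes;	/* counts simulated "flush everything" operations */
};

static void tlb_tags_init(struct tlb_tags *t, unsigned int nr_tags)
{
	assert(nr_tags >= 2 && nr_tags <= 64);
	t->used = 1;			/* reserve tag 0 */
	t->pending = 0;
	t->nr_tags = nr_tags;
	t->nr_flushes = 0;
}

/* Find a tag that is neither allocated nor waiting for a flush. */
static int find_free(const struct tlb_tags *t)
{
	for (unsigned int i = 1; i < t->nr_tags; i++)
		if (!((t->used | t->pending) & (1ULL << i)))
			return (int)i;
	return -1;
}

/* Returns a tag, or -1 if every tag is allocated even after reclaiming. */
static int tlb_tags_alloc(struct tlb_tags *t)
{
	int tag = find_free(t);

	if (tag < 0 && t->pending) {
		/* Out of clean tags: flush once, then recycle every pending tag. */
		t->nr_flushes++;
		t->pending = 0;
		tag = find_free(t);
	}
	if (tag >= 0)
		t->used |= 1ULL << tag;
	return tag;
}

/* Release a tag without flushing; it becomes reusable after the next flush. */
static void tlb_tags_release(struct tlb_tags *t, int tag)
{
	assert(tag > 0 && (unsigned int)tag < t->nr_tags);
	t->used &= ~(1ULL << tag);
	t->pending |= 1ULL << tag;
}
```

The benefit of the pending set is batching: rather than flushing each time an ASID is freed, one flush is amortized across every ASID released since the previous one.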

This series started as two separate series, one to optimize nested TLB
flushes by using a separate ASID for L2 VMs [2], and one to use a single
ASID per-VM [3]. However, there is so much dependency and interaction
between the two series that I think it's useful to combine them, at least for
now so that the big picture is clear. The series can be later split
again into 2 or more series, or merged incrementally.

I am sending this out now to get feedback, and also to "checkpoint" my
work as I won't be picking this up again for a few months. I will remain
able to respond to discussion and reviews, although at a lower capacity.
If anyone wants to pick up this series in the meantime, partially or
fully, please feel free to do so. Just let me know so that we can
coordinate.

Rik and Tom, I CC'd you due to the previous discussion you had with Sean
about INVLPGB virtualization. I can drop you from following versions if
you'd like to avoid the noise.

Here is a brief walkthrough of the series:

Part 1: Use a single ASID per-VM
- Patch 1 generalizes the VPID allocation into a generic kvm_tlb_tags
  factory to be used by SVM.
- Patches 2-3 are cleanups and/or refactoring.
- Patches 4-5 get rid of the cases where we currently allocate a new
  ASID dynamically by just flushing the existing ASID or falling back to
  full flush if flushing an ASID is not supported.
- Patches 6-9 generalize SEV's per-CPU ASID -> vCPU tracking to make it
  work for SVM.
- Patch 10 finally drops the dynamic ASID allocation logic and uses a
  single per-VM ASID.
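For patches 4-5, the choice between an ASID-scoped flush and a full flush can be sketched as below. The `TLB_CONTROL_*` values are assumed to match the VMCB TLB_CONTROL encodings documented in the AMD APM; `pick_tlb_control()` is a hypothetical helper, not a function from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed TLB_CONTROL encodings (AMD APM, VMCB control area). */
#define TLB_CONTROL_DO_NOTHING		0
#define TLB_CONTROL_FLUSH_ALL_ASID	1	/* flush entries for every ASID */
#define TLB_CONTROL_FLUSH_ASID		3	/* flush this guest's ASID only; needs FLUSHBYASID */

/*
 * Hypothetical helper: decide how to invalidate a vCPU's translations
 * while keeping its ASID, instead of switching to a freshly allocated one.
 */
static uint8_t pick_tlb_control(bool need_flush, bool has_flushbyasid)
{
	uint8_t ctl;

	if (!need_flush)
		ctl = TLB_CONTROL_DO_NOTHING;
	else if (has_flushbyasid)
		ctl = TLB_CONTROL_FLUSH_ASID;
	else
		ctl = TLB_CONTROL_FLUSH_ALL_ASID;	/* only option without FLUSHBYASID */

	/* An ASID-scoped flush must never be requested on hardware lacking it. */
	assert(has_flushbyasid || ctl != TLB_CONTROL_FLUSH_ASID);
	return ctl;
}
```

A full flush is more expensive, but it keeps the per-VM ASID fixed, which is what allows patch 10 to drop dynamic allocation.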

Part 2: Optimize nSVM TLB flushes
- Patch 11 starts by using a separate ASID for L2 guests, although
  it is initially the same as the L1 ASID. It's essentially just laying
  the groundwork.
- Patches 12-16 are refactoring groundwork.
- Patches 17-22 add the needed handling of the L2 ASID TLB flushing.
- Patch 23 starts allocating a new ASID for L2 as using the same ASID is
  no longer needed.
- Patch 24 drops the unconditional TLB flushes on nested transitions,
  which are no longer necessary after L2 is using a separate
  well-maintained ASID.
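A rough model of the decision made on nested VMRUN once L2 has its own ASID (patches 17-22) follows. It is a hypothetical sketch, not nested_svm_entry_tlb_flush() itself: `struct nested_tlb_state` and `nested_entry_needs_l2_flush()` are invented names, and any non-zero TLB_CONTROL value from L1 is treated simply as "flush". The L2 ASID is flushed only if a flush was already pending, L1 requested one through TLB_CONTROL in VMCB12, or L1 changed the ASID it uses for L2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed TLB_CONTROL encoding for "flush nothing". */
#define TLB_CONTROL_DO_NOTHING 0

/* Hypothetical snapshot of what KVM tracks about L1's view of L2. */
struct nested_tlb_state {
	uint32_t last_l2_asid;	/* ASID L1 put in VMCB12 on the previous entry */
	bool l2_flush_pending;	/* flush already queued for the L2 ASID */
};

/*
 * Decide whether the L2 ASID must be flushed on nested VMRUN.
 * vmcb12_asid and vmcb12_tlb_ctl are the values L1 wrote into VMCB12.
 */
static bool nested_entry_needs_l2_flush(struct nested_tlb_state *s,
					uint32_t vmcb12_asid,
					uint8_t vmcb12_tlb_ctl)
{
	bool flush = s->l2_flush_pending;

	/* A VMCB12 with ASID 0 is rejected by consistency checks earlier. */
	assert(vmcb12_asid != 0);

	/* L1 asked for a flush through TLB_CONTROL. */
	if (vmcb12_tlb_ctl != TLB_CONTROL_DO_NOTHING)
		flush = true;

	/* L1 moved L2 to a different ASID, so cached L2 entries are stale. */
	if (vmcb12_asid != s->last_l2_asid) {
		flush = true;
		s->last_l2_asid = vmcb12_asid;
	}

	s->l2_flush_pending = false;
	return flush;
}
```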

Diff from the initial versions of series [2] and [3]:
- Generalized the SEV tracking of ASID->vCPU to use it for SVM, to make
  sure the TLB is flushed when a new vCPU with the same ASID is run on
  the same physical CPU.
- Made sure kvm_hv_vcpu_purge_flush_tlb() is handled correctly by
  passing in is_guest_mode to purge the correct queue when doing L1 vs
  L2 TLB flushes (Maxim).
- Improved the commentary in nested_svm_entry_tlb_flush() (Maxim).
- Handle INVLPGA from the guest even when nested NPT is used (Maxim).
- Improved some commit logs.
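The ASID->vCPU tracking from the first item can be illustrated with a fixed-size per-CPU table. This is a simplified sketch, not the series' code: the series uses xarrays, while `last_vcpu`, `asid_load_needs_flush()` and the `NR_*_SKETCH` bounds here are hypothetical. Because all vCPUs of a VM share one ASID, running a vCPU other than the one that last used that ASID on a physical CPU requires a flush.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH	4
#define NR_ASIDS_SKETCH	16

struct vcpu {
	int id;
};

/* Hypothetical per-pCPU table: which vCPU last ran with each ASID (NULL = none). */
static const struct vcpu *last_vcpu[NR_CPUS_SKETCH][NR_ASIDS_SKETCH];

/*
 * Called when 'v' is about to run with 'asid' on physical CPU 'cpu'.
 * If a different vCPU (or nobody) last used this ASID on this CPU,
 * TLB entries tagged with the ASID cannot be trusted for 'v'.
 */
static bool asid_load_needs_flush(int cpu, unsigned int asid, const struct vcpu *v)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH && asid < NR_ASIDS_SKETCH);

	if (last_vcpu[cpu][asid] == v)
		return false;
	last_vcpu[cpu][asid] = v;
	return true;
}
```

This only covers the cross-vCPU case; a vCPU moving to a different physical CPU is a separate trigger (patch 5).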

[1] https://lore.kernel.org/all/Z8HdBg3wj8M7a4ts@google.com/
[2] https://lore.kernel.org/lkml/20250205182402.2147495-1-yosry.ahmed@linux.dev/
[3] https://lore.kernel.org/lkml/20250313215540.4171762-1-yosry.ahmed@linux.dev/


Yosry Ahmed (24):
  KVM: VMX: Generalize VPID allocation to be vendor-neutral
  KVM: SVM: Use cached local variable in init_vmcb()
  KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  KVM: SVM: Flush everything if FLUSHBYASID is not available
  KVM: SVM: Flush the ASID when running on a new CPU
  KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  KVM: SEV: Track ASID->vCPU on vCPU load
  KVM: SEV: Drop pre_sev_run()
  KVM: SEV: Generalize tracking ASID->vCPU with xarrays
  KVM: SVM: Use a single ASID per VM
  KVM: nSVM: Use a separate ASID for nested guests
  KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb()
  KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode
  KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns
  KVM: x86/mmu: rename __kvm_mmu_invalidate_addr()
  KVM: x86/mmu: Allow skipping the gva flush in
    kvm_mmu_invalidate_addr()
  KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL
  KVM: nSVM: Flush the TLB if L1 changes L2's ASID
  KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
  KVM: nSVM: Service local TLB flushes before nested transitions
  KVM: nSVM: Handle INVLPGA interception correctly
  KVM: nSVM: Allocate a new ASID for nested guests
  KVM: nSVM: Stop bombing the TLB on nested transitions

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/include/asm/svm.h      |   5 -
 arch/x86/kvm/hyperv.h           |   8 +-
 arch/x86/kvm/mmu/mmu.c          |  22 ++-
 arch/x86/kvm/svm/nested.c       |  68 ++++++---
 arch/x86/kvm/svm/sev.c          |  60 +-------
 arch/x86/kvm/svm/svm.c          | 257 +++++++++++++++++++++++---------
 arch/x86/kvm/svm/svm.h          |  43 ++++--
 arch/x86/kvm/vmx/nested.c       |   4 +-
 arch/x86/kvm/vmx/vmx.c          |  38 +----
 arch/x86/kvm/vmx/vmx.h          |   4 +-
 arch/x86/kvm/x86.c              |  60 +++++++-
 arch/x86/kvm/x86.h              |  13 ++
 13 files changed, 378 insertions(+), 206 deletions(-)

-- 
2.49.0.395.g12beb8f557-goog

