Message-ID: <FA141DE5-1EAD-4362-BE90-2E24D51749FA@nutanix.com>
Date: Fri, 31 Oct 2025 17:58:29 +0000
From: Jon Kohler <jon@...anix.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>,
        "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/4] KVM: x86: Cleanup #MC and XCR0/XSS/PKRU handling



> On Oct 30, 2025, at 6:42 PM, Sean Christopherson <seanjc@...gle.com> wrote:
> 
> This series is the result of the recent PUCK discussion[*] on optimizing the
> XCR0/XSS loads that are currently done on every VM-Enter and VM-Exit.  My
> initial thought, that XCR0/XSS should be swapped outside of the fastpath, was
> spot on; it turns out the only reason they're swapped in the fastpath is
> because of a hack-a-fix that papered over an egregious #MC handling bug where
> the kernel #MC handler would call schedule() from an atomic context.  The
> resulting #GP due to
> trying to swap FPU state with a guest XCR0/XSS was "fixed" by loading the host
> values before handling #MCs from the guest.
> 
> Thankfully, the #MC mess has long since been cleaned up, so it's once again
> safe to swap XCR0/XSS outside of the fastpath (but only while IRQs are
> disabled!).

Thank you for doing the diligence on this; I appreciate it!
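
For anyone following along, my mental model of where the swap lands is roughly
the sketch below. To be clear, this is only an illustration, not the actual
diff: the wrapper name run_vcpu_with_guest_xstate() is made up, and the run
loop / run_flags argument are paraphrased from vcpu_enter_guest(). The point
is that the guest XCR0/XSS (and PKRU) get loaded once around the whole
fastpath loop, with IRQs disabled, so a #MC taken in that window is only
processed after the host values are back in place.

/*
 * Illustrative sketch only, not the actual patch: swap in the guest's
 * XCR0/XSS (and PKRU) once before the fastpath re-enter loop and restore
 * the host values once after it, instead of on every VM-Enter/VM-Exit.
 * Runs in vcpu_enter_guest()-like context with IRQs already disabled.
 */
static fastpath_t run_vcpu_with_guest_xstate(struct kvm_vcpu *vcpu,
					     u64 run_flags)
{
	fastpath_t exit_fastpath;

	lockdep_assert_irqs_disabled();

	/* Guest XCR0/XSS/PKRU loaded exactly once, outside the loop. */
	kvm_load_guest_xsave_state(vcpu);

	for (;;) {
		exit_fastpath = kvm_x86_call(vcpu_run)(vcpu, run_flags);
		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
			break;
	}

	/* Host values restored before anything that might touch FPU state. */
	kvm_load_host_xsave_state(vcpu);

	return exit_fastpath;
}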

> As for what may be contributing to the SAP HANA performance improvements when
> enabling PKU, my instincts again appear to be spot on.  As predicted, the
> fastpath savings are ~300 cycles on Intel (~500 on AMD).  I.e. if the guest
> is literally doing _nothing_ but generating fastpath exits, it will see a
> ~25% improvement.  There's basically zero chance the uplift seen with enabling
> PKU is due to eliding XCR0 loads; my guess is that the guest actually uses
> protection keys to optimize something.

Every little bit counts; that's a healthy percentage speedup for fastpath stuff,
especially on AMD.
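
On the protection keys theory: that would track. MPK lets userspace revoke and
restore access to tagged pages with a PKRU write (a few cycles, no syscall, no
TLB shootdown) rather than mprotect(), so a database flipping permissions on
hot buffers at high frequency could plausibly see a real win. Purely as an
illustration of the mechanism (nothing HANA-specific, just the stock glibc
wrappers, and it needs a PKU-capable CPU):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	/* One anonymous page we want to protect cheaply at runtime. */
	void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	/* Allocate a protection key and tag the page with it. */
	int pkey = pkey_alloc(0, 0);

	if (buf == MAP_FAILED || pkey < 0)
		return 1;	/* no PKU, or out of keys */
	if (pkey_mprotect(buf, 4096, PROT_READ | PROT_WRITE, pkey))
		return 1;

	/* Toggle access via PKRU: no mprotect(), no TLB flush. */
	pkey_set(pkey, PKEY_DISABLE_ACCESS);	/* loads/stores would fault now */
	pkey_set(pkey, 0);			/* full access restored */

	memset(buf, 0, 4096);
	puts("pkey toggle done");
	return 0;
}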

> Why does kvm_load_guest_xsave_state() show up in perf?  Probably because it's
> the only visible symbol other than vmx_vmexit() (and vmx_vcpu_run() when not
> hammering the fastpath).  E.g. running perf top on a running VM instance yields
> these numbers with various guest workloads (the middle one is running
> mmu_stress_test in the guest, which hammers on mmu_lock in L0).  But other than
> doing INVD (handled in the fastpath) in a tight loop, there's no perceived perf
> improvement from the guest.

nit: it'd be nice if these profiles were labeled with the workload they came
from (you called out the middle one above, but what are the first and third
ones?)

> Overhead  Shared Object       Symbol
>  15.65%  [kernel]            [k] vmx_vmexit
>   6.78%  [kernel]            [k] kvm_vcpu_halt
>   5.15%  [kernel]            [k] __srcu_read_lock
>   4.73%  [kernel]            [k] kvm_load_guest_xsave_state
>   4.69%  [kernel]            [k] __srcu_read_unlock
>   4.65%  [kernel]            [k] read_tsc
>   4.44%  [kernel]            [k] vmx_sync_pir_to_irr
>   4.03%  [kernel]            [k] kvm_apic_has_interrupt
> 
> 
>  45.52%  [kernel]            [k] queued_spin_lock_slowpath
>  24.40%  [kernel]            [k] vmx_vmexit
>   2.84%  [kernel]            [k] queued_write_lock_slowpath
>   1.92%  [kernel]            [k] vmx_vcpu_run
>   1.40%  [kernel]            [k] vcpu_run
>   1.00%  [kernel]            [k] kvm_load_guest_xsave_state
>   0.84%  [kernel]            [k] kvm_load_host_xsave_state
>   0.72%  [kernel]            [k] mmu_try_to_unsync_pages
>   0.68%  [kernel]            [k] __srcu_read_lock
>   0.65%  [kernel]            [k] try_get_folio
> 
>  17.78%  [kernel]            [k] vmx_vmexit
>   5.08%  [kernel]            [k] vmx_vcpu_run
>   4.24%  [kernel]            [k] vcpu_run
>   4.21%  [kernel]            [k] _raw_spin_lock_irqsave
>   2.99%  [kernel]            [k] kvm_load_guest_xsave_state
>   2.51%  [kernel]            [k] rcu_note_context_switch
>   2.47%  [kernel]            [k] ktime_get_update_offsets_now
>   2.21%  [kernel]            [k] kvm_load_host_xsave_state
>   2.16%  [kernel]            [k] fput
> 
> [*] https://drive.google.com/drive/folders/1DCdvqFGudQc7pxXjM7f35vXogTf9uhD4
> 
> Sean Christopherson (4):
>  KVM: SVM: Handle #MCs in guest outside of fastpath
>  KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath
>  KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run
>    loop
>  KVM: x86: Load guest/host PKRU outside of the fastpath run loop
> 
> arch/x86/kvm/svm/svm.c  | 20 ++++++++--------
> arch/x86/kvm/vmx/main.c | 13 ++++++++++-
> arch/x86/kvm/vmx/tdx.c  |  3 ---
> arch/x86/kvm/vmx/vmx.c  |  7 ------
> arch/x86/kvm/x86.c      | 51 ++++++++++++++++++++++++++++-------------
> arch/x86/kvm/x86.h      |  2 --
> 6 files changed, 56 insertions(+), 40 deletions(-)
> 
> 
> base-commit: 4cc167c50eb19d44ac7e204938724e685e3d8057
> -- 
> 2.51.1.930.gacf6e81ea2-goog
> 

Had one conversation-starter comment on patch 4, but otherwise LGTM for the
entire series. Thanks again for the help!

Reviewed-by: Jon Kohler <jon@...anix.com>
