Message-ID: <20251107201151.3303170-1-jmattson@google.com>
Date: Fri,  7 Nov 2025 12:11:23 -0800
From: Jim Mattson <jmattson@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, Alexander Graf <agraf@...e.de>, Joerg Roedel <joro@...tes.org>, 
	Avi Kivity <avi@...hat.com>, 
	"Radim Krčmář" <rkrcmar@...hat.com>, David Hildenbrand <david@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Cc: Jim Mattson <jmattson@...gle.com>
Subject: [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT

There are several problems with KVM's virtualization of the G_PAT
field when nested paging is enabled in VMCB12.

* The VMCB12 G_PAT field is not checked for validity when emulating
  VMRUN.  (APM volume 2, section 15.25.4: Nested Paging and
  VMRUN/#VMEXIT)

* RDMSR(PAT) and WRMSR(PAT) from L2 access L1's PAT MSR rather than
  L2's Guest PAT register. (APM volume 2, section 15.25.2: Replicated
  State)

* The L2 Guest PAT register is not written back to VMCB12 on #VMEXIT
  from L2 to L1. (APM volume 3, section 4: "VMRUN")

* The value of L2's Guest PAT register is not serialized for
  save/restore when a checkpoint is taken while L2 is active.
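For the first item, here is a minimal sketch of the kind of per-entry check a
G_PAT validity test needs. It assumes G_PAT follows the same encoding rules as
the PAT MSR: eight 8-bit entries, each holding one of the defined memory types
(0 = UC, 1 = WC, 4 = WT, 5 = WP, 6 = WB, 7 = UC-), with encodings 2 and 3 and
bits 7:3 of every entry reserved. The helper name gpat_valid() is hypothetical,
and KVM's actual check may be written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if every 8-bit entry of a PAT-format
 * value encodes a defined memory type (0, 1, 4, 5, 6 or 7).
 */
static bool gpat_valid(uint64_t gpat)
{
	/* Bits 7:3 of every entry are reserved and must be zero. */
	if (gpat & 0xF8F8F8F8F8F8F8F8ull)
		return false;

	/*
	 * Every entry is now 0..7. The reserved encodings (2 and 3) are
	 * exactly those with bit 1 set and bit 2 clear. Copying each entry's
	 * bit 1 into its bit 2 changes the value only if such an entry exists.
	 */
	return (gpat | ((gpat & 0x0202020202020202ull) << 1)) == gpat;
}
```

With this check, the power-on default PAT value 0x0007040600070406 is
accepted, while any value containing an entry of 2, 3, or a set reserved
bit is rejected.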

Commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2
guest") left this comment in nested_vmcb02_compute_g_pat():

      /* FIXME: merge g_pat from vmcb01 and vmcb12.  */

This comment makes no sense. It is true that there are now three
different PATs to consider: L2's PAT for guest page tables, L1's PAT
for the nested page tables mapping L2 guest physical addresses to L1
guest physical addresses, and L0's PAT for the nested page tables
mapping L1 guest physical addresses to host physical
addresses. However, if there is any "merging" to be done, it would
involve the latter two, and would happen during shadow nested page
table construction. (For the record, I don't think "merging" the two
nested page table PATs is feasible.) In any case, the VMCB12 G_PAT
should be copied unmodified into VMCB02.
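The difference between the current computation and the one argued for above
can be sketched with heavily simplified, hypothetical structures (toy_vmcb_save
and toy_nested are illustrative stand-ins, not KVM's real types). The
non-nested-paging branch is an assumption: without nested paging, L2 shares
L1's PAT, so the sketch keeps copying L1's value there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for VMCB save areas. */
struct toy_vmcb_save {
	uint64_t g_pat;
};

struct toy_nested {
	bool vmcb12_npt_enabled;     /* L1 enabled nested paging in VMCB12 */
	uint64_t cached_vmcb12_gpat; /* G_PAT snapshotted when VMRUN was emulated */
};

/* Current behavior: L1's own PAT lands in vmcb02, whatever VMCB12 asks for. */
static void toy_compute_g_pat_current(struct toy_vmcb_save *vmcb02,
				      const struct toy_vmcb_save *vmcb01)
{
	vmcb02->g_pat = vmcb01->g_pat;
}

/* Proposed behavior: with nested paging on, VMCB12 G_PAT is copied unmodified. */
static void toy_compute_g_pat_proposed(struct toy_vmcb_save *vmcb02,
				       const struct toy_vmcb_save *vmcb01,
				       const struct toy_nested *nested)
{
	if (nested->vmcb12_npt_enabled)
		vmcb02->g_pat = nested->cached_vmcb12_gpat;
	else
		/* Assumption: without nested paging, L2 shares L1's PAT. */
		vmcb02->g_pat = vmcb01->g_pat;
}
```

No "merging" appears in either function: the nested page table PATs of L1
and L0 never enter into the value placed in vmcb02.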

Perhaps the rest of the current implementation could be defended as a
consistent quirk, built around the existing nested_vmcb02_compute_g_pat()
code, which ignores L1's request in VMCB12 and copies L1's PAT MSR into
vmcb02 instead. However, an L1 hypervisor that does not intercept
accesses to the PAT MSR would legitimately be surprised to find that its
L2 guest can modify the hypervisor's own PAT!
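The intended routing of guest PAT accesses, together with the write-back on
#VMEXIT from the problem list, can be modeled with a hypothetical toy vCPU
(struct toy_vcpu and the toy_* functions are illustrative, not KVM code). The
sketch omits the #GP that a real WRMSR with an invalid PAT value would raise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy vCPU; the real state lives in KVM's vcpu_svm and VMCBs. */
struct toy_vcpu {
	bool l2_active;       /* currently running L2 */
	bool vmcb12_npt;      /* L1 enabled nested paging in VMCB12 */
	uint64_t l1_pat;      /* L1's architectural PAT MSR */
	uint64_t vmcb02_gpat; /* L2's Guest PAT register while L2 runs */
	uint64_t vmcb12_gpat; /* G_PAT field of VMCB12 in L1 memory */
};

/* Select the register a guest RDMSR/WRMSR(PAT) should touch. */
static uint64_t *toy_pat_target(struct toy_vcpu *v)
{
	if (v->l2_active && v->vmcb12_npt)
		return &v->vmcb02_gpat; /* replicated state: L2's own copy */
	return &v->l1_pat;
}

static uint64_t toy_guest_rdmsr_pat(struct toy_vcpu *v)
{
	return *toy_pat_target(v);
}

/* Validity checking (#GP on bad encodings) is omitted for brevity. */
static void toy_guest_wrmsr_pat(struct toy_vcpu *v, uint64_t val)
{
	*toy_pat_target(v) = val;
}

/* Emulated #VMEXIT: write L2's Guest PAT back to VMCB12, then return to L1. */
static void toy_emulate_vmexit(struct toy_vcpu *v)
{
	if (v->vmcb12_npt)
		v->vmcb12_gpat = v->vmcb02_gpat;
	v->l2_active = false;
}
```

In this model, a WRMSR(PAT) from L2 leaves l1_pat untouched, and L1 finds
L2's final value in VMCB12 after the #VMEXIT.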

The commits in this series are in an awkward order, because I didn't
want to change nested_vmcb02_compute_g_pat() until I had removed the
call site from svm_set_msr().

The first two commits should arguably be one, but I tried to deal with
the serialization issue separately from the RDMSR/WRMSR issue, despite
the two being intertwined.

I don't like the ugliness of KVM_GET_MSRS saving the L2 Guest PAT
register during a checkpoint, while KVM_SET_MSRS restores the
architectural PAT MSR (because L2 is not active when KVM_SET_MSRS is
called). The APM section on replicated state offers a possible out:

  While nested paging is enabled, all (guest) references to the state
  of the paging registers by x86 code (MOV to/from CRn, etc.) read and
  write the guest copy of the registers

If we consider KVM_{GET,SET}_MSRS not to be "guest" references, we
could always access the architectural PAT MSR from userspace, and we
could grab 64 bits from the SVM nested state header to serialize L2's
G_PAT. In some ways, that seems cleaner, but it does mean that
KVM_{GET,SET}_MSRS will access L1's PAT, which is irrelevant while L2
is active.
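The alternative just described can be sketched as a save/restore round trip.
Everything here is hypothetical: struct toy_checkpoint stands in for what
userspace would collect, and hdr_l2_gpat represents the 64 bits borrowed
from the SVM nested state header, not an existing uapi field. The restore
order follows the text: MSRs are set first, while L2 is not yet active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state; field names are illustrative only. */
struct toy_vcpu {
	bool l2_active;
	uint64_t l1_pat;  /* architectural PAT MSR (L1's) */
	uint64_t l2_gpat; /* L2's Guest PAT register (vmcb02 G_PAT) */
};

/* What userspace would save: the MSR list plus the nested state header. */
struct toy_checkpoint {
	uint64_t msr_pat;     /* KVM_GET_MSRS: always L1's PAT in this scheme */
	bool l2_active;
	uint64_t hdr_l2_gpat; /* hypothetical 64-bit slot in the nested state header */
};

static struct toy_checkpoint toy_save(const struct toy_vcpu *v)
{
	struct toy_checkpoint c = {
		.msr_pat = v->l1_pat,
		.l2_active = v->l2_active,
		.hdr_l2_gpat = v->l2_active ? v->l2_gpat : 0,
	};
	return c;
}

static struct toy_vcpu toy_restore(const struct toy_checkpoint *c)
{
	struct toy_vcpu v = { .l2_active = false, .l1_pat = 0, .l2_gpat = 0 };

	v.l1_pat = c->msr_pat;        /* KVM_SET_MSRS, L2 not yet active */
	if (c->l2_active) {           /* nested state restore */
		v.l2_active = true;
		v.l2_gpat = c->hdr_l2_gpat;
	}
	return v;
}
```

Both PAT values survive the round trip without any swapping, at the cost
noted above: the MSR interface reports L1's PAT even while L2 is active.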

Hence, I am posting this series as an RFC.

Jim Mattson (6):
  KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in
    svm_set_nested_state()
  KVM: x86: nSVM: Redirect PAT MSR accesses to gPAT when NPT is enabled
    in vmcb12
  KVM: x86: nSVM: Copy current vmcb02 g_pat to vmcb12 g_pat on #VMEXIT
  KVM: x86: nSVM: Cache g_pat in vmcb_ctrl_area_cached
  KVM: x86: nSVM: Add validity check for the VMCB12 g_pat
  KVM: x86: nSVM: Use cached VMCB12 g_pat in VMCB02 when using NPT

 arch/x86/include/uapi/asm/kvm.h |  2 ++
 arch/x86/kvm/svm/nested.c       | 35 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.c          | 25 +++++++++++++++--------
 arch/x86/kvm/svm/svm.h          |  1 +
 4 files changed, 53 insertions(+), 10 deletions(-)

-- 
2.51.2.1041.gc1ab5b90ca-goog
