linux-kernel - Re: [RFC PATCH 06/18] KVM: VMX: Wire up Intel MBEC enable/disable logic

Message-ID: <7704c861ba54c246dc8e5f26113c6f84306a099e.camel@amd.com>
Date: Mon, 16 Jun 2025 09:27:46 +0000
From: "Shah, Amit" <Amit.Shah@....com>
To: "seanjc@...gle.com" <seanjc@...gle.com>
CC: "x86@...nel.org" <x86@...nel.org>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "hpa@...or.com" <hpa@...or.com>,
	"mingo@...hat.com" <mingo@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "bp@...en8.de" <bp@...en8.de>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"jon@...anix.com" <jon@...anix.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 06/18] KVM: VMX: Wire up Intel MBEC enable/disable
 logic

On Wed, 2025-05-14 at 05:55 -0700, Sean Christopherson wrote:
> On Wed, May 14, 2025, Amit Shah wrote:
> > On Tue, 2025-05-13 at 06:28 -0700, Sean Christopherson wrote:
> > > On Tue, May 13, 2025, Jon Kohler wrote:
> > > > > On May 12, 2025, at 2:23 PM, Sean Christopherson
> > > > > > This is wrong and unnecessary.  As mentioned earlier, the input
> > > > > that
> > > > > matters is vmcs12.  This flag should *never* be set for
> > > > > vmcs01.
> > > > 
> > > > I’ll page this back in, but I’m like 75% sure it didn’t work
> > > > when I
> > > > did it that way.
> > > 
> > > Then you had other bugs.  The control is per-VMCS and thus needs
> > > to
> > > be emulated
> > > as such.  Definitely holler if you get stuck, there's no need to
> > > develop this in
> > > complete isolation.
> > 
> > Looking at this from the AMD GMET POV, here's how I think support
> > for
> > this feature for a Windows guest would be implemented:
> > 
> > * Do not enable the GMET feature in vmcb01.  Only the Windows guest
> > (L1
> > guest) sets this bit for its own guest (L2 guest).  KVM (L0) should
> > see
> > the bit set in vmcb02 (and vmcb12).  OTOH, pass on the CPUID bit to
> > the
> > L1 guest.
> > 
> > * KVM needs to propagate the #NPF to Windows (instead of handling
> > anything itself -- ie no shadow page table adjustments or walks
> > needed).  Windows spawns an L2 guest that causes the #NPF, and
> > Windows
> > is the one that needs to consume that fault.
> > 
> > * KVM needs to differentiate an #NPF exit due to GMET or non-GMET
> > condition -- check the CPL and U/S bits from the exit, and the NX
> > bit
> > from the PTE that faulted.  If due to GMET, propagate it to the
> > guest.
> > > If not, continue handling it.
> 
> Yes, but no.  KVM shouldn't need to do anything special here other
> than teaching
> update_permission_bitmask() to understand the GMET fault case.  Ditto
> for MBEC.
> I'd type something up, but I would quickly encounter -ENOCOFFE :-)
> 
> With the correct mmu->permissions[], permission_fault() will
> naturally detect
> that a #NPF (or EPT Violation) from L2 due to a GMET/MBEC violation
> is a fault
> in the nNPT/nEPT domain and route the exit to L1.
>
> > (btw KVM MMU API question -- from the #NPF, I have the GPA of the
> > L2
> > guest.  How to go from that guest GPA to look up the NX bit for
> > that
> > page?  I skimmed and there doesn't seem to be an existing API for
> > it -
> > so is walking the tables the only solution?)
> 
> As above, KVM doesn't manually look up individual bits while handling
> faults.
> The walk of the guest page tables (L1's NPT/EPT for this scenario)
> performed by
> FNAME(walk_addr_generic) will gather the effective permissions in
> walker->pte_access,
> and check for a permission_fault() after the walk is completed.
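The permission-bitmap idea in the quoted text above can be illustrated
with a small standalone model.  This is only a sketch and not KVM's
code: the names (build_permissions, model_permission_fault, the ACC_*
and PFEC_* constants) are hypothetical, the rules are simplified (no
CR0.WP, SMAP or protection-key handling), and the extra
"supervisor fetch from a user page" rule is an assumption about how a
GMET/MBEC-style check could be folded into such a table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access rights accumulated over a page-table walk (hypothetical names). */
#define ACC_EXEC  1u
#define ACC_WRITE 2u
#define ACC_USER  4u

/* x86 page-fault error code bits used by this model. */
#define PFEC_WRITE (1u << 1)
#define PFEC_USER  (1u << 2)
#define PFEC_FETCH (1u << 4)

/*
 * One byte per (error code >> 1).  Bit N of an entry is set when an
 * access of that kind must fault on a page whose rights are N.
 */
static uint8_t permissions[16];

static void build_permissions(bool trap_supervisor_exec_of_user_pages)
{
    for (unsigned idx = 0; idx < 16; idx++) {
        unsigned pfec = idx << 1;
        uint8_t map = 0;

        for (unsigned acc = 0; acc < 8; acc++) {
            bool fault = false;

            if ((pfec & PFEC_WRITE) && !(acc & ACC_WRITE))
                fault = true;   /* write to a read-only page */
            if ((pfec & PFEC_USER) && !(acc & ACC_USER))
                fault = true;   /* user access to a supervisor page */
            if ((pfec & PFEC_FETCH) && !(acc & ACC_EXEC))
                fault = true;   /* fetch from a non-executable page */

            /* Assumed GMET/MBEC-style rule: supervisor fetch, user page. */
            if (trap_supervisor_exec_of_user_pages &&
                (pfec & PFEC_FETCH) && !(pfec & PFEC_USER) &&
                (acc & ACC_USER))
                fault = true;

            if (fault)
                map |= (uint8_t)(1u << acc);
        }
        permissions[idx] = map;
    }
}

/* Table lookup only: no per-bit checks at fault time. */
static bool model_permission_fault(unsigned pfec, unsigned pte_access)
{
    return (permissions[(pfec >> 1) & 15] >> (pte_access & 7)) & 1;
}
```

The point of the table is that all policy lives in the code that builds
it; the fault path reduces to one lookup keyed by the error code and the
access rights gathered by the walk.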

Hm, despite the discussions in the PUCK calls since this email, I still
have a question, which may be fairly basic.  To determine whether the
exit was due to GMET, we have to check the effective U/S and NX bits for
the address that faulted.  That means we have to walk the L2's page
tables to get those bits from the L2's PTEs, and then confirm from the
error code in exitinfo1 why the #NPF happened.  (And even with Paolo's
neat SMEP hack, a GMET exit can only be confirmed by looking at the
guest's U/S and NX bits.)
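
The check described above, written out as a standalone sketch.  The
helper and struct names are hypothetical, and it assumes that with GMET
enabled the User bit of the error code reflects the privilege level of
the faulting fetch; it takes the effective U/S and NX bits as inputs
from whichever table walk supplies them.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, as found in exitinfo1 for an #NPF. */
#define ERR_PRESENT (1ull << 0)
#define ERR_USER    (1ull << 2)
#define ERR_FETCH   (1ull << 4)

/* Hypothetical summary of the permissions accumulated over a walk. */
struct effective_perms {
    bool user;  /* effective U/S: page is user-accessible */
    bool nx;    /* effective NX: page is not executable */
};

/*
 * Returns true when the fault matches the GMET pattern: a supervisor
 * instruction fetch from a present, executable, user-accessible page.
 */
static bool npf_looks_like_gmet(uint64_t err, struct effective_perms p)
{
    if (!(err & ERR_FETCH))
        return false;   /* not an instruction fetch */
    if (!(err & ERR_PRESENT))
        return false;   /* plain not-present fault */
    if (err & ERR_USER)
        return false;   /* fetch was made at user privilege */
    if (p.nx)
        return false;   /* ordinary NX violation */
    return p.user;      /* supervisor fetch from a user page */
}
```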

And from what I see, page table walks currently only happen on L1's
page tables, and not on L2's page tables.  Is that right?

I'm sure I'm missing something here, though..


		Amit