Message-ID: <aYUz8Ur91l7MyCK7@google.com>
Date: Thu, 5 Feb 2026 16:21:05 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Kevin Cheng <chengkev@...gle.com>
Cc: Yosry Ahmed <yosry.ahmed@...ux.dev>, pbonzini@...hat.com, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK

On Wed, Feb 04, 2026, Kevin Cheng wrote:
> On Wed, Jan 28, 2026 at 10:48 AM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Thu, Jan 22, 2026, Yosry Ahmed wrote:
> > > On Wed, Jan 21, 2026 at 02:07:56PM -0800, Sean Christopherson wrote:
> > > > On Wed, Jan 21, 2026, Kevin Cheng wrote:
> > > > > When KVM emulates an instruction for L2 and encounters a nested page
> > > > > fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
> > > > > injects an NPF to L1. However, the code incorrectly hardcodes
> > > > > (1ULL << 32) for exit_info_1's upper bits when the original exit was
> > > > > not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
> > > > > occurred on a page table page, preventing L1 from correctly identifying
> > > > > the cause of the fault.
> > > > >
> > > > > Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
> > > > > occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
> > > > > the fault occurs on the final GPA-to-HPA translation.
> > > > >
> > > > > Widen error_code in struct x86_exception from u16 to u64 to accommodate
> > > > > the PFERR_GUEST_* bits (bits 32 and 33).
> > > >
> > > > Please do this in a separate patch.  Intel CPUs straight up don't support 32-bit
> > > > error codes, let alone 64-bit error codes, so this seemingly innocuous change
> > > > needs to be accompanied by a lengthy changelog that effectively audits all usage
> > > > to "prove" this change is ok.
> > >
> > > Semi-jokingly, we can add error_code_hi to track the high bits and
> > > side-step the problem for Intel (dejavu?).
> >
> > Technically, it would require three fields: u16 error_code, u16 error_code_hi,
> > and u32 error_code_ultra_hi.  :-D
> >
> > Isolating the (ultra) hi flags is very tempting, but I worry that it would lead
> > to long term pain, e.g. because inevitably we'll forget to grab the hi flags at
> > some point.  I'd rather audit the current code and ensure that KVM truncates the
> > error code as needed.
> >
> > VMX is probably a-ok, e.g. see commit eba9799b5a6e ("KVM: VMX: Drop bits 31:16
> > when shoving exception error code into VMCS").  I'd be more worried about SVM,
> > where it's legal to shove a 32-bit value into the error code, i.e. where KVM
> > might not have existing explicit truncation.
> 
> As I understand it, Intel CPUs don't allow setting bits 31:16 of the
> error code, but AMD CPUs do allow bits 31:16 to be set.

Yep.

> The x86_exception error_code field is currently a u16, so it is always
> truncated to 16 bits by default. In that case, after widening the error code
> to 64 bits, do I have to ensure that any usage of the error code that isn't
> for NPF truncates it to 16 bits?
>
> Or do I just need to verify that all SVM usages of the error_code for
> exceptions truncate the 64 bits down to 32 bits and all VMX usages truncate
> to 16 bits?

Hmm, good question.

I was going to say "the second one", but that's actually meaningless because
(a) "struct kvm_queued_exception" stores the error code as a u32, which it should,
and (b) event_inj_err is also a u32, i.e. it's impossible to shove a 64-bit error
code into hardware on SVM.

And thinking through this more, if there is _existing_ code that tries to set
bits > 15 in the error_code, then UBSAN would likely have detected the issue,
e.g. due to trying to OR in a "bad" value.

Aha!  A serendipitous quirk in this patch is that it does NOT change the local
error_code in FNAME(walk_addr_generic) from a u16 to a u64.

We should double down on that with a comment.  That'd give me enough confidence
that we aren't likely to break legacy shadow paging now or in the future.  E.g.
in a patch to change x86_exception.error_code to a u64, also do:

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 901cd2bd40b8..f1790aa9e391 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -317,6 +317,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
        const int write_fault = access & PFERR_WRITE_MASK;
        const int user_fault  = access & PFERR_USER_MASK;
        const int fetch_fault = access & PFERR_FETCH_MASK;
+       /*
+        * Note!  Track the error_code that's common to legacy shadow paging
+        * and NPT shadow paging as a u16 to guard against unintentionally
+        * setting any of bits 63:16.  Architecturally, the #PF error code is
+        * 32 bits, and Intel CPUs don't support setting bits 31:16.
+        */
        u16 errcode = 0;
        gpa_t real_gpa;
        gfn_t gfn;

> Just wanted to clarify, because the wording of that statement has me
> thinking that maybe there is something wrong with 32-bit error codes
> for SVM?

It's more that I am confident that either KVM already truncates the error code
on VMX, or that we'll notice *really* quickly, because failure to truncate an
error code will generate a VM-Fail.

On SVM, we could royally screw up an error code and it's entirely possible we
wouldn't notice until some random guest breaks in some weird way.

> If the only usage of the widened field is NPF, wouldn't it be better
> to go with an additional field like Yosry suggested (I see that VMX
> has the added exit_qualification field in the struct)?

No?  paging_tmpl.h is used to shadow all flavors of nested NPT as well as all
flavors of legacy paging.  And more importantly, unlike EPT, nested NPT isn't
otherwise special cased.  As shown by this patch, it _is_ possible to identify
nested NPT in select flows, and we could certainly do so more generically by
checking if the MMU is nested, but I'd prefer not to special case any particular
type of shadow paging without a strong reason to do so.

And, critically, if we have two (or three) error code fields, then we need
to remember to pull data from all error code fields.  I.e. in an effort to avoid
introducing bugs, we would actually make it easier to introduce _other_ bugs.
