Message-ID: <zfhwufkxrv4uqibspjstsqruuz5mgd4t765c3cobh374bmfqwy@welriubpwp6t>
Date: Mon, 13 Oct 2025 23:13:39 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 08/12] KVM: selftests: Use 'leaf' instead of hugepage to
 describe EPT entries

On Mon, Oct 13, 2025 at 03:58:30PM -0700, Sean Christopherson wrote:
> On Mon, Oct 13, 2025, Yosry Ahmed wrote:
> > On Mon, Oct 13, 2025 at 02:41:56PM -0700, Sean Christopherson wrote:
> > > On Wed, Oct 01, 2025, Yosry Ahmed wrote:
> > > > From: Yosry Ahmed <yosryahmed@...gle.com>
> > > > 
> > > > The assertions use 'hugepage' to describe a terminal EPT entry, but
> > > > 'leaf' is more accurate as a PG_LEVEL_4K EPT entry is a leaf but not a
> > > > hugepage.
> > > 
> > > Yes, it's more accurate, but also less precise.  I'm guessing the assert message
> > > and comment talked about hugepages because that's the type of mappings that
> > > caused problems at the time.
> > 
> > Given that it refers to PG_LEVEL_4K entries too, I wouldn't call it less
> > precise. All callers actually create 4K mappings so it is never actually
> > a hugepage in the current context :D
> 
> nested_identity_map_1g()?

Yeah I missed this one.

> 
> > > Ah, actually, I bet the code was copy+pasted from virt_create_upper_pte(), in
> > > which case the assumptions about wanting to create a hugepage are both accurate
> > > and precise.
> > > 
> > > > The distinction will be useful in coming changes that will pass
> > > > the value around and 'leaf' is clearer than hugepage or page_size.
> > > 
> > > What value?
> > 
> > 'leaf'. The following changes will pass 'leaf' in as a boolean instead
> > of checking 'current_level == target_level' here. So passing in
> > 'hugepage' would be inaccurate, and 'page_size' is not as clear (but
> > still works).
> > 
> > > 
> > > > Leave the EPT bit named page_size to keep it conforming to the manual.
> > > > 
> > > > Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> > > > ---
> > > >  tools/testing/selftests/kvm/lib/x86/vmx.c | 10 +++++-----
> > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
> > > > index 04c4b97bcd1e7..673756b27e903 100644
> > > > --- a/tools/testing/selftests/kvm/lib/x86/vmx.c
> > > > +++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
> > > > @@ -380,15 +380,15 @@ static void nested_create_pte(struct kvm_vm *vm,
> > > >  			pte->address = vm_alloc_page_table(vm) >> vm->page_shift;
> > > >  	} else {
> > > >  		/*
> > > > -		 * Entry already present.  Assert that the caller doesn't want
> > > > -		 * a hugepage at this level, and that there isn't a hugepage at
> > > > -		 * this level.
> > > > +		 * Entry already present.  Assert that the caller doesn't want a
> > > > +		 * leaf entry at this level, and that there isn't a leaf entry
> > > > +		 * at this level.
> > > >  		 */
> > > >  		TEST_ASSERT(current_level != target_level,
> > > > -			    "Cannot create hugepage at level: %u, nested_paddr: 0x%lx",
> > > > +			    "Cannot create leaf entry at level: %u, nested_paddr: 0x%lx",
> > > >  			    current_level, nested_paddr);
> > > >  		TEST_ASSERT(!pte->page_size,
> > > > -			    "Cannot create page table at level: %u, nested_paddr: 0x%lx",
> > > > +			    "Leaf entry already exists at level: %u, nested_paddr: 0x%lx",
> > > 
> > > This change is flat out wrong.  The existing PRESENT PTE _might_ be a 4KiB leaf
> > > entry, but it might also be an existing non-leaf page table.
> > 
> > Hmm if pte->page_size is true then it has to be a leaf page table,
> > right?
> 
> No, because bit 7 is ignored by hardware for 4KiB entries.  I.e. it can be 0 or
> 1 depending on the whims of software.  Ugh, this code uses bit 7 to flag leaf
> entries.  That's lovely.

That's not my fault :P
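
For reference, a minimal sketch of the bit 7 point above, from the hardware's
side: bit 7 marks a large-page leaf only at the 2M and 1G levels, while a
present 4K-level entry is a leaf whether bit 7 is 0 or 1. All names here
(sketch_*) are hypothetical and are not the selftest's definitions, and
"present" is simplified to "any of the R/W/X bits set".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only, not taken from vmx.c. */
#define SKETCH_EPT_RWX_MASK	0x7ull		/* bits 2:0: read/write/execute */
#define SKETCH_EPT_PAGE_SIZE	(1ull << 7)	/* bit 7 */

enum sketch_level { SKETCH_4K = 1, SKETCH_2M, SKETCH_1G, SKETCH_512G };

/* Simplification: treat an entry as present if any of R/W/X is set. */
static bool sketch_ept_present(uint64_t pte)
{
	return (pte & SKETCH_EPT_RWX_MASK) != 0;
}

/*
 * Hardware's view of "leaf": bit 7 only matters at the 2M and 1G levels.
 * At the 4K level bit 7 is ignored, so it cannot distinguish anything.
 */
static bool sketch_ept_is_leaf(uint64_t pte, enum sketch_level level)
{
	if (!sketch_ept_present(pte))
		return false;
	if (level == SKETCH_4K)
		return true;		/* always terminal, bit 7 ignored */
	if (level == SKETCH_512G)
		return false;		/* top level never maps a page */
	return (pte & SKETCH_EPT_PAGE_SIZE) != 0;
}
```

A check of bit 7 alone therefore only identifies 4K leaves if software
consistently sets the bit itself when creating them.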

> 
> > If it's an existing non-leaf page table we shouldn't fail,
> 
> Ah, right, current_level can never be less than target_level because the first
> assert will fail on iteration-1.
> 
> > the assertion here is when we try to override a leaf page table IIUC.
> >
> > > Instead of hacking on the nested code, can we instead tweak __virt_pg_map() to
> > > work with nested TDP?  At a glance, it's already quite close, e.g. "just" needs
> > > to be taught about EPT RWX bits and allow the call to pass in the root pointer.
> > 
> > That would be ideal, I'll take a look. In case I don't have time for
> > that unification, can this be a follow-up change?
> 
> Part of me wants to be nice and say "yes", but most of me wants to say "no".

So.. which part won?

> 
> Struct overlays for PTEs suck.  At best, they generate poor code and obfuscate
> simple logic (e.g. vm->page_size vs pte->page_size is a confusion that simply
> should not be possible).  At worst, they lead to hard-to-debug issues like the
> one that led to commit f18b4aebe107 ("kvm: selftests: do not use bitfields larger
> than 32-bits for PTEs").
> 
> eptPageTableEntry obviously isn't your fault, but nptPageTableEntry is. :-D
> And I suspect the hardest part of unification will be adding the globals to
> deal with variable bit positions that are currently being handled by the struct
> overlays.

I have no problem getting rid of eptPageTableEntry and using bitmasks
and whatnot on a uint64_t PTE (assuming that's what you are asking for
here).

But I think tweaking __virt_pg_map() will involve more than that, or
maybe I just didn't look close enough yet.
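
To make the "bitmasks on a uint64_t PTE" idea concrete, here is a rough
sketch of what I have in mind. It is not the selftest code: the struct, the
sketch_* names, and the choice of which bits to set on creation are all
assumptions for illustration. The per-format mask tables stand in for the
"globals" that would handle the variable bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: physical address field, bits 51:12. */
#define SKETCH_PTE_ADDR_MASK	0x000ffffffffff000ull

/* Hypothetical per-format description replacing the struct overlays. */
struct sketch_pte_masks {
	uint64_t present;	/* entry is present if any of these is set */
	uint64_t access;	/* bits to set when creating an entry */
	uint64_t huge;		/* large-page bit for 2M/1G leaf entries */
};

/* Regular x86-64 page tables: P is bit 0, W is bit 1, PS is bit 7. */
static const struct sketch_pte_masks sketch_x86_masks = {
	.present = 1ull << 0,
	.access  = (1ull << 0) | (1ull << 1),
	.huge    = 1ull << 7,
};

/* EPT: R/W/X are bits 2:0, the large-page bit is bit 7. */
static const struct sketch_pte_masks sketch_ept_masks = {
	.present = 0x7ull,
	.access  = 0x7ull,
	.huge    = 1ull << 7,
};

static uint64_t sketch_make_pte(const struct sketch_pte_masks *m,
				uint64_t paddr, bool huge)
{
	return (paddr & SKETCH_PTE_ADDR_MASK) | m->access |
	       (huge ? m->huge : 0);
}

static bool sketch_pte_present(const struct sketch_pte_masks *m, uint64_t pte)
{
	return (pte & m->present) != 0;
}

static uint64_t sketch_pte_addr(uint64_t pte)
{
	return pte & SKETCH_PTE_ADDR_MASK;
}
```

With something like this, the same page-table walker could take a masks
pointer (plus the root) and work for both formats, and there is no
pte->page_size field left to confuse with vm->page_size.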
