Message-ID: <hyrk2lejyxrwcl5oudwkuqch7jwzgo3eoummopordtz5ryhklu@vrq5ratttux7>
Date: Fri, 21 Nov 2025 00:18:07 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jim Mattson <jmattson@...gle.com>, 
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 12/23] KVM: selftests: Parameterize the PTE bitmasks
 for virt mapping functions

On Thu, Nov 20, 2025 at 04:07:27PM -0800, Sean Christopherson wrote:
> On Tue, Oct 21, 2025, Yosry Ahmed wrote:
> > @@ -1367,8 +1357,6 @@ static inline bool kvm_is_ignore_msrs(void)
> >  	return get_kvm_param_bool("ignore_msrs");
> >  }
> >  
> > -uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
> > -				    int *level);
> >  uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
> >  
> >  uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
> > @@ -1451,7 +1439,20 @@ enum pg_level {
> >  #define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
> >  #define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
> >  
> > -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
> > +struct pte_masks {
> > +	uint64_t present;
> > +	uint64_t writeable;
> > +	uint64_t user;
> > +	uint64_t accessed;
> > +	uint64_t dirty;
> > +	uint64_t large;
> 
> Ugh, "large".  I vote for "huge" or "hugepage", as that's far more ubiquitous in
> the kernel.

"huge" sounds good.

> 
> > +	uint64_t nx;
> 
> The values themselves should be const, e.g. to communicate that they are fixed
> values.
> 
> > +};
> > +
> > +extern const struct pte_masks x86_pte_masks;
> 
> > -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> > +void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> > +		   int level, const struct pte_masks *masks)
> >  {
> >  	const uint64_t pg_size = PG_LEVEL_SIZE(level);
> >  	uint64_t *pte = &vm->pgd;
> > @@ -246,16 +259,16 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> >  	 * early if a hugepage was created.
> >  	 */
> >  	for (current_level = vm->pgtable_levels; current_level > PG_LEVEL_4K; current_level--) {
> > -		pte = virt_create_upper_pte(vm, pte, vaddr, paddr, current_level, level);
> > -		if (*pte & PTE_LARGE_MASK)
> > +		pte = virt_create_upper_pte(vm, pte, vaddr, paddr, current_level, level, masks);
> > +		if (*pte & masks->large)
> >  			return;
> >  	}
> >  
> >  	/* Fill in page table entry. */
> > -	pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K);
> > -	TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
> > +	pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K, masks);
> > +	TEST_ASSERT(!(*pte & masks->present),
> 
> I think accessors would help for the "read" cases?  E.g.
> 
> 	TEST_ASSERT(!is_present_pte(mmu, *pte)
> 
> or maybe go with a slightly atypical ordering of:
> 
> 	TEST_ASSERT(!is_present_pte(*pte, mmu),
> 
> The second one seems more readable.

How about is_present_pte(mmu, pte)? Any objections to passing in the
pointer?
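Roughly what I have in mind (just a sketch against the pte_masks struct
from this patch; the eventual mmu wrapper may change the first argument):

	/* Sketch only: takes the masks (or a future mmu context) plus a PTE pointer. */
	static inline bool is_present_pte(const struct pte_masks *masks,
					  const uint64_t *pte)
	{
		return *pte & masks->present;
	}

so the assert above would read TEST_ASSERT(!is_present_pte(masks, pte), ...).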

> 
> >  		    "PTE already present for 4k page at vaddr: 0x%lx", vaddr);
> > -	*pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
> > +	*pte = masks->present | masks->writeable | (paddr & PHYSICAL_PAGE_MASK);
> 
> Hrm.  I don't love the masks->lowercase style, but I don't hate it either.  One
> idea would be to add macros to grab the correct bit, e.g.
> 
> 	*pte = PTE_PRESENT(mmu) | PTE_WRITABLE(mmu) | (paddr & PHYSICAL_PAGE_MASK);

I like this one best, I think. Abstracts the masks away completely,
which is nice.
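e.g. something along these lines (sketch only; assumes the masks end up
reachable through whatever "mmu" context gets added, so the names and
layout are placeholders):

	/* Placeholder names -- assumes the mmu carries a pointer to the pte_masks. */
	#define PTE_PRESENT(mmu)	((mmu)->masks->present)
	#define PTE_WRITABLE(mmu)	((mmu)->masks->writeable)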

> 
> Alternatively, assuming we add "struct kvm_mmu", we could grab the "pte_masks"
> structure locally as "m" and do this?  Not sure the super-shorthand is a net
> positive though.
> 
> 	*pte = PTE_PRESENT(m) | PTE_WRITABLE(m) | (paddr & PHYSICAL_PAGE_MASK);
> 
> Or we could YELL REALLY LOUDLY in the fields themselves, e.g.
> 
> 	*pte = m->PRESENT | m->WRITABLE | (paddr & PHYSICAL_PAGE_MASK);
> 
> but that looks kinda weird to me.
> 
> I don't have a super strong preference, though I'm leaning towards accessor
> functions with macros for retrieving the bits.
> 
> >  	/*
> >  	 * Neither SEV nor TDX supports shared page tables, so only the final
> 
> Hiding just out of sight is this code:
> 
> 	/*
> 	 * Neither SEV nor TDX supports shared page tables, so only the final
> 	 * leaf PTE needs manually set the C/S-bit.
> 	 */
> 	if (vm_is_gpa_protected(vm, paddr))
> 		*pte |= vm->arch.c_bit;
> 	else
> 		*pte |= vm->arch.s_bit;
> 
> The C-bit (enCrypted) and S-bit (Shared) values need to be moved into the masks/mmu
> context as well.  In practice, they'll both be zero when creating nested mappings
> since KVM doesn't support nested VMs with encrypted memory, but it's still wrong,
> e.g. the Shared bit doesn't exist in EPT.

Will add, thanks!
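The leaf-PTE handling would then become something like this (sketch,
assuming c_bit/s_bit fields get added alongside the other masks; both
would stay zero for nested/EPT mappings):

	if (vm_is_gpa_protected(vm, paddr))
		*pte |= masks->c_bit;
	else
		*pte |= masks->s_bit;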
