lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aR-tP8YBftwZA3DD@google.com>
Date: Thu, 20 Nov 2025 16:07:27 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jim Mattson <jmattson@...gle.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 12/23] KVM: selftests: Parameterize the PTE bitmasks
 for virt mapping functions

On Tue, Oct 21, 2025, Yosry Ahmed wrote:
> @@ -1367,8 +1357,6 @@ static inline bool kvm_is_ignore_msrs(void)
>  	return get_kvm_param_bool("ignore_msrs");
>  }
>  
> -uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
> -				    int *level);
>  uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
>  
>  uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
> @@ -1451,7 +1439,20 @@ enum pg_level {
>  #define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
>  #define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
>  
> -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
> +struct pte_masks {
> +	uint64_t present;
> +	uint64_t writeable;
> +	uint64_t user;
> +	uint64_t accessed;
> +	uint64_t dirty;
> +	uint64_t large;

Ugh, "large".  I vote for "huge" or "hugepage", as that's for more ubiquitous in
the kernel.

> +	uint64_t nx;

The values themselves should be const, e.g. to communicate that they are fixed
values.

> +};
> +
> +extern const struct pte_masks x86_pte_masks;

> -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> +void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> +		   int level, const struct pte_masks *masks)
>  {
>  	const uint64_t pg_size = PG_LEVEL_SIZE(level);
>  	uint64_t *pte = &vm->pgd;
> @@ -246,16 +259,16 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
>  	 * early if a hugepage was created.
>  	 */
>  	for (current_level = vm->pgtable_levels; current_level > PG_LEVEL_4K; current_level--) {
> -		pte = virt_create_upper_pte(vm, pte, vaddr, paddr, current_level, level);
> -		if (*pte & PTE_LARGE_MASK)
> +		pte = virt_create_upper_pte(vm, pte, vaddr, paddr, current_level, level, masks);
> +		if (*pte & masks->large)
>  			return;
>  	}
>  
>  	/* Fill in page table entry. */
> -	pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K);
> -	TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
> +	pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K, masks);
> +	TEST_ASSERT(!(*pte & masks->present),

I think accessors would help for the "read" cases?  E.g.

	TEST_ASSERT(!is_present_pte(mmu, *pte)

or maybe go with a slightly atypical ordering of:

	TEST_ASSERT(!is_present_pte(*pte, mmu),

The second one seems more readable.

>  		    "PTE already present for 4k page at vaddr: 0x%lx", vaddr);
> -	*pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
> +	*pte = masks->present | masks->writeable | (paddr & PHYSICAL_PAGE_MASK);

Hrm.  I don't love the masks->lowercase style, but I don't hate it either.  One
idea would be to add macros to grab the correct bit, e.g.

	*pte = PTE_PRESENT(mmu) | PTE_WRITABLE(mmu) | (paddr & PHYSICAL_PAGE_MASK);

Alternatively, assuming we add "struct kvm_mmu", we could grab the "pte_masks"
structure locally as "m" and do this?  Note sure the super-shorthand is a net
positive though.

	*pte = PTE_PRESENT(m) | PTE_WRITABLE(m) | (paddr & PHYSICAL_PAGE_MASK);

Or we could YELL REALLY LOUDLY in the fields themselves, e.g.

	*pte = m->PRESENT | m->WRITABLE | (paddr & PHYSICAL_PAGE_MASK);

but that looks kinda weird to me.

I don't have a super strong preference, though I'm leaning towards accessor
functions with macros for retrieving the bits.

>  	/*
>  	 * Neither SEV nor TDX supports shared page tables, so only the final

Hiding just out of sight is this code:

	/*
	 * Neither SEV nor TDX supports shared page tables, so only the final
	 * leaf PTE needs manually set the C/S-bit.
	 */
	if (vm_is_gpa_protected(vm, paddr))
		*pte |= vm->arch.c_bit;
	else
		*pte |= vm->arch.s_bit;

The C-bit (enCrypted) and S-bit (Shared) values need to be moved into the masks/mmu
context as well.  In practice, they'll both be zero when creating nested mappings
since KVM doesn't support nested VMs with encrypted memory, but it's still wrong,
e.g. the Shared bit doesn't exist in EPT.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ