[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <hyrk2lejyxrwcl5oudwkuqch7jwzgo3eoummopordtz5ryhklu@vrq5ratttux7>
Date: Fri, 21 Nov 2025 00:18:07 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jim Mattson <jmattson@...gle.com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 12/23] KVM: selftests: Parameterize the PTE bitmasks
for virt mapping functions
On Thu, Nov 20, 2025 at 04:07:27PM -0800, Sean Christopherson wrote:
> On Tue, Oct 21, 2025, Yosry Ahmed wrote:
> > @@ -1367,8 +1357,6 @@ static inline bool kvm_is_ignore_msrs(void)
> > return get_kvm_param_bool("ignore_msrs");
> > }
> >
> > -uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
> > - int *level);
> > uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
> >
> > uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
> > @@ -1451,7 +1439,20 @@ enum pg_level {
> > #define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
> > #define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
> >
> > -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
> > +struct pte_masks {
> > + uint64_t present;
> > + uint64_t writeable;
> > + uint64_t user;
> > + uint64_t accessed;
> > + uint64_t dirty;
> > + uint64_t large;
>
> Ugh, "large". I vote for "huge" or "hugepage", as that's for more ubiquitous in
> the kernel.
"huge" sounds good.
>
> > + uint64_t nx;
>
> The values themselves should be const, e.g. to communicate that they are fixed
> values.
>
> > +};
> > +
> > +extern const struct pte_masks x86_pte_masks;
>
> > -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> > +void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> > + int level, const struct pte_masks *masks)
> > {
> > const uint64_t pg_size = PG_LEVEL_SIZE(level);
> > uint64_t *pte = &vm->pgd;
> > @@ -246,16 +259,16 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> > * early if a hugepage was created.
> > */
> > for (current_level = vm->pgtable_levels; current_level > PG_LEVEL_4K; current_level--) {
> > - pte = virt_create_upper_pte(vm, pte, vaddr, paddr, current_level, level);
> > - if (*pte & PTE_LARGE_MASK)
> > + pte = virt_create_upper_pte(vm, pte, vaddr, paddr, current_level, level, masks);
> > + if (*pte & masks->large)
> > return;
> > }
> >
> > /* Fill in page table entry. */
> > - pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K);
> > - TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
> > + pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K, masks);
> > + TEST_ASSERT(!(*pte & masks->present),
>
> I think accessors would help for the "read" cases? E.g.
>
> TEST_ASSERT(!is_present_pte(mmu, *pte)
>
> or maybe go with a slightly atypical ordering of:
>
> TEST_ASSERT(!is_present_pte(*pte, mmu),
>
> The second one seems more readable.
How about is_present_pte(mmu, pte)? Any objections to passing in the
pointer?
>
> > "PTE already present for 4k page at vaddr: 0x%lx", vaddr);
> > - *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
> > + *pte = masks->present | masks->writeable | (paddr & PHYSICAL_PAGE_MASK);
>
> Hrm. I don't love the masks->lowercase style, but I don't hate it either. One
> idea would be to add macros to grab the correct bit, e.g.
>
> *pte = PTE_PRESENT(mmu) | PTE_WRITABLE(mmu) | (paddr & PHYSICAL_PAGE_MASK);
I like this one best, I think. Abstracts the masks away completely,
which is nice.
>
> Alternatively, assuming we add "struct kvm_mmu", we could grab the "pte_masks"
> structure locally as "m" and do this? Note sure the super-shorthand is a net
> positive though.
>
> *pte = PTE_PRESENT(m) | PTE_WRITABLE(m) | (paddr & PHYSICAL_PAGE_MASK);
>
> Or we could YELL REALLY LOUDLY in the fields themselves, e.g.
>
> *pte = m->PRESENT | m->WRITABLE | (paddr & PHYSICAL_PAGE_MASK);
>
> but that looks kinda weird to me.
>
> I don't have a super strong preference, though I'm leaning towards accessor
> functions with macros for retrieving the bits.
>
> > /*
> > * Neither SEV nor TDX supports shared page tables, so only the final
>
> Hiding just out of sight is this code:
>
> /*
> * Neither SEV nor TDX supports shared page tables, so only the final
> * leaf PTE needs manually set the C/S-bit.
> */
> if (vm_is_gpa_protected(vm, paddr))
> *pte |= vm->arch.c_bit;
> else
> *pte |= vm->arch.s_bit;
>
> The C-bit (enCrypted) and S-bit (Shared) values need to be moved into the masks/mmu
> context as well. In practice, they'll both be zero when creating nested mappings
> since KVM doesn't support nested VMs with encrypted memory, but it's still wrong,
> e.g. the Shared bit doesn't exist in EPT.
Will add, thanks!
Powered by blists - more mailing lists