linux-kernel - Re: [PATCH v2 15/15] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU

Message-ID: <af69a8359cd5edf892d68764789de7f357c58d5e.camel@intel.com>
Date: Fri, 7 Jun 2024 23:39:14 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "pbonzini@...hat.com" <pbonzini@...hat.com>
CC: "seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai"
	<kai.huang@...el.com>, "sagis@...gle.com" <sagis@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Aktas, Erdem"
	<erdemaktas@...gle.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
	"dmatlack@...gle.com" <dmatlack@...gle.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>
Subject: Re: [PATCH v2 15/15] KVM: x86/tdp_mmu: Add a helper function to walk
 down the TDP MMU

On Fri, 2024-06-07 at 11:31 +0200, Paolo Bonzini wrote:
> > -int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
> > -                        int *root_level)
> > +static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
> > +                                 enum kvm_tdp_mmu_root_types root_type)
> >   {
> > -       struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
> > +       struct kvm_mmu_page *root = tdp_mmu_get_root(vcpu, root_type);
> 
> I think this function should take the struct kvm_mmu_page * directly.
> 
> > +{
> > +       *root_level = vcpu->arch.mmu->root_role.level;
> > +
> > +       return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, KVM_DIRECT_ROOTS);
> 
> Here you pass root_to_sp(vcpu->arch.mmu->root.hpa);

I see. This is another case where extra indirection was added to route the
decision making through the helpers. We can try to open-code things more.
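
To make the shape of that change concrete, here is a minimal userspace sketch
(not kernel code) of a walker that takes the root directly, with the callers
deciding which root to pass. Every name, the table layout, and the bit
assignments are hypothetical stand-ins invented for illustration; only the
"return the lowest level reached, or -1" convention is meant to echo the
existing walk helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and bit layouts below are hypothetical, chosen for the sketch. */
#define TOY_ROOT_LEVEL	4
#define TOY_INDEX_BITS	9
#define TOY_ENTRIES	(1u << TOY_INDEX_BITS)
#define TOY_PAGE_SHIFT	12
#define TOY_PRESENT	(1ull << 0)
#define TOY_LEAF	(1ull << 1)

struct toy_table {
	uint64_t entry[TOY_ENTRIES];
	struct toy_table *child[TOY_ENTRIES];
};

/* Stand-in for a VM that may have both a direct and a mirror root. */
struct toy_vm {
	struct toy_table *direct_root;
	struct toy_table *mirror_root;
};

static unsigned int toy_index(uint64_t addr, int level)
{
	unsigned int shift = TOY_PAGE_SHIFT + (level - 1) * TOY_INDEX_BITS;

	return (addr >> shift) & (TOY_ENTRIES - 1);
}

/*
 * Walk from the given root, record the entry seen at each level in
 * sptes[level], and return the lowest level reached, or -1 if there is
 * no root to walk. The walker knows nothing about root types.
 */
static int toy_get_walk(const struct toy_table *root, uint64_t addr,
			uint64_t *sptes)
{
	const struct toy_table *table = root;
	int leaf = -1;
	int level;

	assert(sptes);

	for (level = TOY_ROOT_LEVEL; level >= 1 && table; level--) {
		unsigned int idx = toy_index(addr, level);
		uint64_t entry = table->entry[idx];

		leaf = level;
		sptes[level] = entry;
		/* Stop at a non-present entry or at a leaf mapping. */
		if (!(entry & TOY_PRESENT) || (entry & TOY_LEAF))
			break;
		table = table->child[idx];
	}

	return leaf;
}

/* The callers, not the walker, decide which root to use. */
static int toy_get_walk_direct(const struct toy_vm *vm, uint64_t addr,
			       uint64_t *sptes)
{
	return toy_get_walk(vm->direct_root, addr, sptes);
}

static int toy_get_walk_mirror(const struct toy_vm *vm, uint64_t addr,
			       uint64_t *sptes)
{
	return toy_get_walk(vm->mirror_root, addr, sptes);
}
```

The point of the sketch is only that the root selection is open-coded in the
two thin wrappers, so the common walker needs no root-type argument.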

> 
> > +int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa,
> > +                                    kvm_pfn_t *pfn)
> > +{
> > +       u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
> > +       int leaf;
> > +
> > +       lockdep_assert_held(&vcpu->kvm->mmu_lock);
> > +
> > +       rcu_read_lock();
> > +       leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, KVM_MIRROR_ROOTS);
> 
> and likewise here.
> 
> You might also consider using a kvm_mmu_root_info for the mirror root,
> even though the pgd field is not used.

This actually came up on the last version. The argument against it was that it
uses a tiny bit of extra memory for the unused pgd field. It does look more
symmetrical, though.

> 
> Then __kvm_tdp_mmu_get_walk() can take a struct kvm_mmu_root_info * instead.

Ahh, I see. Yes, that's a good reason.

> 
> kvm_tdp_mmu_get_walk_mirror_pfn() doesn't belong in this series, but
> introducing __kvm_tdp_mmu_get_walk() can stay here.

Ok, we can split it.

> 
> Looking at the later patch, which uses
> kvm_tdp_mmu_get_walk_mirror_pfn(), I think this function is a bit
> overkill. I'll do a pre-review of the init_mem_region function,
> especially the usage of kvm_gmem_populate:
> 
> +    slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> +    if (!kvm_slot_can_be_private(slot) || !kvm_mem_is_private(kvm, gfn)) {
> +        ret = -EFAULT;
> +        goto out_put_page;
> +    }
> 
> The slots_lock is taken, so checking kvm_slot_can_be_private is unnecessary.
> 
> Checking kvm_mem_is_private perhaps should also be done in
> kvm_gmem_populate() itself. I'll send a patch.
> 
> +    read_lock(&kvm->mmu_lock);
> +
> +    ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn);
> +    if (ret < 0)
> +        goto out;
> +    if (ret > PG_LEVEL_4K) {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +    if (mmu_pfn != pfn) {
> +        ret = -EAGAIN;
> +        goto out;
> +    }
> 
> If you require pre-faulting, you don't need to return mmu_pfn and
> things would be seriously wrong if the two didn't match, wouldn't
> they?

Yeah, I'm not sure why a mismatch would be a normal condition. Maybe Isaku can
comment on the thinking?

> You are executing with the filemap_invalidate_lock() taken, and
> therefore cannot race with kvm_gmem_punch_hole(). (Soapbox mode on:
> this is the advantage of putting the locking/looping logic in common
> code, kvm_gmem_populate() in this case).
> 
> Therefore, I think kvm_tdp_mmu_get_walk_mirror_pfn() can be replaced just with
> 
> int kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
> {
>   struct kvm *kvm = vcpu->kvm;
>   bool is_direct = !kvm_has_mirrored_tdp(...) ||
>                    (gpa & kvm->arch.direct_mask);
>   hpa_t root = is_direct ? ... : ...;
> 
>   lockdep_assert_held(&vcpu->kvm->mmu_lock);
>   rcu_read_lock();
>   leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root));
>   rcu_read_unlock();
>   if (leaf < 0)
>     return false;
> 
>   spte = sptes[leaf];
>   return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
> }
> EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);
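
As I read the sketch above, a successful walk is not enough on its own: the walk
can stop at a non-present entry or at a non-leaf level, so the final check has
to look at the entry recorded for the lowest level reached. A minimal userspace
model of just that check is below. The bit positions and all toy_* names are
hypothetical and are not the real SPTE layout; the "last entry" rule (4K level,
or a large-page bit at a higher level) is the assumption the sketch is built on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical levels and bits, for illustration only. */
#define TOY_LEVEL_4K	1
#define TOY_ROOT_LEVEL	4
#define TOY_PRESENT	(1ull << 0)
#define TOY_LARGE	(1ull << 1)

static bool toy_is_present(uint64_t spte)
{
	return spte & TOY_PRESENT;
}

/* An entry ends the walk at the 4K level or when it maps a large page. */
static bool toy_is_last(uint64_t spte, int level)
{
	return level == TOY_LEVEL_4K || (spte & TOY_LARGE);
}

/*
 * Given the per-level entries recorded by a walk and the lowest level it
 * reached (negative if nothing was walked), report whether the address
 * is backed by a present leaf mapping.
 */
static bool toy_walk_is_mapped(const uint64_t *sptes, int leaf)
{
	assert(sptes);

	if (leaf < 0)
		return false;
	assert(leaf <= TOY_ROOT_LEVEL);

	return toy_is_present(sptes[leaf]) && toy_is_last(sptes[leaf], leaf);
}
```

With this shape the caller gets a plain yes/no answer and never sees a pfn,
which matches the point that comparing mmu_pfn against pfn is unnecessary once
pre-faulting is required.
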
> 
> +    while (region.nr_pages) {
> +        if (signal_pending(current)) {
> +            ret = -EINTR;
> +            break;
> +        }
> 
> Checking signal_pending() should be done in kvm_gmem_populate() -
> again, I'll take care of that. The rest of the loop is not necessary -
> just call kvm_gmem_populate() with the whole range and enjoy. You get
> a nice API that is consistent with the intended KVM_PREFAULT_MEMORY
> ioctl, because kvm_gmem_populate() returns the number of pages it has
> processed and you can use that to massage and copy back the struct
> kvm_tdx_init_mem_region.
> 
> +        arg = (struct tdx_gmem_post_populate_arg) {
> +            .vcpu = vcpu,
> +            .flags = cmd->flags,
> +        };
> +        gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
> +                         u64_to_user_ptr(region.source_addr),
> +                         1, tdx_gmem_post_populate, &arg);

OK, thanks for the early comments. We can also drop those pieces as they move
into the gmem code.
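
For reference, the "call it once with the whole range and use the returned page
count" pattern could look roughly like the userspace sketch below. It is a toy
model, not the gmem API: the toy_* names, the struct layout, and the rule that
an error is reported only when no page was processed are assumptions made for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096ull

/* Hypothetical stand-in for the region struct passed in by userspace. */
struct toy_region {
	uint64_t gpa;
	uint64_t source_addr;
	uint64_t nr_pages;
};

typedef int (*toy_post_populate_t)(uint64_t gfn, void *opaque);

/*
 * Common-code loop: process pages until the callback fails. Returns the
 * number of pages processed, or the callback's error if none were.
 */
static long toy_populate(uint64_t start_gfn, long npages,
			 toy_post_populate_t post_populate, void *opaque)
{
	long i;
	int ret = 0;

	for (i = 0; i < npages; i++) {
		ret = post_populate(start_gfn + i, opaque);
		if (ret)
			break;
	}

	return (ret && !i) ? ret : i;
}

/* Caller: one call for the whole range, then advance the region. */
static long toy_init_mem_region(struct toy_region *region,
				toy_post_populate_t post_populate,
				void *opaque)
{
	long done;

	assert(region);

	done = toy_populate(region->gpa / TOY_PAGE_SIZE,
			    (long)region->nr_pages, post_populate, opaque);
	if (done < 0)
		return done;

	/* Massage the struct so it describes only the remaining work. */
	region->gpa += (uint64_t)done * TOY_PAGE_SIZE;
	region->source_addr += (uint64_t)done * TOY_PAGE_SIZE;
	region->nr_pages -= (uint64_t)done;

	return done;
}

/* Example callback: fail with -EAGAIN at the gfn that opaque points to. */
static int toy_fail_at_gfn(uint64_t gfn, void *opaque)
{
	const uint64_t *bad_gfn = opaque;

	return gfn == *bad_gfn ? -EAGAIN : 0;
}
```

After a partial run the updated struct can be copied back as-is, so a retry
naturally resumes at the first page that was not processed.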

We have been discussing doing some early public work on the rest of the MMU
series (which includes this patch) and on the user API around VM and vCPU
creation. That is, the patches would not be fully ready, but we would work on
them in public. This would probably start once this series is finished up.

It's all tentative, but just to give some idea of where we're at with the rest
of the series.