linux-kernel - Re: [PATCH v7 040/102] KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot for private mmu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9e5f3f52ccfe0a0a00f390718df6969615853586.camel@intel.com>
Date:   Wed, 20 Jul 2022 11:17:08 +1200
From:   Kai Huang <kai.huang@...el.com>
To:     Isaku Yamahata <isaku.yamahata@...il.com>
Cc:     isaku.yamahata@...el.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>
Subject: Re: [PATCH v7 040/102] KVM: x86/mmu: Zap only leaf SPTEs for
 deleted/moved memslot for private mmu

On Tue, 2022-07-19 at 04:06 -0700, Isaku Yamahata wrote:
> On Fri, Jul 01, 2022 at 10:41:08PM +1200,
> Kai Huang <kai.huang@...el.com> wrote:
> 
> > On Mon, 2022-06-27 at 14:53 -0700, isaku.yamahata@...el.com wrote:
> > > From: Sean Christopherson <sean.j.christopherson@...el.com>
> > > 
> > > For kvm mmu that has shared bit mask, zap only leaf SPTEs when
> > > deleting/moving a memslot.  The existing kvm_mmu_zap_memslot() depends on
> > 
> > Unless I am mistaken, I don't see there's an 'existing' kvm_mmu_zap_memslot().
> 
> Oops. it should be kvm_tdp_mmu_invalidate_all_roots().
> 
> 
> > > role.invalid with read lock of mmu_lock so that other vcpu can operate on
> > > kvm mmu concurrently. 
> > > 
> > 
> > > Mark the root page table invalid, unlink it from page
> > > table pointer of CPU, process the page table.  
> > > 
> > 
> > Are you talking about the behaviour of existing code, or the change you are
> > going to make?  Looks like you mean the latter but I believe it's the former.
> 
> 
> The existing code.  The should "It marks ...".
> 
> 
> > > It doesn't work for private
> > > page table to unlink the root page table because it requires all SPTE entry
> > > to be non-present. 
> > > 
> > 
> > I don't think we can truly *unlink* the private root page table from secure
> > EPTP, right?  The EPTP (root table) is fixed (and hidden) during TD's runtime.
> > 
> > I guess you are trying to say: removing/unlinking one secure-EPT page requires
> > removing/unlinking all its children first? 
> 
> That's right. I'll update it as follows.
>                           
> 
> > So the reason to only zap leaf is we cannot truly unlink the private root page
> > table, correct?  Sorry your changelog is not obvious to me.
> 
> The reason is, to unlink a page table from the parent's SPTE, all SPTEs of the
> page table need to be non-present.
> 
> Here is the updated commit message.
> 
> KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot for private mmu|
> For kvm mmu that has shared bit mask, zap only leaf SPTEs when             |
> deleting/moving a memslot.  The existing kvm_tdp_mmu_invalidate_all_roots()|
> depends on role.invalid with read lock of mmu_lock so that other vcpu can  |
> operate on kvm mmu concurrently.  It marks the root page table invalid,    |
> zaps SPTEs of the root page tables.                                        |
>                                                                            |
> It doesn't work to unlink a private page table from the parent's SPTE entry|
> because it requires all SPTE entry of the page table to be non-present.    |

AFAICT this isn't the real reason that you cannot mark private root table as
invalid, and do the same zapping as you mentioned above.  There might be some
change to support "zapping all children before zapping the parent for private
table" (currently the actual page table is freed after RCU grace period, but not
at unlink time), but I don't see how this cannot be supported, or at least the
changelog doesn't explain why it cannot be supported.

The true reason is, if I understand correctly, you cannot truly unlink the
private root page table from the hardware and then, i.e. allocate a new one for
it.  So just zap the leafs.

> Instead, with write-lock of mmu_lock and zap only leaf SPTEs for kvm mmu   |
> with shared bit mask.  
> 
> > > Instead, with write-lock of mmu_lock and zap only leaf
> > > SPTEs for kvm mmu with shared bit mask.
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@...el.com>
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 34 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 80d7c7709af3..c517c7bca105 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -5854,11 +5854,44 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> > >  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> > >  }
> > >  
> > > +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > > +{
> > > +	bool flush = false;
> > > +
> > > +	write_lock(&kvm->mmu_lock);
> > > +
> > > +	/*
> > > +	 * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> > > +	 * case scenario we'll have unused shadow pages lying around until they
> > > +	 * are recycled due to age or when the VM is destroyed.
> > > +	 */
> > > +	if (is_tdp_mmu_enabled(kvm)) {
> > > +		struct kvm_gfn_range range = {
> > > +		      .slot = slot,
> > > +		      .start = slot->base_gfn,
> > > +		      .end = slot->base_gfn + slot->npages,
> > > +		      .may_block = false,
> > > +		};
> > > +
> > > +		flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);
> > 
> > 
> > It appears you only unmap private GFNs (because the base_gfn doesn't have shared
> > bit)?  I think shared mapping in this slot must be zapped too?  
> > 
> > How is this done?  Or the kvm_tdp_mmu_unmap_gfn_range() also zaps shared
> > mappings?
> 
> kvm_tdp_mmu_unmap_gfn_range() handles both private gfn and shared gfn as
> they are alias.  
> 
> 
> > It's hard to review if one patch's behaviour/logic depends on further patches.
> 
> I'll add a comment on the call.
> 

I don't think adding a comment is enough.  The correctness of one patch needs to
depend on future patch doesn't seem right.  Please also consider patch
reorganize/reorder.