Date:   Tue, 8 Aug 2023 07:26:07 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Yan Zhao <yan.y.zhao@...el.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        pbonzini@...hat.com, mike.kravetz@...cle.com, apopple@...dia.com,
        rppt@...nel.org, akpm@...ux-foundation.org, kevin.tian@...el.com
Subject: Re: [RFC PATCH 3/3] KVM: x86/mmu: skip zap maybe-dma-pinned pages for
 NUMA migration

On Tue, Aug 08, 2023, Jason Gunthorpe wrote:
> On Tue, Aug 08, 2023 at 03:17:02PM +0800, Yan Zhao wrote:
> > @@ -859,6 +860,21 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> >  		    !is_last_spte(iter.old_spte, iter.level))
> >  			continue;
> >  
> > +		if (skip_pinned) {
> > +			kvm_pfn_t pfn = spte_to_pfn(iter.old_spte);
> > +			struct page *page = kvm_pfn_to_refcounted_page(pfn);
> > +			struct folio *folio;
> > +
> > +			if (!page)
> > +				continue;
> > +
> > +			folio = page_folio(page);
> > +
> > +			if (folio_test_anon(folio) && PageAnonExclusive(&folio->page) &&
> > +			    folio_maybe_dma_pinned(folio))
> > +				continue;
> > +		}
> > +
> 
> I don't get it..
> 
> The last patch made it so that the NUMA balancing code doesn't change
> page_maybe_dma_pinned() pages to PROT_NONE
> 
> So why doesn't KVM just check if the current and new SPTE are the same
> and refrain from invalidating if nothing changed?

Because KVM doesn't have visibility into the current and new PTEs when the zapping
occurs.  The contract for invalidate_range_start() requires that KVM drop all
references before returning, and so the zapping occurs before change_pte_range()
or change_huge_pmd() have done anything.
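
Roughly, the ordering on the mprotect/NUMA-balancing side looks like this
(simplified sketch of change_pmd_range() in mm/mprotect.c; argument lists and
exact placement vary by kernel version):

	mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
				vma->vm_mm, addr, end);
	mmu_notifier_invalidate_range_start(&range);	/* KVM zaps SPTEs here */

	/* only after the notifier returns are the primary PTEs modified */
	change_huge_pmd(...);	/* or */
	change_pte_range(...);

	mmu_notifier_invalidate_range_end(&range);

I.e. at the point KVM is asked to invalidate, there is no "current vs. new PTE"
state for the secondary MMU to compare.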

> Duplicating the checks here seems very frail to me.

Yes, this approach gets a hard NAK from me.  IIUC, folio_maybe_dma_pinned()
can yield different results purely based on refcounts, i.e. KVM could skip pages
that the primary MMU does not, and thus violate the mmu_notifier contract.  And
in general, I am steadfastly against adding any kind of heuristic to KVM's
zapping logic.
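
For reference, folio_maybe_dma_pinned() boils down to roughly this on recent
kernels (simplified sketch; exact fields differ across versions):

	static inline bool folio_maybe_dma_pinned(struct folio *folio)
	{
		/* Large folios track FOLL_PIN references explicitly. */
		if (folio_test_large(folio))
			return atomic_read(&folio->_pincount) > 0;

		/*
		 * Order-0 folios only have the bias heuristic: any pile-up of
		 * transient references can push the refcount past
		 * GUP_PIN_COUNTING_BIAS (1024) and flip the result.
		 */
		return (unsigned int)folio_ref_count(folio) >=
			GUP_PIN_COUNTING_BIAS;
	}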

This really needs to be fixed in the primary MMU and not require any direct
involvement from secondary MMUs, e.g. the mmu_notifier invalidation itself needs
to be skipped.
