[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YPhAEcHOCZ5yII/T@google.com>
Date: Wed, 21 Jul 2021 15:41:05 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Yang Shi <shy828301@...il.com>
Cc: Zi Yan <ziy@...dia.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Huang Ying <ying.huang@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Dan Carpenter <dan.carpenter@...cle.com>,
Mel Gorman <mgorman@...e.de>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Hugh Dickins <hughd@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Michal Hocko <mhocko@...e.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Paolo Bonzini <pbonzini@...hat.com>,
kvm list <kvm@...r.kernel.org>
Subject: Re: [PATCH] mm,do_huge_pmd_numa_page: remove unnecessary TLB
flushing code
On Tue, Jul 20, 2021, Yang Shi wrote:
> On Tue, Jul 20, 2021 at 2:04 PM Zi Yan <ziy@...dia.com> wrote:
> >
> > On 20 Jul 2021, at 16:53, Yang Shi wrote:
> >
> > > On Tue, Jul 20, 2021 at 7:25 AM Christian Borntraeger
> > > <borntraeger@...ibm.com> wrote:
> > >>> - if (mm_tlb_flush_pending(vma->vm_mm)) {
> > >>> - flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
> > >>> - /*
> > >>> - * change_huge_pmd() released the pmd lock before
> > >>> - * invalidating the secondary MMUs sharing the primary
> > >>> - * MMU pagetables (with ->invalidate_range()). The
> > >>> - * mmu_notifier_invalidate_range_end() (which
> > >>> - * internally calls ->invalidate_range()) in
> > >>> - * change_pmd_range() will run after us, so we can't
> > >>> - * rely on it here and we need an explicit invalidate.
> > >>> - */
> > >>> - mmu_notifier_invalidate_range(vma->vm_mm, haddr,
> > >>> - haddr + HPAGE_PMD_SIZE);
> > >>> - }
> > >>> CC Paolo/KVM list so we also remove the mmu notifier here. Do we need those
> > >> now in migrate_pages? I am not an expert in that code, but I cant find
> > >> an equivalent mmu_notifier in migrate_misplaced_pages.
> > >> I might be totally wrong, just something that I noticed.
> > >
> > > Do you mean the missed mmu notifier invalidate for the THP migration
> > > case? Yes, I noticed that too. But I'm not sure whether it is intended
> > > or just missed.
> >
> > From my understand of mmu_notifier document, mmu_notifier_invalidate_range()
> > is needed only if the PTE is updated to point to a new page or the page pointed
> > by the PTE is freed. Page migration does not fall into either case.
The "new page" part of
a page table entry is updated to point to a new page
is referring to a different physical page, i.e. a different pfn, not a different
struct page. do_huge_pmd_numa_page() is moving a THP between nodes, thus it's
changing the backing pfn and needs to invalidate secondary MMUs at some point.
> > In addition, in migrate_pages(), more specifically try_to_migrate_one(),
> > there is a pair of mmu_notifier_invalidate_range_start() and
> > mmu_notifier_invalidate_range_end() around the PTE manipulation code, which should
> > be sufficient to notify secondary TLBs (including KVM) about the PTE change
> > for page migration. Correct me if I am wrong.
>
> Thanks, I think you are correct. By looking into commit 7066f0f933a1
> ("mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()"),
> the tlb flush and mmu notifier invalidate were needed since the old
> numa fault implementation didn't change PTE to migration entry so it
> may cause data corruption due to the writes from GPU secondary MMU.
>
> The refactor does use the generic migration code which converts PTE to
> migration entry before copying data to the new page.
That's my understanding as well, based on this blurb from commit 7066f0f933a1.
The standard PAGE_SIZEd migrate_misplaced_page is less accelerated and
uses the generic migrate_pages which transitions the pte from
numa/protnone to a migration entry in try_to_unmap_one() and flushes TLBs
and all mmu notifiers there before copying the page.
That analysis/justification for removing the invalidate_range() call should be
captured in the changelog. Confirmation from Andrea would be a nice bonus.
Powered by blists - more mailing lists