Message-ID: <41a893e1-f2e7-23f4-cad2-d5c353a336a3@redhat.com>
Date: Thu, 10 Aug 2023 11:34:07 +0200
From: David Hildenbrand <david@...hat.com>
To: Yan Zhao <yan.y.zhao@...el.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: pbonzini@...hat.com, seanjc@...gle.com, mike.kravetz@...cle.com,
apopple@...dia.com, jgg@...dia.com, rppt@...nel.org,
akpm@...ux-foundation.org, kevin.tian@...el.com
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a
VM
On 10.08.23 10:56, Yan Zhao wrote:
> This is an RFC series trying to fix the issue of unnecessary NUMA
> protection and TLB shootdowns observed in VMs with assigned devices or
> VFIO mediated devices during NUMA balancing.
>
> For VMs with assigned devices or VFIO mediated devices, all or part of
> guest memory is pinned long-term.
>
> Auto NUMA balancing periodically selects VMAs of a process and changes
> their protection to PROT_NONE, even though some or all pages in the
> selected ranges are long-term pinned for DMA, which is the case for VMs
> with assigned devices or VFIO mediated devices.
>
> Though this does not cause a real problem, because NUMA migration will
> ultimately reject migration of such pages and restore the PROT_NONE
> PTEs, it causes KVM's secondary MMU to be zapped periodically, with
> equal SPTEs eventually faulted back in, wasting CPU cycles and
> generating unnecessary TLB shootdowns.
>
> This series first introduces a new flag, MMU_NOTIFIER_RANGE_NUMA, in
> patch 1 to work with the mmu notifier event type
> MMU_NOTIFY_PROTECTION_VMA, so that a subscriber (e.g. KVM) of the mmu
> notifier can know that an invalidation event is sent specifically for
> NUMA migration.
>
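To make the intent concrete, a subscriber-side check could look roughly
like the sketch below (range_is_numa_protect() is a hypothetical helper;
it assumes patch 1 passes the new flag in range->flags alongside the
MMU_NOTIFY_PROTECTION_VMA event):

    /*
     * Sketch only: detect a NUMA-balancing invalidation from within a
     * subscriber's .invalidate_range_start() callback.
     */
    static bool range_is_numa_protect(const struct mmu_notifier_range *range)
    {
            return range->event == MMU_NOTIFY_PROTECTION_VMA &&
                   (range->flags & MMU_NOTIFIER_RANGE_NUMA);
    }
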
> Patch 2 skips setting PROT_NONE on long-term pinned pages in the
> primary MMU, avoiding the page faults introduced by NUMA protection and
> the subsequent restoration of the old huge PMDs/PTEs in the primary
> MMU.
>
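A minimal sketch of that skip, assuming it sits in the prot_numa path of
change_pte_range() and reuses the maybe-dma-pinned heuristic (the patch
may place or spell it differently):

    /* In change_pte_range(), under prot_numa: */
    struct folio *folio = vm_normal_folio(vma, addr, oldpte);

    /*
     * Long-term pinned pages cannot be migrated anyway, so don't
     * make them PROT_NONE in the first place.
     */
    if (folio && folio_maybe_dma_pinned(folio))
            continue;
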
> Patch 3 introduces a new mmu notifier callback, .numa_protect(), which
> is called in patch 4 once a page is guaranteed to be PROT_NONE
> protected.
>
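Presumably that means a new member in mmu_notifier_ops; the signature
below is assumed for illustration:

    /*
     * Sketch: notify subscribers that [start, end) in mm has actually
     * been made PROT_NONE for NUMA balancing.
     */
    void (*numa_protect)(struct mmu_notifier *subscription,
                         struct mm_struct *mm,
                         unsigned long start,
                         unsigned long end);
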
> Then in patch 5, KVM can recognize that an .invalidate_range_start()
> notification is specific to NUMA balancing and defer the unmap in the
> secondary MMU until .numa_protect() arrives.
>
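So on the KVM side the flow would roughly be: leave the range mapped in
.invalidate_range_start() when range_is_numa_protect() (see above) says
so, and only zap once .numa_protect() fires. A sketch of that deferred
zap, reusing KVM's existing hva-range helpers (wiring assumed):

    static void kvm_mmu_notifier_numa_protect(struct mmu_notifier *mn,
                                              struct mm_struct *mm,
                                              unsigned long start,
                                              unsigned long end)
    {
            struct kvm *kvm = mmu_notifier_to_kvm(mn);

            /* Only now unmap [start, end) from the secondary MMU. */
            kvm_handle_hva_range(mn, start, end, __pte(0),
                                 kvm_unmap_gfn_range);
    }
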
Why do we need all that, when we should simply not be applying PROT_NONE
to pinned pages?
In change_pte_range() we already have:

    if (is_cow_mapping(vma->vm_flags) &&
        page_count(page) != 1)
            continue;

which catches both shared and pinned pages.
Staring at patch #2, are we still missing something similar for THPs?
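For THPs that would presumably be the prot_numa path in
change_huge_pmd(); a sketch, assuming the same heuristic carries over:

    /* Also skip shared/pinned copy-on-write huge pages */
    if (is_cow_mapping(vma->vm_flags) &&
        page_count(pmd_page(*pmd)) != 1)
            goto unlock;
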
Why are that MMU notifier thingy and the changes to KVM code required?
--
Cheers,
David / dhildenb