lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 10 Aug 2023 21:58:43 +0800
From:   Chao Gao <chao.gao@...el.com>
To:     Yan Zhao <yan.y.zhao@...el.com>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        <kvm@...r.kernel.org>, <pbonzini@...hat.com>, <seanjc@...gle.com>,
        <mike.kravetz@...cle.com>, <apopple@...dia.com>, <jgg@...dia.com>,
        <rppt@...nel.org>, <akpm@...ux-foundation.org>,
        <kevin.tian@...el.com>, <david@...hat.com>
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in
 a VM

On Thu, Aug 10, 2023 at 04:56:36PM +0800, Yan Zhao wrote:
>This is an RFC series trying to fix the issue of unnecessary NUMA
>protection and TLB-shootdowns found in VMs with assigned devices or VFIO
>mediated devices during NUMA balance.
>
>For VMs with assigned devices or VFIO mediated devices, all or part of
>guest memory are pinned for long-term.
>
>Auto NUMA balancing will periodically selects VMAs of a process and change
>protections to PROT_NONE even though some or all pages in the selected
>ranges are long-term pinned for DMAs, which is true for VMs with assigned
>devices or VFIO mediated devices.
>
>Though this will not cause real problem because NUMA migration will
>ultimately reject migration of those kind of pages and restore those
>PROT_NONE PTEs, it causes KVM's secondary MMU to be zapped periodically
>with equal SPTEs finally faulted back, wasting CPU cycles and generating
>unnecessary TLB-shootdowns.

In my understanding, NUMA balancing also moves tasks closer to the memory
they are accessing. Can this still work with this series applied?

>
>This series first introduces a new flag MMU_NOTIFIER_RANGE_NUMA in patch 1
>to work with mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that
>the subscriber (e.g.KVM) of the mmu notifier can know that an invalidation
>event is sent for NUMA migration purpose in specific.
>
>Patch 2 skips setting PROT_NONE to long-term pinned pages in the primary
>MMU to avoid NUMA protection introduced page faults and restoration of old
>huge PMDs/PTEs in primary MMU.
>
>Patch 3 introduces a new mmu notifier callback .numa_protect(), which
>will be called in patch 4 when a page is ensured to be PROT_NONE protected.
>
>Then in patch 5, KVM can recognize a .invalidate_range_start() notification
>is for NUMA balancing specific and do not do the page unmap in secondary
>MMU until .numa_protect() comes.
>
>
>Changelog:
>RFC v1 --> v2:
>1. added patch 3-4 to introduce a new callback .numa_protect()
>2. Rather than have KVM duplicate logic to check if a page is pinned for
>long-term, let KVM depend on the new callback .numa_protect() to do the
>page unmap in secondary MMU for NUMA migration purpose.
>
>RFC v1:
>https://lore.kernel.org/all/20230808071329.19995-1-yan.y.zhao@intel.com/ 
>
>Yan Zhao (5):
>  mm/mmu_notifier: introduce a new mmu notifier flag
>    MMU_NOTIFIER_RANGE_NUMA
>  mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate
>    purpose
>  mm/mmu_notifier: introduce a new callback .numa_protect
>  mm/autonuma: call .numa_protect() when page is protected for NUMA
>    migrate
>  KVM: Unmap pages only when it's indeed protected for NUMA migration
>
> include/linux/mmu_notifier.h | 16 ++++++++++++++++
> mm/huge_memory.c             |  6 ++++++
> mm/mmu_notifier.c            | 18 ++++++++++++++++++
> mm/mprotect.c                | 10 +++++++++-
> virt/kvm/kvm_main.c          | 25 ++++++++++++++++++++++---
> 5 files changed, 71 insertions(+), 4 deletions(-)
>
>-- 
>2.17.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ