[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230801124844.278698-4-david@redhat.com>
Date: Tue, 1 Aug 2023 14:48:39 +0200
From: David Hildenbrand <david@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
liubo <liubo254@...wei.com>, Peter Xu <peterx@...hat.com>,
Matthew Wilcox <willy@...radead.org>,
Hugh Dickins <hughd@...gle.com>,
Jason Gunthorpe <jgg@...pe.ca>,
John Hubbard <jhubbard@...dia.com>,
Mel Gorman <mgorman@...e.de>, Shuah Khan <shuah@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: [PATCH v2 3/8] kvm: explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()
KVM is *the* case we know that really wants to honor NUMA hinting falls.
As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set
FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU
to map them into a secondary MMU, and add a comment why.
Do that unconditionally in hva_to_pfn_slow() when calling
get_user_pages_unlocked().
kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and
gfn_to_page_many_atomic() are similarly used to map pages into a
secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always
implicitly honor NUMA hinting faults -- as documented for
FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location
for now.
Don't set it in check_user_page_hwpoison(), where we really only want to
check if the mapped page is HW-poisoned.
We won't set it for other KVM users of get_user_pages()/pin_user_pages()
* arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a
secondary MMU.
* arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace
* arch/s390/kvm/*: s390x only supports a single NUMA node either way
* arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.
This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer
implicitly be set by get_user_pages() and friends.
Signed-off-by: David Hildenbrand <david@...hat.com>
---
virt/kvm/kvm_main.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfbaafbe3a00..6e4f2b81541e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2517,7 +2517,18 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
bool interruptible, bool *writable, kvm_pfn_t *pfn)
{
- unsigned int flags = FOLL_HWPOISON;
+ /*
+ * When a VCPU accesses a page that is not mapped into the secondary
+ * MMU, we lookup the page using GUP to map it, so the guest VCPU can
+ * make progress. We always want to honor NUMA hinting faults in that
+ * case, because GUP usage corresponds to memory accesses from the VCPU.
+ * Otherwise, we'd not trigger NUMA hinting faults once a page is
+ * mapped into the secondary MMU and gets accessed by a VCPU.
+ *
+ * Note that get_user_page_fast_only() and FOLL_WRITE for now
+ * implicitly honor NUMA hinting faults and don't need this flag.
+ */
+ unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;
struct page *page;
int npages;
--
2.41.0
Powered by blists - more mailing lists