Message-ID: <20240703212918.2417843-3-peterx@redhat.com>
Date: Wed, 3 Jul 2024 17:29:12 -0400
From: Peter Xu <peterx@...hat.com>
To: linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Cc: Matthew Wilcox <willy@...radead.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Dave Jiang <dave.jiang@...el.com>,
linuxppc-dev@...ts.ozlabs.org,
Michael Ellerman <mpe@...erman.id.au>,
Rik van Riel <riel@...riel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Andrew Morton <akpm@...ux-foundation.org>,
Huang Ying <ying.huang@...el.com>,
Oscar Salvador <osalvador@...e.de>,
"Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org,
Ingo Molnar <mingo@...hat.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Dan Williams <dan.j.williams@...el.com>,
Borislav Petkov <bp@...en8.de>,
peterx@...hat.com,
Hugh Dickins <hughd@...gle.com>,
Rick P Edgecombe <rick.p.edgecombe@...el.com>,
Alex Thorlton <athorlton@....com>
Subject: [PATCH v2 2/8] mm/mprotect: Remove NUMA_HUGE_PTE_UPDATES
In 2013, commit 72403b4a0fbd ("mm: numa: return the number of base pages
altered by protection changes") introduced the "numa_huge_pte_updates"
vmstat entry, to capture how many huge ptes (in practice, PMD THPs at that
time) were marked by NUMA balancing.

Remove it, for the following reasons.
Firstly, the name is misleading.  There is more than one way to have a
"huge pte" nowadays, and supporting them is the major goal of this series,
which paves the way for PUD handling in the change-protection code paths.
PUDs are coming not only for dax (which has already arrived, yet is still
broken..), but also for pfnmaps and hugetlb pages.  The name will simply
stop making sense once PUDs get involved in the mprotect() world, and it
would be just as unreasonable to bump the same counter for both PMDs and
PUDs.  In short, the current accounting cannot stay correct once PUDs
arrive; the scheme was only suitable back when PUDs were not even
possible.
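
To illustrate the ambiguity (a hypothetical sketch only: change_huge_pud()
is not in this patch, and the shape below is invented here to mirror the
existing PMD path), a future PUD path would leave the counter unable to
say which level it counted:

	/* Illustrative PUD analogue of the PMD path; not real code */
	ret = change_huge_pud(tlb, vma, pudp, addr, newprot, cp_flags);
	if (ret == HPAGE_PUD_SIZE >> PAGE_SHIFT) {
		pages += HPAGE_PUD_SIZE >> PAGE_SHIFT;
		nr_huge_updates++;	/* "huge" now means PMD or PUD.. */
	}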
Secondly, the accounting was simply not right from the start, because the
counter is also bumped by call sites other than NUMA balancing: mprotect()
is one, and userfaultfd-wp also leverages the change-protection path to
modify pgtables.  Doing it right would require checking the caller, which
was never done; mprotect() at least was already there in 2013.
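
As a sketch of what checking the caller could have looked like (again
hypothetical, not something this patch adds), the update would need to be
gated on cp_flags, e.g. on the existing MM_CP_PROT_NUMA flag:

	if (ret == HPAGE_PMD_NR) {
		pages += HPAGE_PMD_NR;
		/* Only count updates actually driven by NUMA balancing */
		if (cp_flags & MM_CP_PROT_NUMA)
			nr_huge_updates++;
	}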
This gives the impression that nobody is seriously consuming this field,
and, given the above, it could not be consumed seriously anyway.
If any NUMA developer would like this counter to exist, it can be redone
properly on top, with all of the above resolved: covering PUDs as well as
fixing the accounting.  That can wait until there is a real need for it.
Cc: Huang Ying <ying.huang@...el.com>
Cc: Mel Gorman <mgorman@...hsingularity.net>
Cc: Alex Thorlton <athorlton@....com>
Cc: Rik van Riel <riel@...riel.com>
Signed-off-by: Peter Xu <peterx@...hat.com>
---
 include/linux/vm_event_item.h | 1 -
 mm/mprotect.c                 | 8 +-------
 mm/vmstat.c                   | 1 -
 3 files changed, 1 insertion(+), 9 deletions(-)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 747943bc8cc2..2a3797fb6742 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -59,7 +59,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		OOM_KILL,
 #ifdef CONFIG_NUMA_BALANCING
 		NUMA_PTE_UPDATES,
-		NUMA_HUGE_PTE_UPDATES,
 		NUMA_HINT_FAULTS,
 		NUMA_HINT_FAULTS_LOCAL,
 		NUMA_PAGE_MIGRATE,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 222ab434da54..21172272695e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,7 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	pmd_t *pmd;
 	unsigned long next;
 	long pages = 0;
-	unsigned long nr_huge_updates = 0;
 	struct mmu_notifier_range range;
 
 	range.start = 0;
@@ -411,11 +410,8 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			ret = change_huge_pmd(tlb, vma, pmd,
 					      addr, newprot, cp_flags);
 			if (ret) {
-				if (ret == HPAGE_PMD_NR) {
+				if (ret == HPAGE_PMD_NR)
 					pages += HPAGE_PMD_NR;
-					nr_huge_updates++;
-				}
-
 				/* huge pmd was handled */
 				goto next;
 			}
@@ -435,8 +431,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	if (range.start)
 		mmu_notifier_invalidate_range_end(&range);
 
-	if (nr_huge_updates)
-		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
 	return pages;
 }
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 73d791d1caad..53656227f70d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1313,7 +1313,6 @@ const char * const vmstat_text[] = {
 
 #ifdef CONFIG_NUMA_BALANCING
 	"numa_pte_updates",
-	"numa_huge_pte_updates",
 	"numa_hint_faults",
 	"numa_hint_faults_local",
 	"numa_pages_migrated",
--
2.45.0