Message-ID: <20240704043132.28501-1-osalvador@suse.de>
Date: Thu, 4 Jul 2024 06:30:47 +0200
From: Oscar Salvador <osalvador@...e.de>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
Peter Xu <peterx@...hat.com>,
Muchun Song <muchun.song@...ux.dev>,
David Hildenbrand <david@...hat.com>,
SeongJae Park <sj@...nel.org>,
Miaohe Lin <linmiaohe@...wei.com>,
Michal Hocko <mhocko@...e.com>,
Matthew Wilcox <willy@...radead.org>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Oscar Salvador <osalvador@...e.de>
Subject: [PATCH 00/45] hugetlb pagewalk unification
Hi all,
During Peter's talk at LSFMM, it was agreed that one of the things that
needs to be done in order to further integrate hugetlb into mm core is
to unify the generic and hugetlb pagewalkers.
I started with this one, which folds hugetlb into the generic pagewalk
instead of it having its own hugetlb_entry entries.
This means that pmd_entry/pte_entry (for cont-pte) entries will also deal
with hugetlb vmas, and so will the new pud_entry entries, since hugetlb can
be pud mapped (devm pages as well, but we seem not to care about those, with
the exception of hmm code).
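To illustrate the idea only, here is a toy userspace sketch of a walker
that dispatches leaf entries by page-table level and has no separate
hugetlb callback. This is not kernel code: every toy_* name below is made
up for the example, and the real walker and its callbacks have different
signatures and do far more work.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
enum toy_level { TOY_PUD, TOY_PMD, TOY_PTE };

struct toy_leaf {
	enum toy_level level;	/* level at which the mapping is a leaf */
	bool hugetlb;		/* true if the leaf belongs to a hugetlb vma */
};

struct toy_stats {
	size_t pud, pmd, pte;	/* leaves seen at each level */
	size_t hugetlb;		/* how many of those were hugetlb */
};

typedef int (*toy_cb)(const struct toy_leaf *leaf, struct toy_stats *st);

/* One callback per level; there is no separate hugetlb_entry. */
struct toy_walk_ops {
	toy_cb pud_entry;
	toy_cb pmd_entry;
	toy_cb pte_entry;
};

/* Each per-level callback handles hugetlb and non-hugetlb leaves alike. */
static int toy_pud_entry(const struct toy_leaf *leaf, struct toy_stats *st)
{
	st->pud++;
	st->hugetlb += leaf->hugetlb ? 1 : 0;
	return 0;
}

static int toy_pmd_entry(const struct toy_leaf *leaf, struct toy_stats *st)
{
	st->pmd++;
	st->hugetlb += leaf->hugetlb ? 1 : 0;
	return 0;
}

static int toy_pte_entry(const struct toy_leaf *leaf, struct toy_stats *st)
{
	st->pte++;
	st->hugetlb += leaf->hugetlb ? 1 : 0;
	return 0;
}

/* Dispatch every leaf to the callback of its level; stop on first error. */
static int toy_walk(const struct toy_leaf *leaves, size_t n,
		    const struct toy_walk_ops *ops, struct toy_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		toy_cb cb = NULL;

		switch (leaves[i].level) {
		case TOY_PUD: cb = ops->pud_entry; break;
		case TOY_PMD: cb = ops->pmd_entry; break;
		case TOY_PTE: cb = ops->pte_entry; break;
		}
		if (cb) {
			int ret = cb(&leaves[i], st);

			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Walk a small, made-up set of leaves and return the gathered counters. */
static struct toy_stats toy_demo(void)
{
	static const struct toy_leaf leaves[] = {
		{ TOY_PUD, true  },	/* PUD-mapped hugetlb page */
		{ TOY_PMD, true  },	/* PMD-mapped hugetlb page */
		{ TOY_PMD, false },	/* non-hugetlb PMD leaf */
		{ TOY_PTE, true  },	/* one pte of a cont-pte hugetlb page */
		{ TOY_PTE, false },	/* regular pte */
	};
	static const struct toy_walk_ops ops = {
		.pud_entry = toy_pud_entry,
		.pmd_entry = toy_pmd_entry,
		.pte_entry = toy_pte_entry,
	};
	struct toy_stats st = { 0 };

	toy_walk(leaves, sizeof(leaves) / sizeof(leaves[0]), &ops, &st);
	return st;
}
```

Calling toy_demo() walks the five made-up leaves through the same three
callbacks, whether or not a leaf is hugetlb.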
The outcome is this RFC.
Before you continue, let me clarify certain points:
This patchset is not yet finished: 1) some things need more thought,
2) some are still broken (like the hmm bits, as I am clueless about that),
and 3) some paths have not been tested at all.
The things I tested were:
- memory-failure
- smaps/numa_maps/pagemap (the latter only for pud/pmd, not
cont-{pmd,ptes})
- mempolicy
on arm64 (for 64KB and 32MB hugetlb pages) and on x86_64 (for 2MB and 1GB
hugetlb pages).
More tests need to be conducted, and I plan to borrow a ppc64le machine
to also carry out some tests there, but for now this is what my bandwidth
allowed me to do.
I am well aware that there are two things that might scare people, one
being the number of patches, and the other being the amount of code
added.
For the former, I will by no means ask anyone to review 45 patches, but
since this patchset touches isolated paths (damon, mincore, hmm,
task_mmu, memory-failure, mempolicy), I will point out some people
that might be able to help me out with those different bits:
- Miaohe for memory-failure bits
- David for task_mmu bits
- SeongJae Park for damon bits
- Jerome for hmm bits
- feel free to join for the rest
I think that might be a good approach: instead of having to review
45 patches, one only has to review 5 or 6 at most.
For the latter, there is an explanation: hugetlb operates on ptes
(although it allocates puds/pmds and the operations work on that level too),
which means that now that we will handle PUD/PMD-mapped hugetlb with
{pud,pmd}_* operations, we need to introduce quite a few functions that
do not exist yet and that we will need from now on.
Although I am sending this out, this is not "RFC-ready" material.
As I said, there are still things that need to be improved/fixed/tested,
but I wanted to make it public nevertheless so we can gather some constructive
feedback that helps us move in the right direction and also widens the
discussion.
So take this more as a "Hey, let me show what I am doing and call me out on
things you consider wrong".
Thanks in advance
Oscar Salvador (45):
arch/x86: Drop own definition of pgd,p4d_leaf
mm: Add {pmd,pud}_huge_lock helper
mm/pagewalk: Move vma_pgtable_walk_begin and vma_pgtable_walk_end
upfront
mm/pagewalk: Only call pud_entry when we have a pud leaf
mm/pagewalk: Enable walk_pmd_range to handle cont-pmds
mm/pagewalk: Do not try to split non-thp pud or pmd leafs
arch/s390: Enable __s390_enable_skey_pmd to handle hugetlb vmas
fs/proc: Enable smaps_pmd_entry to handle PMD-mapped hugetlb vmas
mm: Implement pud-version functions for swap and vm_normal_page_pud
fs/proc: Create smaps_pud_range to handle PUD-mapped hugetlb vmas
fs/proc: Enable smaps_pte_entry to handle cont-pte mapped hugetlb vmas
fs/proc: Enable pagemap_pmd_range to handle hugetlb vmas
mm: Implement pud-version uffd functions
fs/proc: Create pagemap_pud_range to handle PUD-mapped hugetlb vmas
fs/proc: Adjust pte_to_pagemap_entry for hugetlb vmas
fs/proc: Enable pagemap_scan_pmd_entry to handle hugetlb vmas
mm: Implement pud-version for pud_mkinvalid and pudp_establish
fs/proc: Create pagemap_scan_pud_entry to handle PUD-mapped hugetlb
vmas
fs/proc: Enable gather_pte_stats to handle hugetlb vmas
fs/proc: Enable gather_pte_stats to handle cont-pte mapped hugetlb
vmas
fs/proc: Create gather_pud_stats to handle PUD-mapped hugetlb pages
mm/mempolicy: Enable queue_folios_pmd to handle hugetlb vmas
mm/mempolicy: Create queue_folios_pud to handle PUD-mapped hugetlb
vmas
mm/memory_failure: Enable check_hwpoisoned_pmd_entry to handle hugetlb
vmas
mm/memory-failure: Create check_hwpoisoned_pud_entry to handle
PUD-mapped hugetlb vmas
mm/damon: Enable damon_young_pmd_entry to handle hugetlb vmas
mm/damon: Create damon_young_pud_entry to handle PUD-mapped hugetlb
vmas
mm/damon: Enable damon_mkold_pmd_entry to handle hugetlb vmas
mm/damon: Create damon_mkold_pud_entry to handle PUD-mapped hugetlb
vmas
mm,mincore: Enable mincore_pte_range to handle hugetlb vmas
mm/mincore: Create mincore_pud_range to handle PUD-mapped hugetlb vmas
mm/hmm: Enable hmm_vma_walk_pmd, to handle hugetlb vmas
mm/hmm: Enable hmm_vma_walk_pud to handle PUD-mapped hugetlb vmas
arch/powerpc: Skip hugetlb vmas in subpage_mark_vma_nohuge
arch/s390: Skip hugetlb vmas in thp_split_mm
fs/proc: Make clear_refs_test_walk skip hugetlb vmas
mm/lock: Make mlock_test_walk skip hugetlb vmas
mm/madvise: Make swapin_test_walk skip hugetlb vmas
mm/madvise: Make madvise_cold_test_walk skip hugetlb vmas
mm/madvise: Make madvise_free_test_walk skip hugetlb vmas
mm/migrate_device: Make migrate_vma_test_walk skip hugetlb vmas
mm/memcontrol: Make mem_cgroup_move_test_walk skip hugetlb vmas
mm/memcontrol: Make mem_cgroup_count_test_walk skip hugetlb vmas
mm/hugetlb_vmemmap: Make vmemmap_test_walk skip hugetlb vmas
mm: Delete all hugetlb_entry entries
arch/arm64/include/asm/pgtable.h | 19 +
arch/loongarch/include/asm/pgtable.h | 8 +
arch/mips/include/asm/pgtable.h | 7 +
arch/powerpc/include/asm/book3s/64/pgtable.h | 8 +-
arch/powerpc/mm/book3s64/pgtable.c | 15 +-
arch/powerpc/mm/book3s64/subpage_prot.c | 2 +
arch/riscv/include/asm/pgtable.h | 15 +
arch/s390/mm/gmap.c | 37 +-
arch/x86/include/asm/pgtable.h | 199 +++++----
fs/proc/task_mmu.c | 434 ++++++++++++-------
include/asm-generic/pgtable_uffd.h | 30 ++
include/linux/mm.h | 4 +
include/linux/mm_inline.h | 34 ++
include/linux/pagewalk.h | 10 -
include/linux/pgtable.h | 77 +++-
include/linux/swapops.h | 27 ++
mm/damon/ops-common.c | 21 +-
mm/damon/vaddr.c | 173 ++++----
mm/hmm.c | 69 +--
mm/hugetlb_vmemmap.c | 12 +
mm/madvise.c | 36 ++
mm/memcontrol-v1.c | 24 +
mm/memory-failure.c | 99 +++--
mm/memory.c | 51 +++
mm/mempolicy.c | 121 +++---
mm/migrate_device.c | 12 +
mm/mincore.c | 46 +-
mm/mlock.c | 12 +
mm/mprotect.c | 10 -
mm/pagewalk.c | 73 +---
mm/pgtable-generic.c | 21 +
31 files changed, 1089 insertions(+), 617 deletions(-)
--
2.26.2