Message-Id: <20250926093343.1000-7-laoar.shao@gmail.com>
Date: Fri, 26 Sep 2025 17:33:37 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: akpm@...ux-foundation.org,
david@...hat.com,
ziy@...dia.com,
baolin.wang@...ux.alibaba.com,
lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com,
npache@...hat.com,
ryan.roberts@....com,
dev.jain@....com,
hannes@...xchg.org,
usamaarif642@...il.com,
gutierrez.asier@...wei-partners.com,
willy@...radead.org,
ast@...nel.org,
daniel@...earbox.net,
andrii@...nel.org,
ameryhung@...il.com,
rientjes@...gle.com,
corbet@....net,
21cnbao@...il.com,
shakeel.butt@...ux.dev,
tj@...nel.org,
lance.yang@...ux.dev
Cc: bpf@...r.kernel.org,
linux-mm@...ck.org,
linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org,
Yafang Shao <laoar.shao@...il.com>
Subject: [PATCH v8 mm-new 06/12] mm: thp: enable THP allocation exclusively through khugepaged

khugepaged_enter_vma() ultimately invokes any attached BPF function with
the TVA_KHUGEPAGED flag set when determining whether to enable
khugepaged THP for a freshly faulted-in VMA.
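
For context, a minimal sketch of what khugepaged_enter_vma() conceptually
does (illustrative only; __khugepaged_enter() and the exact guard checks
are simplified here, not the verbatim implementation):

	/*
	 * Sketch: TVA_KHUGEPAGED routes the order check through any
	 * attached BPF policy in addition to the usual sysfs/madvise
	 * rules; if a PMD order is allowed, the whole MM is registered
	 * for khugepaged scanning.
	 */
	void khugepaged_enter_vma(struct vm_area_struct *vma)
	{
		if (thp_vma_allowable_order(vma, TVA_KHUGEPAGED, PMD_ORDER))
			__khugepaged_enter(vma->vm_mm);	/* add the MM to the scan list */
	}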

Currently, on fault, we invoke this from do_huge_pmd_anonymous_page()
(reached via create_huge_pmd()), and only after we have already checked
that an allowable TVA_PAGEFAULT order is specified.

Since we might want to disallow THP on fault-in but still allow it via
khugepaged, move things around so that we always attempt to enter
khugepaged upon fault.
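
Schematically, the empty-PMD path in __handle_mm_fault() becomes (the
complete change is in the diff below; this is just the reordered logic):

	if (pmd_none(*vmf.pmd)) {
		/* Register the MM with khugepaged unconditionally ... */
		if (vma_is_anonymous(vma))
			khugepaged_enter_vma(vma);
		/* ... but only fault in a PMD-sized THP if allowed. */
		if (thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) {
			ret = create_huge_pmd(&vmf);
			if (!(ret & VM_FAULT_FALLBACK))
				return ret;
		}
	}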

This change is safe because:

- the checks for thp_vma_allowable_order(TVA_KHUGEPAGED) and
  thp_vma_allowable_order(TVA_PAGEFAULT) are functionally equivalent
- khugepaged operates at the MM level rather than per-VMA. Since the THP
  allocation might fail during page faults due to transient conditions
  (e.g., memory pressure), it is safe to add this MM to khugepaged for
  subsequent defragmentation (see the sketch below).
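
A rough illustration of the MM-level point (names such as
collapse_eligible_ranges() are placeholders, not real khugepaged
internals): khugepaged walks a list of registered MMs, so an MM added at
fault time is revisited later even if the fault-time allocation failed or
TVA_PAGEFAULT disallowed it.

	static void khugepaged_scan_sketch(struct list_head *mm_list)
	{
		struct mm_slot *slot;

		/* Each registered MM is scanned in turn; collapse decisions
		 * are made per MM/range, not per faulting VMA. */
		list_for_each_entry(slot, mm_list, mm_node)
			collapse_eligible_ranges(slot->mm);
	}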

While we could also extend prctl() to utilize this new policy, such a
change would require a uAPI modification to PR_SET_THP_DISABLE.
Signed-off-by: Yafang Shao <laoar.shao@...il.com>
Acked-by: Lance Yang <lance.yang@...ux.dev>
---
mm/huge_memory.c | 1 -
mm/memory.c | 13 ++++++++-----
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 08372dfcb41a..2b155a734c78 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1346,7 +1346,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
ret = vmf_anon_prepare(vmf);
if (ret)
return ret;
- khugepaged_enter_vma(vma);
if (!(vmf->flags & FAULT_FLAG_WRITE) &&
!mm_forbids_zeropage(vma->vm_mm) &&
diff --git a/mm/memory.c b/mm/memory.c
index 58ea0f93f79e..64f91191ffff 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6327,11 +6327,14 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
if (pud_trans_unstable(vmf.pud))
goto retry_pud;
- if (pmd_none(*vmf.pmd) &&
- thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) {
- ret = create_huge_pmd(&vmf);
- if (!(ret & VM_FAULT_FALLBACK))
- return ret;
+ if (pmd_none(*vmf.pmd)) {
+ if (vma_is_anonymous(vma))
+ khugepaged_enter_vma(vma);
+ if (thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) {
+ ret = create_huge_pmd(&vmf);
+ if (!(ret & VM_FAULT_FALLBACK))
+ return ret;
+ }
} else {
vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
--
2.47.3