[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez2oge4xs1pSz_T9L46g=wQnFyC63kQKsXwbHGRWAxQ+aw@mail.gmail.com>
Date: Wed, 30 Apr 2025 22:51:46 +0200
From: Jann Horn <jannh@...gle.com>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
akpm@...ux-foundation.org, corbet@....net, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com, david@...hat.com,
baohua@...nel.org, baolin.wang@...ux.alibaba.com, ryan.roberts@....com,
willy@...radead.org, peterx@...hat.com, ziy@...dia.com,
wangkefeng.wang@...wei.com, usamaarif642@...il.com, sunnanyong@...wei.com,
vishal.moola@...il.com, thomas.hellstrom@...ux.intel.com,
yang@...amperecomputing.com, kirill.shutemov@...ux.intel.com,
aarcange@...hat.com, raquini@...hat.com, dev.jain@....com,
anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com, hannes@...xchg.org,
rientjes@...gle.com, mhocko@...e.com, rdunlap@...radead.org,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com
Subject: Re: [PATCH v5 07/12] khugepaged: add mTHP support
On Mon, Apr 28, 2025 at 8:12 PM Nico Pache <npache@...hat.com> wrote:
> Introduce the ability for khugepaged to collapse to different mTHP sizes.
> While scanning PMD ranges for potential collapse candidates, keep track
> of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
> represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If
> mTHPs are enabled we remove the restriction of max_ptes_none during the
> scan phase so we dont bailout early and miss potential mTHP candidates.
>
> After the scan is complete we will perform binary recursion on the
> bitmap to determine which mTHP size would be most efficient to collapse
> to. max_ptes_none will be scaled by the attempted collapse order to
> determine how full a THP must be to be eligible.
>
> If a mTHP collapse is attempted, but contains swapped out, or shared
> pages, we dont perform the collapse.
[...]
> @@ -1208,11 +1211,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> vma_start_write(vma);
> anon_vma_lock_write(vma->anon_vma);
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
> - address + HPAGE_PMD_SIZE);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address,
> + _address + (PAGE_SIZE << order));
> mmu_notifier_invalidate_range_start(&range);
>
> pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
> +
> /*
> * This removes any huge TLB entry from the CPU so we won't allow
> * huge and small TLB entries for the same virtual address to
It's not visible in this diff, but we're about to do a
pmdp_collapse_flush() here. pmdp_collapse_flush() tears down the
entire page table, meaning it tears down 2MiB of address space; and it
assumes that the entire page table exclusively corresponds to the
current VMA.
I think you'll need to ensure that the pmdp_collapse_flush() only
happens for full-size THP, and that mTHP only tears down individual
PTEs in the relevant range. (That code might get a bit messy, since
the existing THP code tears down PTEs in a detached page table, while
mTHP would have to do it in a still-attached page table.)
Powered by blists - more mailing lists