lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4e1fef74-f369-439e-83ff-c50f991c834e@lucifer.local>
Date: Fri, 12 Sep 2025 14:35:44 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
        david@...hat.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
        Liam.Howlett@...cle.com, ryan.roberts@....com, dev.jain@....com,
        corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org,
        mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
        baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
        wangkefeng.wang@...wei.com, usamaarif642@...il.com,
        sunnanyong@...wei.com, vishal.moola@...il.com,
        thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
        kas@...nel.org, aarcange@...hat.com, raquini@...hat.com,
        anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
        will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz,
        cl@...two.org, jglisse@...gle.com, surenb@...gle.com,
        zokeefe@...gle.com, hannes@...xchg.org, rientjes@...gle.com,
        mhocko@...e.com, rdunlap@...radead.org, hughd@...gle.com,
        richard.weiyang@...il.com, lance.yang@...ux.dev, vbabka@...e.cz,
        rppt@...nel.org, jannh@...gle.com, pfalcato@...e.de
Subject: Re: [PATCH v11 06/15] khugepaged: introduce collapse_max_ptes_none
 helper function

On Thu, Sep 11, 2025 at 09:28:01PM -0600, Nico Pache wrote:
> The current mechanism for determining mTHP collapse scales the
> khugepaged_max_ptes_none value based on the target order. This
> introduces an undesirable feedback loop, or "creep", when max_ptes_none
> is set to a value greater than HPAGE_PMD_NR / 2.
>
> With this configuration, a successful collapse to order N will populate
> enough pages to satisfy the collapse condition on order N+1 on the next
> scan. This leads to unnecessary work and memory churn.
>
> To fix this issue introduce a helper function that caps the max_ptes_none
> to HPAGE_PMD_NR / 2 - 1 (255 on 4k page size). The function also scales
> the max_ptes_none number by the (PMD_ORDER - target collapse order).

I would say very clearly that this is only in the mTHP case.


>
> Signed-off-by: Nico Pache <npache@...hat.com>

Hmm I thought we were going to wait for David to investigate different
approaches to this?

This is another issue with quickly going to another iteration. Though I do think
David explicitly said he'd come back with a solution?

So I'm not sure why we're seeing this solution here? Unless I'm missing
something?

> ---
>  mm/khugepaged.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b0ae0b63fc9b..4587f2def5c1 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -468,6 +468,26 @@ void __khugepaged_enter(struct mm_struct *mm)
>  		wake_up_interruptible(&khugepaged_wait);
>  }
>
> +/* Returns the scaled max_ptes_none for a given order.

We don't start comments at the /*, please use a normal comment format like:

/*
 * xxxx
 */

> + * Caps the value to HPAGE_PMD_NR/2 - 1 in the case of mTHP collapse to prevent

This is super unclear.

It start with 'caps the xxx' which seems like you're talking generally.

You should say very clearly 'For PMD allocations we apply the
khugepaged_max_ptes_none parameter as normal. For mTHP ... [details about mTHP].

> + * a feedback loop. If max_ptes_none is greater than HPAGE_PMD_NR/2, the value
> + * would lead to collapses that introduces 2x more pages than the original
> + * number of pages. On subsequent scans, the max_ptes_none check would be
> + * satisfied and the collapses would continue until the largest order is reached
> + */

This is a super vauge explanation. Please describe the issue with creep more
clearly.

Also aren't we saying that 511 or 0 are the sensible choices? But now somehow
that's not the case?

You're also not giving a kdoc info on what this returns.

> +static int collapse_max_ptes_none(unsigned int order)

It's a problem that existed already, but khugepaged_max_ptes_none is an unsigned
int and this returns int.

Maybe we should fix this while we're at it...

> +{
> +	int max_ptes_none;
> +
> +	if (order != HPAGE_PMD_ORDER &&
> +	    khugepaged_max_ptes_none >= HPAGE_PMD_NR/2)
> +		max_ptes_none = HPAGE_PMD_NR/2 - 1;
> +	else
> +		max_ptes_none = khugepaged_max_ptes_none;
> +	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
> +
> +}
> +

I really don't like this formulation, you're making it unnecessarily unclear and
now, for the super common case of PMD size, you have to figure out 'oh it's this
second branch and we're subtracting HPAGE_PMD_ORDER from HPAGE_PMD_ORDER so just
return khugepaged_max_ptes_none'. When we could... just return it no?

So something like:

#define MAX_PTES_NONE_MTHP_CAP (HPAGE_PMD_NR / 2 - 1)

static unsigned int collapse_max_ptes_none(unsigned int order)
{
	unsigned int max_ptes_none_pmd;

	/* PMD-sized THPs behave precisely the same as before. */
	if (order == HPAGE_PMD_ORDER)
		return khugepaged_max_ptes_none;

	/*
	* Bizarrely, this is expressed in terms of PTEs were this PMD-sized.
	* For the reasons stated above, we cap this value in the case of mTHP.
	*/
	max_ptes_none_pmd = MIN(MAX_PTES_NONE_MTHP_CAP,
		khugepaged_max_ptes_none);

	/* Apply PMD -> mTHP scaling. */
	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
}

>  void khugepaged_enter_vma(struct vm_area_struct *vma,
>  			  vm_flags_t vm_flags)
>  {
> @@ -554,7 +574,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  	struct folio *folio = NULL;
>  	pte_t *_pte;
>  	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> -	int scaled_max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> +	int scaled_max_ptes_none = collapse_max_ptes_none(order);
>  	const unsigned long nr_pages = 1UL << order;
>
>  	for (_pte = pte; _pte < pte + nr_pages;
> --
> 2.51.0
>

Thanks, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ