Message-ID: <CAGsJ_4z1TRf9jbONe_cmqNQ2_t_vPKfQ4z1aweOcoLtMiosKxg@mail.gmail.com>
Date: Mon, 26 Jan 2026 10:06:35 +0800
From: Barry Song <21cnbao@...il.com>
To: Dev Jain <dev.jain@....com>
Cc: Vernon Yang <vernon2gm@...il.com>, david@...nel.org, 
	Lance Yang <lance.yang@...ux.dev>, lorenzo.stoakes@...cle.com, ziy@...dia.com, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Vernon Yang <yanglincheng@...inos.cn>, akpm@...ux-foundation.org
Subject: Re: [PATCH mm-new v5 4/5] mm: khugepaged: skip lazy-free folios

On Sat, Jan 24, 2026 at 2:48 PM Dev Jain <dev.jain@....com> wrote:
>
>
> On 24/01/26 8:52 am, Vernon Yang wrote:
> > On Sat, Jan 24, 2026 at 12:32 AM Lance Yang <lance.yang@...ux.dev> wrote:
> >> On 2026/1/23 23:08, Vernon Yang wrote:
> >>> On Fri, Jan 23, 2026 at 5:09 PM Lance Yang <lance.yang@...ux.dev> wrote:
> >>>> On 2026/1/23 16:22, Vernon Yang wrote:
> >>>>> From: Vernon Yang <yanglincheng@...inos.cn>
> >>>>>
> >> [...]
> >>
> >>>>> @@ -583,6 +584,11 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >>>>>                folio = page_folio(page);
> >>>>>                VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
> >>>>>
> >>>>> +             if (!pte_dirty(pteval) && folio_test_lazyfree(folio)) {
> >>>> I'm wondering if we need "cc->is_khugepaged &&" as well here?
> >>>>
> >>>> We should allow users to enforce collapse via the madvise_collapse()
> >>>> path even if pages are marked lazyfree, IMHO.
> >>> $ man madvise
> >>> MADV_COLLAPSE
> >>>          Perform a best-effort synchronous collapse of the native pages
> >>>          mapped by the memory range into Transparent Huge Pages (THPs).
> >>>
> >>> The semantics of MADV_COLLAPSE are best-effort and do not imply to enforce
> >>> collapsing, so we don't need "cc->is_khugepaged" here.
> >>>
> >>> We can imagine that if a user simultaneously uses MADV_FREE and
> >>> MADV_COLLAPSE, it indicates a misunderstanding of their semantics.
> >>> As the kernel, we need to safeguard the baseline.
> >> No. Afraid I don't think so.
> >>
> >> To be clear, what I meant by "enforce":
> >>
> >> Yep, MADV_COLLAPSE is best-effort - it can fail. But when users
> >> call MADV_COLLAPSE, they're explicitly asking for collapse.
> >>
> >> Compared to khugepaged just scanning around, that's already "enforce"
> >> - users are actively requesting it, not passively waiting for.
> >>
> >> Note that you're *breaking* userspace. Users would not be able
> >> to collapse the range where there are any lazyfree pages anymore,
> >> even when they explicitly call MADV_COLLAPSE.
> >>
> >> For khugepaged, skipping lazyfree makes sense.
> > I got your meaning, this is equivalent to two questions:
> >
> > 1. Does the semantics of best-effort imply any "enforce" meaning?
> > 2. When madvise(MADV_FREE| MADV_COLLAPSE), do we want to collapse
> >    lazyfree folios?
> >
> > This is a semantic warning, and I'd like to hear others' opinions.
>

That said, it does feel a bit unfortunate. I was wondering whether we
want to give users a hint in this case, e.g. via something like:

pr_warn("Attempt to enforce hugepage collapse on lazyfree memory\n");

But I'm not sure whether this is actually worth a printk, or if it would
just add noise without providing actionable value.
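
To make the two positions in the thread concrete, here is a minimal userspace sketch of the skip decision. It is not kernel code: `struct pte_model` and `skip_lazyfree()` are hypothetical names invented for illustration, and the booleans only stand in for pte_dirty(pteval), folio_test_lazyfree(folio) and cc->is_khugepaged from the quoted hunk. It models the variant Lance suggested, where only khugepaged skips clean lazyfree folios and MADV_COLLAPSE still proceeds.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check discussed in this thread.
 * The fields stand in for pte_dirty(pteval) and folio_test_lazyfree(folio).
 */
struct pte_model {
	bool dirty;          /* models pte_dirty(pteval) */
	bool folio_lazyfree; /* models folio_test_lazyfree(folio) */
};

/*
 * Return true when the scan should skip this entry.
 *
 * The posted hunk skips whenever the PTE is clean and the folio is
 * lazyfree. The variant modelled here additionally requires
 * is_khugepaged (standing in for cc->is_khugepaged), so an explicit
 * MADV_COLLAPSE request is not refused because of lazyfree pages.
 */
static bool skip_lazyfree(struct pte_model p, bool is_khugepaged)
{
	bool clean_lazyfree = !p.dirty && p.folio_lazyfree;

	return is_khugepaged && clean_lazyfree;
}
```

Passing is_khugepaged as true for every caller reproduces the behaviour of the hunk as posted, which is the case where a hint to the user might be considered.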

> Regarding "best-effort", it is best-effort in the sense that, the
> madvise(MADV_COLLAPSE) is a syscall needed not for correctness,
> but for optimization purposes. So it is not the end of the world
> if the syscall fails. But, since the user has decided to do an
> expensive operation (syscall), kernel needs to try harder to
> make sure those CPU cycles weren't a waste.
>

Thanks
Barry
