[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <u5sycz2owaxowcd5mzwqztkkgb2mgt662rm5zllirgxmk6lwex@wyzh7aw44rji>
Date: Sat, 17 May 2025 23:04:46 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Nico Pache <npache@...hat.com>
Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>, linux-mm@...ck.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, david@...hat.com, ziy@...dia.com,
lorenzo.stoakes@...cle.com, ryan.roberts@....com, dev.jain@....com,
corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kirill.shutemov@...ux.intel.com, aarcange@...hat.com,
raquini@...hat.com, anshuman.khandual@....com, catalin.marinas@....com,
tiwai@...e.de, will@...nel.org, dave.hansen@...ux.intel.com,
jack@...e.cz, cl@...two.org, jglisse@...gle.com, surenb@...gle.com,
zokeefe@...gle.com, hannes@...xchg.org, rientjes@...gle.com,
mhocko@...e.com, rdunlap@...radead.org
Subject: Re: [PATCH v7 06/12] khugepaged: introduce khugepaged_scan_bitmap
for mTHP support
* Nico Pache <npache@...hat.com> [250517 02:48]:
> On Thu, May 15, 2025 at 9:20 PM Baolin Wang
> <baolin.wang@...ux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/5/15 11:22, Nico Pache wrote:
> > > khugepaged scans anons PMD ranges for potential collapse to a hugepage.
> > > To add mTHP support we use this scan to instead record chunks of utilized
> > > sections of the PMD.
> > >
> > > khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
> > > that represents chunks of utilized regions. We can then determine what
> > > mTHP size fits best and in the following patch, we set this bitmap while
> > > scanning the anon PMD. A minimum collapse order of 2 is used as this is
> > > the lowest order supported by anon memory.
> > >
> > > max_ptes_none is used as a scale to determine how "full" an order must
> > > be before being considered for collapse.
> > >
> > > When attempting to collapse an order that has its order set to "always"
> > > lets always collapse to that order in a greedy manner without
> > > considering the number of bits set.
> > >
> > > Signed-off-by: Nico Pache <npache@...hat.com>
> >
> > Sigh. You still haven't addressed or explained the issues I previously
> > raised [1], so I don't know how to review this patch again...
Thanks Baolin for highlighting this.
> Can you still reproduce this issue?
> I can no longer reproduce this issue, that's why I posted... although
> I should have followed up, and looked into what the original issue
> was. Nothing really sticks out so perhaps something in mm-new was
> broken and pulled out... not sure.
This was verified as an issue by you and you didn't mention it in the
cover letter or the patches? It's one thing if you need help testing
changes when you can't recreate the issue, but this is not the case
here. What's worse is that you said you would look into it further, but
then dropped any mention of the issue.
Every comment on your patches should be treated as a bug report and
failure to address them (in code or reply) means your patches should not
be taken. Otherwise things fall through the cracks (and then get picked
up by syzbot, followed by a CVE and a bounty collection - if we are
lucky).
>
> It should now follow the expected behavior, which is that no mTHP
> collapse occurs because if the PMD size is disabled so is khugepaged
> collapse.
>
> Lmk if you are still experiencing this issue please.
>
I'm sorry, but that's not how this process works.
Let us know if it's still an issue and detail the fix and/or cause of
the issue so that we are confident that it's handled diligently, and
know what to check to make sure the fix make sense.
Everyone misses things, but the fact that you are saying you should have
followed up, but are still requesting someone else figure it out is
troubling. It points to an incomplete understanding of the issue(s).
Does it happen in mm-stable or mm-unstable with your patches? Were
there other fixes that went in between the spins that may have affected
it? Please take the time to fully understand what happened and why it
is no longer happening.
If you cannot get to the bottom of it, then don't send out another
revision - follow up to the initial comments and get help to dig deeper.
Maybe there is a bug here that has nothing to do with your patches, but
you have triggered it for some reason and we need more investigation.
I'm not willing to accept patches that now work for an unknown reason,
and neither should you.
> Cheers,
> -- Nico
> >
> > [1]
> > https://lore.kernel.org/all/83a66442-b7c7-42e7-af4e-fd211d8ed6f8@linux.alibaba.com/
> >
>
Thanks,
Liam
Powered by blists - more mailing lists