Message-ID: <CACw3F50E=AZtgfoExCA-nwS6=NYdFFWpf6+GBUYrWiJOz4xwaw@mail.gmail.com>
Date: Mon, 17 Nov 2025 22:24:27 -0800
From: Jiaqi Yan <jiaqiyan@...gle.com>
To: Matthew Wilcox <willy@...radead.org>, Harry Yoo <harry.yoo@...cle.com>, ziy@...dia.com, 
	david@...hat.com, Vlastimil Babka <vbabka@...e.cz>
Cc: nao.horiguchi@...il.com, linmiaohe@...wei.com, lorenzo.stoakes@...cle.com, 
	william.roche@...cle.com, tony.luck@...el.com, wangkefeng.wang@...wei.com, 
	jane.chu@...cle.com, akpm@...ux-foundation.org, osalvador@...e.de, 
	muchun.song@...ux.dev, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, Michal Hocko <mhocko@...e.com>, 
	Suren Baghdasaryan <surenb@...gle.com>, Brendan Jackman <jackmanb@...gle.com>, 
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH v1 1/2] mm/huge_memory: introduce uniform_split_unmapped_folio_to_zero_order

On Mon, Nov 17, 2025 at 5:43 AM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Mon, Nov 17, 2025 at 12:15:23PM +0900, Harry Yoo wrote:
> > On Sun, Nov 16, 2025 at 11:51:14AM +0000, Matthew Wilcox wrote:
> > > But since we're only doing this on free, we won't need to do folio
> > > allocations at all; we'll just be able to release the good pages to the
> > > page allocator and sequester the hwpoison pages.
> >
> > [+Cc PAGE ALLOCATOR folks]
> >
> > So we need an interface to free only healthy portion of a hwpoison folio.

+1, with some of my own thoughts below.

> >
> > I think a proper approach to this should be to "free a hwpoison folio
> > just like freeing a normal folio via folio_put() or free_frozen_pages(),
> > then the page allocator will add only healthy pages to the freelist and
> > isolate the hwpoison pages". Otherwise we'll end up open coding a lot,
> > which is too fragile.
>
> Yes, I think it should be handled by the page allocator.  There may be

I agree with Matthew, Harry, and David. The page allocator seems best
suited to handle HWPoison subpages without any new folio allocations.

> some complexity to this that I've missed, eg if hugetlb wants to retain
> the good 2MB chunks of a 1GB allocation.  I'm not sure that's a useful
> thing to do or not.
>
> > In fact, that can be done by teaching free_pages_prepare() how to handle
> > the case where one or more subpages of a folio are hwpoison pages.
> >
> > How this should be implemented in the page allocator in memdescs world?
> > Hmm, we'll want to do some kind of non-uniform split, without actually
> > splitting the folio but allocating struct buddy?
>
> Let me sketch that out, realising that it's subject to change.
>
> A page in buddy state can't need a memdesc allocated.  Otherwise we're
> allocating memory to free memory, and that way lies madness.  We can't
> do the hack of "embed struct buddy in the page that we're freeing"
> because HIGHMEM.  So we'll never shrink struct page smaller than struct
> buddy (which is fine because I've laid out how to get to a 64 bit struct
> buddy, and we're probably two years from getting there anyway).
>
> My design for handling hwpoison is that we do allocate a struct hwpoison
> for a page.  It looks like this (for now, in my head):
>
> struct hwpoison {
>         memdesc_t original;
>         ... other things ...
> };
>
> So we can replace the memdesc in a page with a hwpoison memdesc when we
> encounter the error.  We still need a folio flag to indicate that "this
> folio contains a page with hwpoison".  I haven't put much thought yet
> into interaction with HUGETLB_PAGE_OPTIMIZE_VMEMMAP; maybe "other things"
> includes an index of where the actually poisoned page is in the folio,
> so it doesn't matter if the pages alias with each other as we can recover
> the information when it becomes useful to do so.
>
> > But... for now I think hiding this complexity inside the page allocator
> > is good enough. For now this would just mean splitting a frozen page

I want to add one more thing. For HugeTLB, the kernel clears the HWPoison
flag on the folio and moves it to every raw page in the raw_hwp_page list
(see folio_clear_hugetlb_hwpoison). So the page allocator has no hint that
some of the pages passed into free_frozen_pages are HWPoison. It has to
traverse all 2^order pages to tell, if I am not mistaken, which goes
against the past effort to reduce sanity checks. I believe this is one
reason I chose to handle the problem in hugetlb / memory-failure.
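
To make the traversal cost concrete, here is a minimal user-space sketch,
not kernel code. struct model_page and model_count_hwpoison() are
hypothetical names made up for illustration; the only point is that
without a folio-level hint, every one of the 2^order pages has to be
visited on each free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct model_page {
	bool hwpoison;	/* models the per-page HWPoison flag */
};

/*
 * Count poisoned pages in a block of 2^order pages.
 * With no hint on the folio, the loop always runs 2^order times,
 * even when nothing in the block is poisoned.
 */
static size_t model_count_hwpoison(const struct model_page *pages,
				   unsigned int order)
{
	size_t nr, i, poisoned = 0;

	assert(order < 8 * sizeof(size_t));	/* keep the shift well defined */
	nr = (size_t)1 << order;

	for (i = 0; i < nr; i++)
		if (pages[i].hwpoison)
			poisoned++;

	return poisoned;
}
```

For a 1GB hugetlb folio with 4KB base pages (order 18), that is 262144
flag checks per free in this model.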

For the new interface Harry requested, is it the caller's
responsibility to ensure that the folio contains HWPoison pages (or,
even better, to point out the exact ones?), so that the page allocator
at least doesn't waste cycles searching for nonexistent HWPoison pages
in the set of pages?

Or do the caller and the page allocator need to agree on some contract?
Say the caller has to set the has_hwpoisoned flag in the non-zero-order
folio it is freeing. This gives the old interface free_frozen_pages an
easy check, using the has_hwpoisoned flag from the second page. I know
has_hwpoisoned is "#if defined" on THP and using it for hugetlb is
probably not very clean, but are there other concerns?
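
A rough user-space sketch of the contract I have in mind, not kernel
code. struct sketch_folio, struct sketch_free_result and
sketch_free_folio() are hypothetical names; the model only counts pages
and does not try to rebuild buddy-order chunks from the healthy ranges.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a folio handed to the free path. */
struct sketch_folio {
	unsigned int order;
	bool has_hwpoisoned;		/* set by the caller, per the contract */
	const bool *page_poisoned;	/* 2^order entries, one per subpage */
};

struct sketch_free_result {
	size_t freed;		/* pages that would go to the freelist */
	size_t isolated;	/* pages sequestered as HWPoison */
	size_t scanned;		/* per-page checks the free path paid for */
};

/*
 * Fast path: flag clear, free everything without looking at subpages.
 * Slow path: flag set, scan and keep poisoned pages off the freelist.
 */
static struct sketch_free_result
sketch_free_folio(const struct sketch_folio *folio)
{
	struct sketch_free_result res = { 0, 0, 0 };
	size_t nr = (size_t)1 << folio->order;
	size_t i;

	if (!folio->has_hwpoisoned) {
		res.freed = nr;
		return res;
	}

	for (i = 0; i < nr; i++) {
		res.scanned++;
		if (folio->page_poisoned[i])
			res.isolated++;
		else
			res.freed++;
	}
	return res;
}
```

The scan cost is then paid only by folios whose caller set the flag, and
the common free path stays free of extra per-page checks.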


> > inside the page allocator (probably non-uniform?). We can later re-implement
> > this to provide better support for memdescs.
>
> Yes, I like this approach.  But then I'm not the page allocator
> maintainer ;-)

If page allocator maintainers can weigh in here, that will be very helpful!
