Message-ID: <CACw3F52D8=s_uj_jRtw1J-GjAvt4c3HNMKb2sJGUjznvyAK80A@mail.gmail.com>
Date: Thu, 15 Jan 2026 09:11:02 -0800
From: Jiaqi Yan <jiaqiyan@...gle.com>
To: Miaohe Lin <linmiaohe@...wei.com>, ziy@...dia.com,
Boudewijn van der Heide <boudewijn@...ta-utec.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, jackmanb@...gle.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhocko@...e.com,
nao.horiguchi@...il.com, osalvador@...e.de, surenb@...gle.com, vbabka@...e.cz
Subject: Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned
compound pages
On Wed, Jan 14, 2026 at 11:55 PM Miaohe Lin <linmiaohe@...wei.com> wrote:
>
> On 2026/1/14 22:48, Boudewijn van der Heide wrote:
> >>> free_pages_prepare() only handles poisoned order-0 pages.
> >>> In memory_failure() (hard offline), pages
> >>> are poisoned before attempting to split huge pages. If the split fails,
> >>> the page remains compound (order > 0) but is already poisoned. However,
> >>> soft-offline pages are always poisoned as order-0 after migration, so
> >>> they are unaffected.
> >>>
> >>> The '!order' check causes these poisoned compound pages to skip
> >>> poison handling, leaving them in the buddy allocator.
> >>>
> >>> Worst case, a poisoned compound page could be reallocated,
> >>> potentially leading to crashes, silent data corruption,
> >>> or unwanted memory containment actions before the poison bit is detected.
> >>>
> >>> This patch removes the '&& !order' restriction. Cleanup functions in the
> >>> poison-handling block correctly handle non-zero order pages, making
> >>> this change safe.
> >
> >> This is not a fix. IIUC, for >0 order free pages, memory failure uses
> >> take_page_off_buddy() in a different code path.
> >>
> >
> > Thanks again for the quick response and clarification!
> > From my understanding,
> > you correctly noted that take_page_off_buddy() handles already-free pages,
> > removing them from the buddy lists and setting SetPageHWPoisonTakenOff().
> > This prevents those pages from re-entering the buddy allocator.
>
> Thanks both.
>
> >
> > My concern is about in-use THP-backed compound pages:
> > 1. A compound page is in use.
> > 2. memory_failure() marks it poisoned (TestSetPageHWPoison).
> > 3. try_to_split_thp_page() fails.
> > 4. The process using the THP may be killed;
> > the page remains compound and poisoned.
> > 5. Later, when the page is finally freed, it reaches free_pages_prepare();
> > 'take_page_off_buddy()' is not invoked in this path.
I agree that Boudewijn's concern is valid when try_to_split_thp_page() fails.
However, I don't think the fix here really works. For a compound / THP
page, memory_failure() sets the PG_HWPoison flag on the exact subpage
within the compound page. I believe the page passed to free_pages_prepare()
is almost always (if not always) the head of the compound page. So
removing "!order" won't really help unless the head of the THP
happens to be the HWPoison subpage.
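
To make the head-vs-subpage point concrete, here is a minimal userspace
sketch. It is not kernel code: struct toy_page and all toy_* helpers are
made up for illustration, and the "scan every subpage" helper is only a
hypothetical contrast, not what any posted patch does. It just shows that
a check on the head page alone misses poison recorded on a tail subpage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the one bit discussed here. */
struct toy_page {
	bool hwpoison;	/* models the per-page HWPoison flag */
};

/*
 * Models memory_failure() marking only the exact failing subpage.
 * 'head' points at the first of (1 << order) contiguous toy pages.
 */
static void toy_poison_subpage(struct toy_page *head, unsigned int order,
			       size_t idx)
{
	assert(idx < ((size_t)1 << order));
	head[idx].hwpoison = true;
}

/*
 * Models a free-path check that only looks at the page it was handed,
 * assumed here to be the head of the compound page.
 */
static bool toy_head_only_check(const struct toy_page *head)
{
	return head->hwpoison;
}

/* Hypothetical contrast: look at every subpage of the compound page. */
static bool toy_any_subpage_poisoned(const struct toy_page *head,
				     unsigned int order)
{
	size_t nr = (size_t)1 << order;

	for (size_t i = 0; i < nr; i++)
		if (head[i].hwpoison)
			return true;
	return false;
}
```

With an order-9 toy compound page poisoned at subpage 37, the head-only
check reports nothing while the full scan finds the poison; the head-only
check fires only when subpage 0 is the poisoned one.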
>
> Yes, this is also a problematic scenario for Hugetlb HugePage. And Jiaqi works on
> it now [1]. I think Jiaqi's patches might apply to THP scenario too. Add @Jiaqi to
> verify this.
Yep, I think my work will also help solve the concern when
try_to_split_thp_page() fails.
>
> [1]: https://lore.kernel.org/all/20260112004923.888429-1-jiaqiyan@google.com/
>
> Thanks.
> .