Message-ID: <SA1PR11MB7130ABC25E060D2CC4749E45891BA@SA1PR11MB7130.namprd11.prod.outlook.com>
Date: Mon, 29 Sep 2025 13:27:51 +0000
From: "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>
To: Jiaqi Yan <jiaqiyan@...gle.com>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"david@...hat.com" <david@...hat.com>, "lorenzo.stoakes@...cle.com"
<lorenzo.stoakes@...cle.com>, "linmiaohe@...wei.com" <linmiaohe@...wei.com>,
"Luck, Tony" <tony.luck@...el.com>, "ziy@...dia.com" <ziy@...dia.com>,
"baolin.wang@...ux.alibaba.com" <baolin.wang@...ux.alibaba.com>,
"Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>, "npache@...hat.com"
<npache@...hat.com>, "ryan.roberts@....com" <ryan.roberts@....com>,
"dev.jain@....com" <dev.jain@....com>, "baohua@...nel.org"
<baohua@...nel.org>, "nao.horiguchi@...il.com" <nao.horiguchi@...il.com>,
"Chen, Farrah" <farrah.chen@...el.com>, "linux-mm@...ck.org"
<linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Andrew Zaborowski
<andrew.zaborowski@...el.com>
Subject: RE: [PATCH 1/1] mm: prevent poison consumption when splitting THP
Hi Jiaqi,
> From: Jiaqi Yan <jiaqiyan@...gle.com>
> [...]
> > First Machine Check occurs // [1]
> > memory_failure() // [2]
> > try_to_split_thp_page()
> > split_huge_page()
> > split_huge_page_to_list_to_order()
> > __folio_split() // [3]
> > remap_page()
> > remove_migration_ptes()
> > remove_migration_pte()
> > try_to_map_unused_to_zeropage()
>
> Just an observation: Unfortunately THP only has PageHasHWPoisoned and
> don't know the exact HWPoisoned page. Otherwise, we may still use
> zeropage for these not HWPoisoned.
>
Thanks for catching this.
Miaohe mentioned in another e-mail that there is a per-page HWPoison flag on the raw 4K error page.
We could use that flag to skip only the raw error page and still use the zeropage for the other
healthy sub-pages. I'll try that.
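To illustrate the ordering of the checks, here is a rough userspace sketch, not kernel code and not the v2 patch. The struct and the names subpage, subpage_is_zero_filled() and can_use_zeropage() are hypothetical stand-ins: the hwpoison field models the per-page HWPoison flag, and the byte scan models what memchr_inv() does at [4]. The point is only that the flag is tested before any byte of the sub-page is read, so the poisoned page is never consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096u

/* Hypothetical userspace stand-in for one 4K sub-page of a THP. */
struct subpage {
	bool hwpoison;                    /* models the per-page HWPoison flag */
	unsigned char data[SUBPAGE_SIZE]; /* models the page contents */
};

/* Userspace analogue of the memchr_inv() scan: true if every byte is zero. */
static bool subpage_is_zero_filled(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}

/*
 * Decide whether a sub-page may be remapped to the shared zeropage.
 * The poison flag is checked first, so the contents of a poisoned
 * sub-page are never read (in the kernel, that read is what would
 * trigger the second machine check).
 */
static bool can_use_zeropage(const struct subpage *sp)
{
	assert(sp != NULL);

	if (sp->hwpoison)
		return false;

	return subpage_is_zero_filled(sp->data, sizeof(sp->data));
}
```

With this ordering, a zero-filled healthy sub-page is still eligible for the zeropage, while the raw error page is rejected without being scanned.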
> > memchr_inv() // [4]
> > Second Machine Check occurs // [5]
> > Kernel panic
> >
> [...]
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -2351,8 +2351,10 @@ int memory_failure(unsigned long pfn, int flags)
> > * otherwise it may race with THP split.
> > * And the flag can't be set in get_hwpoison_page() since
> > * it is called by soft offline too and it is just called
> > - * for !MF_COUNT_INCREASED. So here seems to be the best
> > - * place.
> > + * for !MF_COUNT_INCREASED.
> > + * It also tells split_huge_page() to not bother using
>
> nit: it may confuse readers of split_huge_page when they didn't see any check
> on the hwpoison flag. So from readability PoV, it may be better to refer to this
> in a more generic term like the "following THP splitting process" (I would
> prefer this), or to point precisely to __folio_split.
>
OK. I'll update this comment in v2.
> Everything else looks good to me.
>
> Reviewed-by: Jiaqi Yan <jiaqiyan@...gle.com>
Thanks.
-Qiuxu