Message-ID: <aQxSSjyPsI0MT8mp@harry>
Date: Thu, 6 Nov 2025 16:53:30 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Jiaqi Yan <jiaqiyan@...gle.com>
Cc: Miaohe Lin <linmiaohe@...wei.com>,
        William Roche <william.roche@...cle.com>,
        Ackerley Tng <ackerleytng@...gle.com>, jgg@...dia.com,
        akpm@...ux-foundation.org, ankita@...dia.com,
        dave.hansen@...ux.intel.com, david@...hat.com, duenwen@...gle.com,
        jane.chu@...cle.com, jthoughton@...gle.com,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, muchun.song@...ux.dev, nao.horiguchi@...il.com,
        osalvador@...e.de, peterx@...hat.com, rientjes@...gle.com,
        sidhartha.kumar@...cle.com, tony.luck@...el.com,
        wangkefeng.wang@...wei.com, willy@...radead.org, vbabka@...e.cz,
        surenb@...gle.com, mhocko@...e.com, jackmanb@...gle.com,
        hannes@...xchg.org, ziy@...dia.com
Subject: Re: [RFC PATCH v1 0/3] Userspace MFR Policy via memfd

On Mon, Nov 03, 2025 at 08:57:08AM -0800, Jiaqi Yan wrote:
> On Mon, Nov 3, 2025 at 12:53 AM Harry Yoo <harry.yoo@...cle.com> wrote:
> >
> > On Mon, Nov 03, 2025 at 05:16:33PM +0900, Harry Yoo wrote:
> > > On Thu, Oct 30, 2025 at 10:28:48AM -0700, Jiaqi Yan wrote:
> > > > On Thu, Oct 30, 2025 at 4:51 AM Miaohe Lin <linmiaohe@...wei.com> wrote:
> > > > > On 2025/10/28 15:00, Harry Yoo wrote:
> > > > > > On Mon, Oct 27, 2025 at 09:17:31PM -0700, Jiaqi Yan wrote:
> > > > > >> On Wed, Oct 22, 2025 at 6:09 AM Harry Yoo <harry.yoo@...cle.com> wrote:
> > > > > >>> On Mon, Oct 13, 2025 at 03:14:32PM -0700, Jiaqi Yan wrote:
> > > > > >>>> On Fri, Sep 19, 2025 at 8:58 AM William Roche <william.roche@...cle.com> wrote:
> > > > > >>> But even after fixing that we need to fix the race condition.
> > > > > >>
> > > > > >> What exactly is the race condition you are referring to?
> > > > > >
> > > > > > > When you free a high-order page, the buddy allocator doesn't check
> > > > > > PageHWPoison() on the page and its subpages. It checks PageHWPoison()
> > > > > > only when you free a base (order-0) page, see free_pages_prepare().
> > > > >
> > > > > I think we could check PageHWPoison() for subpages as what free_page_is_bad()
> > > > > does. If any subpage has HWPoisoned flag set, simply drop the folio. Even we could
> > > >
> > > > Agree, I think as a starter I could try to, for example, let
> > > > free_pages_prepare scan HWPoison-ed subpages if the base page is high
> > > > order. In the optimal case, HugeTLB does move PageHWPoison flag from
> > > > head page to the raw error pages.
> > >
> > > [+Cc page allocator folks]
> > >
> > > AFAICT enabling page sanity check in page alloc/free path would be against
> > > past efforts to reduce sanity check overhead.
> > >
> > > [1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net/ 
> > > [2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net/ 
> > > [3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz 
> > >
> > > I'd recommend to check hwpoison flag before freeing it to the buddy
> > > when we know a memory error has occurred (I guess that's also what Miaohe
> > > suggested).
> > >
> > > > > do it better -- Split the folio and let healthy subpages join the buddy while reject
> > > > > the hwpoisoned one.
> > > > >
> > > > > >
> > > > > > AFAICT there is nothing that prevents the poisoned page to be
> > > > > > allocated back to users because the buddy doesn't check PageHWPoison()
> > > > > > on allocation as well (by default).
> > > > > >
> > > > > > So rather than freeing the high-order page as-is in
> > > > > > dissolve_free_hugetlb_folio(), I think we have to split it to base pages
> > > > > > and then free them one by one.
> > > > >
> > > > > It might not be worth to do that as this would significantly increase the overhead
> > > > > of the function while memory failure event is really rare.
> > > >
> > > > IIUC, Harry's idea is to do the split in dissolve_free_hugetlb_folio
> > > > only if folio is HWPoison-ed, similar to what Miaohe suggested
> > > > earlier.
> > >
> > > Yes, and if we do the check before moving HWPoison flag to raw pages,
> > > it'll be just a single folio_test_hwpoison() call.
> > >
> > > > BTW, I believe this race condition already exists today when
> > > > memory_failure handles HWPoison-ed free hugetlb page; it is not
> > > > something introduced via this patchset. I will fix or improve this in
> > > > a separate patchset.
> > >
> > > That makes sense.
> >
> > Wait, without this patchset, do we even free the hugetlb folio when
> > its subpage is hwpoisoned? I don't think we do, but I'm not an expert on MFR...
> 
> Based on my reading of try_memory_failure_hugetlb, me_huge_page, and
> __page_handle_poison, I think mainline kernel frees dissolved hugetlb
> folio to buddy allocator in two cases:
> 1. it was a free hugetlb page at the moment of try_memory_failure_hugetlb

Right.

> 2. it was an anonymous hugetlb page

Right.

Thanks. I think you're right that poisoned hugetlb folios can be freed
to the buddy even without this series (and poisoned pages allocated back to
users instead of being isolated due to missing PageHWPoison() checks on
alloc/free).

So the plan is to post RFC v2 of this series and the race condition fix
as a separate series, right? (that sounds good to me!)

I still think it'd be best to split the hugetlb folio to order-0 pages and
free them when we know the hugetlb folio is poisoned because:

- We don't have to implement a special version of __free_pages() that
  knows how to free a high-order page in which one or more sub-pages
  are poisoned.

- We can avoid re-enabling page sanity checks (and introducing overhead)
  all the time.
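
To make the two points above concrete, here is a rough userspace model of
the idea, not kernel code. Everything in it (struct model_page,
model_free_base_page(), model_free_folio(), MODEL_MAX_ORDER) is a
hypothetical name made up for illustration. It only assumes what was said
earlier in the thread: the poison check happens when an order-0 page is
freed, and a high-order free does not look at sub-pages.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ORDER 10	/* hypothetical cap for this model */

struct model_page {
	bool hwpoison;	/* stands in for PageHWPoison() */
	bool in_buddy;	/* true once the page is back on the free list */
};

/* Models freeing one order-0 page: the poison check lives here. */
static bool model_free_base_page(struct model_page *page)
{
	if (page->hwpoison)
		return false;	/* keep the bad page isolated */
	page->in_buddy = true;
	return true;
}

/*
 * Models the proposal: only when the folio is known to be poisoned,
 * split it into order-0 pages and free them one by one. Otherwise the
 * whole range is freed without any per-sub-page check.
 * Returns the number of pages handed back to the (modelled) buddy.
 */
static size_t model_free_folio(struct model_page *pages, unsigned int order,
			       bool folio_poisoned)
{
	size_t nr = (size_t)1 << order;
	size_t freed = 0;

	assert(order <= MODEL_MAX_ORDER);

	if (!folio_poisoned) {
		/* Fast path: no sub-page checks, no extra overhead. */
		for (size_t i = 0; i < nr; i++)
			pages[i].in_buddy = true;
		return nr;
	}

	/* Slow path, taken only after a memory error was reported. */
	for (size_t i = 0; i < nr; i++)
		if (model_free_base_page(&pages[i]))
			freed++;
	return freed;
}
```

In this model the common case pays a single boolean test per folio, and
the per-page walk is confined to the rare poisoned case, so neither a
special high-order free routine nor always-on sanity checks are needed.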

-- 
Cheers,
Harry / Hyeonggon
