Message-ID: <CACw3F50k11cfh++qnbCnTK9ckc2EoVLzU12bKzCotnV517gLHg@mail.gmail.com>
Date: Mon, 20 Jan 2025 17:21:22 -0800
From: Jiaqi Yan <jiaqiyan@...gle.com>
To: David Hildenbrand <david@...hat.com>
Cc: nao.horiguchi@...il.com, linmiaohe@...wei.com, sidhartha.kumar@...cle.com, 
	muchun.song@...ux.dev, jane.chu@...cle.com, akpm@...ux-foundation.org, 
	osalvador@...e.de, rientjes@...gle.com, jthoughton@...gle.com, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/2] How HugeTLB handle HWPoison page at truncation

On Mon, Jan 20, 2025 at 2:59 AM David Hildenbrand <david@...hat.com> wrote:
>
> On 19.01.25 19:06, Jiaqi Yan wrote:
> > While I was working on userspace MFR via memfd [1], I spent some time
> > understanding what the current kernel does when a HugeTLB-backing memfd
> > is truncated. My expectation is: if there is a HWPoison HugeTLB folio
> > mapped via the memfd to userspace, it will be unmapped right away but
> > still be kept in the page cache [2]; however, when the memfd is truncated
> > to zero or after the memfd is closed, the kernel should dissolve the
> > HWPoison folio in the page cache and free only the clean raw pages to the
> > buddy allocator, excluding the poisoned raw page.
> >
> > So I wrote a hugetlb-mfr-base.c selftest and expect:
> > 0. say nr_hugepages is initially 64 per the system configuration.
> > 1. after MADV_HWPOISON, nr_hugepages should still be 64 as we keep even
> >     the HWPoison huge folio in the page cache. free_hugepages should be
> >     nr_hugepages minus however many pages are in use.
> > 2. after truncating the memfd to zero, nr_hugepages should be reduced to
> >     63 as the kernel dissolved and freed the HWPoison huge folio.
> >     free_hugepages should also be 63.
> >
> > However, when testing at the head of mm-stable commit 2877a83e4a0a
> > ("mm/hugetlb: use folio->lru int demote_free_hugetlb_folios()"), I found
> > that although free_hugepages is reduced to 63, nr_hugepages is not
> > reduced and stays at 64.
> >
> > Is my expectation outdated? Or is this some kind of bug?
> >
> > I assumed this is a bug and dug a little bit more. It seems there
> > are two issues, or two things I don't really understand.
> >
> > 1. During try_memory_failure_hugetlb, we increase the target
> >     in-use folio's refcount via get_hwpoison_hugetlb_folio. However,
> >     even by the end of try_memory_failure_hugetlb, this refcount is not
> >     put. I can make sense of this given we keep the in-use huge folio in
> >     the page cache.
>
> Isn't the general rule that hwpoisoned folios have a raised refcount
> such that they won't get freed + reused? At least that's how the buddy
> deals with them, and I suspect also hugetlb?
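
For reference, the flow in steps 0-2 quoted above boils down to roughly the
sketch below. It is a hypothetical reconstruction, not the actual
hugetlb-mfr-base.c, and it assumes a 2MB default hugepage size, a
pre-populated hugepage pool, and root privileges so MADV_HWPOISON is
permitted.

/*
 * Hypothetical sketch of steps 0-2 above, not the actual
 * hugetlb-mfr-base.c. Assumes the default hugepage size is 2MB, that
 * /proc/sys/vm/nr_hugepages is pre-populated, and root privileges
 * (MADV_HWPOISON requires CAP_SYS_ADMIN).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_SIZE (2UL << 20)	/* assumed 2MB default hugepage size */

static long read_counter(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (!f || fscanf(f, "%ld", &val) != 1) {
		perror(path);
		exit(1);
	}
	fclose(f);
	return val;
}

static void show(const char *when)
{
	printf("%-16s nr_hugepages=%ld free_hugepages=%ld\n", when,
	       read_counter("/proc/sys/vm/nr_hugepages"),
	       read_counter("/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages"));
}

int main(void)
{
	int fd = memfd_create("hugetlb-mfr", MFD_HUGETLB);
	char *map;

	if (fd < 0 || ftruncate(fd, HUGE_SIZE))
		exit(1);

	map = mmap(NULL, HUGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		exit(1);
	memset(map, 0xab, HUGE_SIZE);	/* fault in the huge folio */
	show("baseline:");

	/* Step 1: poison one raw page; the folio should be unmapped but
	 * kept in the page cache, so nr_hugepages stays the same. */
	if (madvise(map, getpagesize(), MADV_HWPOISON))
		perror("MADV_HWPOISON");
	show("after poison:");

	/* Step 2: truncate to zero; the expectation under discussion is
	 * that nr_hugepages drops by one as the folio is dissolved. */
	if (ftruncate(fd, 0))
		perror("ftruncate");
	show("after truncate:");

	munmap(map, HUGE_SIZE);
	close(fd);
	return 0;
}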

Thanks, David.

I see, so it is expected that the _entire_ huge folio will always hold
at least a refcount of 1, even when the folio could otherwise become "free".

For a *free* huge folio, try_memory_failure_hugetlb dissolves it and
frees the clean pages (most of the folio) to the buddy allocator. That made
me think the same thing would happen for an *in-use* huge folio _eventually_
(i.e. that the refcount taken for HWPoison would somehow be put). I feel
this is a little bit unfortunate for the clean pages, but if that is how it
works, fair enough; it is not a bug.
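
For what it's worth, the *free*-folio path can be exercised from userspace
with something like the hypothetical sketch below. It assumes a 2MB default
hugepage size, root, and CONFIG_HWPOISON_INJECT so the debugfs hard-offline
injector at /sys/kernel/debug/hwpoison/corrupt-pfn is available; the
pagemap-based PFN lookup is only for illustration.

/*
 * Hypothetical sketch of the *free* huge folio case: hard-offline a PFN
 * that belongs to a free hugetlb folio and watch the pool shrink as the
 * folio is dissolved. Assumes 2MB default hugepages, root, and the
 * hwpoison-inject debugfs interface (CONFIG_HWPOISON_INJECT).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_SIZE (2UL << 20)	/* assumed 2MB default hugepage size */

/* Translate a virtual address to a PFN via /proc/self/pagemap (root only). */
static uint64_t virt_to_pfn(void *addr)
{
	uint64_t ent = 0;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0 || pread(fd, &ent, sizeof(ent),
			    ((uintptr_t)addr / 4096) * sizeof(ent)) != sizeof(ent)) {
		perror("pagemap");
		exit(1);
	}
	close(fd);
	return ent & ((1ULL << 55) - 1);	/* bits 0-54 hold the PFN */
}

int main(void)
{
	char cmd[128];
	uint64_t pfn;
	char *map = mmap(NULL, HUGE_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (map == MAP_FAILED)
		exit(1);
	map[0] = 1;			/* fault in one huge folio */
	pfn = virt_to_pfn(map);
	munmap(map, HUGE_SIZE);		/* the folio returns to the free pool */

	/*
	 * Hard-offline the now-free folio; memory_failure() is expected to
	 * dissolve it and hand the clean raw pages back to buddy, so
	 * HugePages_Total should drop by one.
	 */
	snprintf(cmd, sizeof(cmd),
		 "echo %" PRIu64 " > /sys/kernel/debug/hwpoison/corrupt-pfn", pfn);
	system(cmd);
	system("grep -E 'HugePages_(Total|Free)' /proc/meminfo");
	return 0;
}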

>
> > [ 1069.320976] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2780000
> > [ 1069.320978] head: order:18 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > [ 1069.320980] flags: 0x400000000100044(referenced|head|hwpoison|node=0|zone=1)
> > [ 1069.320982] page_type: f4(hugetlb)
> > [ 1069.320984] raw: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
> > [ 1069.320985] raw: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
> > [ 1069.320987] head: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
> > [ 1069.320988] head: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
> > [ 1069.320990] head: 0400000000000012 ffffdd53de000001 ffffffffffffffff 0000000000000000
> > [ 1069.320991] head: 0000000000040000 0000000000000000 00000000ffffffff 0000000000000000
> > [ 1069.320992] page dumped because: track hwpoison folio's ref
> >
> > 2. Even if the folio's refcount does drop to zero and we get into
> >     free_huge_folio, it is not clear to me which part of free_huge_folio
> >     handles the case where the folio is HWPoison. In my test, what I
> >     observed is that eventually the folio is enqueue_hugetlb_folio()-ed.
>
> How would we get a refcount of 0 if we assume the raised refcount on a
> hwpoisoned hugetlb folio?
>
> I'm probably missing something: are you saying that you can trigger a
> hwpoisoned hugetlb folio to get reallocated again, in upstream code?

No, I think it is just my misunderstanding. From what you said, the
expectation for a HWPoison hugetlb folio is simply that it won't get
reallocated again, and that does hold.

My (wrong) expectation was that, in addition to the "won't get reallocated
again" part, some (large) portion of the huge folio would be freed to the
buddy allocator. That said, is it something worth having / improving?
(1G - some_single_digit * 4KB) seems valuable to the system: a 1G folio is
262144 raw 4K pages and all but a handful of them are still clean, even
though they would come back only as 4K pages. #1 and #2 above are then what
would need to be done if the improvement is worth chasing.

>
>
> --
> Cheers,
>
> David / dhildenb
>
