Message-ID: <CAMgjq7D_G=5bJe_Uj11sHV2qCyu-Z-3PZu7QuZyPEhuFiE63wQ@mail.gmail.com>
Date: Wed, 22 Oct 2025 13:48:57 +0800
From: Kairui Song <ryncsn@...il.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>, Dev Jain <dev.jain@....com>,
David Hildenbrand <david@...hat.com>, Barry Song <baohua@...nel.org>,
Liam Howlett <liam.howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Mariano Pache <npache@...hat.com>, Matthew Wilcox <willy@...radead.org>,
Ryan Roberts <ryan.roberts@....com>, Zi Yan <ziy@...dia.com>, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] mm/shmem: fix THP allocation size check and fallback
On Wed, Oct 22, 2025 at 9:25 AM Baolin Wang
<baolin.wang@...ux.alibaba.com> wrote:
>
>
>
> On 2025/10/22 03:04, Kairui Song wrote:
> > From: Kairui Song <kasong@...cent.com>
> >
> > There are some problems with the code implementations of THP fallback.
> > suitable_orders could be zero, and calling highest_order on a zero value
> > returns an overflowed size. And the order check loop is updating the
> > index value on every loop which may cause the index to be aligned by a
> > larger value while the loop shrinks the order.
>
> No, this is not true. Although ‘suitable_orders’ might be 0, it will not
> enter the ‘while (suitable_orders)’ loop, and ‘order’ will not be used
> (this is also how the highest_order() function is used in other places).
Maybe I shouldn't have mixed the trivial issue with the real issue
here, sorry, my bad, I was in a hurry :P.

I mean that if suitable_orders is zero we should just skip calling
highest_order(), since that returns a negative value. It's not causing
a real issue, just redundant.
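
For reference (quoting the definition from memory, so treat it as
approximate), highest_order() is just fls_long() minus one:

        /* roughly, from include/linux/huge_mm.h: */
        #define highest_order(orders)   fls_long(orders) - 1

fls_long(0) is 0, so highest_order(0) evaluates to -1, which is why
I'd rather not call it on zero even if the result is never used.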
>
> > And it forgot to try order
> > 0 after the final loop.
>
> This is also not true. We will fallback to order 0 allocation in
> shmem_get_folio_gfp() if large order allocation fails.
I thought that after the fix we could simplify the code, and maybe
reduce the calls to shmem_alloc_and_add_folio to only one call site so
it can be inlined by the compiler.

On second thought, some more changes are needed to respect the
huge_gfp. Maybe I should send a series that splits the hot fix from
the cleanups.

But the index being modified during the loop does need a fix, I think,
so for the fix part we just need:
diff --git a/mm/shmem.c b/mm/shmem.c
index 29e1eb690125..e89ae4dd6859 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1895,10 +1895,11 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
                order = highest_order(suitable_orders);
                while (suitable_orders) {
                        pages = 1UL << order;
-                       index = round_down(index, pages);
-                       folio = shmem_alloc_folio(gfp, order, info, index);
-                       if (folio)
+                       folio = shmem_alloc_folio(gfp, order, info, round_down(index, pages));
+                       if (folio) {
+                               index = round_down(index, pages);
                                goto allocated;
+                       }

                        if (pages == HPAGE_PMD_NR)
                                count_vm_event(THP_FILE_FALLBACK);
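
To spell out the index problem with made-up numbers: say the fault
index is 300 and the PMD-order (e.g. order 9) attempt fails, with the
old code doing:

        pages = 1UL << 9;                       /* 512 pages */
        index = round_down(index, pages);       /* index: 300 -> 0 */
        /* allocation fails, fall back to e.g. order 4 */
        pages = 1UL << 4;
        index = round_down(index, pages);       /* stays 0, but the
                                                   order-4 folio for
                                                   index 300 should
                                                   start at 288 */

Once the first iteration has rounded index down, every smaller-order
retry inherits that larger alignment, so the folio can end up at the
wrong position in the mapping.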
>
> > This is usually fine because shmem_add_to_page_cache ensures the shmem
> > mapping is still sane, but it might cause many potential issues like
> > allocating random folios into the random position in the map or return
> > -ENOMEM by accident. This triggered some strange userspace errors [1],
> > and shouldn't have happened in the first place.
>
> I tested tmpfs with swap on ZRAM in the mm-new branch, and no issues
> were encountered. However, it is worth mentioning that, after the commit
> 69e0a3b49003 ("mm: shmem: fix the strategy for the tmpfs 'huge='
> options"), tmpfs may consume more memory (because we no longer allocate
> large folios based on the write size), which might cause your test case
> to run out of memory (OOM) and trigger some errors. You may need to
> adjust your swap size or memcg limit.
>
I'm not seeing any OOM issue, and I tested with different memory
sizes; the error occurs with every setup.

If I compare the built objects with ones built before 69e0a3b49003, I
can see that the built object is corrupted. Comparing with hexdump &
diff on any .o generated by gcc:
--- GOOD 2025-10-21 12:17:44.121000287 +0000
+++ BAD 2025-10-21 12:18:01.094000870 +0000
@@ -3371,425 +3371,7 @@
... <summary: whole chunk of data & symbols missing> ...
The GCC compile didn't fail because the corruption happens when
writing the object files, so it was LD that failed, and not due to
OOM.

The reproduction rate is very high across different setups; I'm not
sure why it didn't occur on your setup. I suspect it's related to how
different gcc versions write the output (write, not fault)?

With 69e0a3b49003 reverted, the build is OK. 69e0a3b49003 isn't the
root cause, but it exposed this.

The minimized fix posted above also fixes the problem just fine.
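
If it helps, the kind of write-path probe I have in mind is something
like this (hypothetical and untested; it assumes a tmpfs mount at
/mnt/tmpfs with huge=always, and it only shows anything when the
large-order allocation actually falls back):

/* Hypothetical probe: pwrite one page at a non-PMD-aligned index
 * on tmpfs, then read it back and compare. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        static char buf[4096], out[4096];
        int fd = open("/mnt/tmpfs/probe", O_CREAT | O_RDWR, 0600);

        if (fd < 0)
                return 1;
        memset(buf, 0xaa, sizeof(buf));
        /* index 300 is not aligned to any of the larger orders */
        pwrite(fd, buf, sizeof(buf), 300 * 4096L);
        pread(fd, out, sizeof(out), 300 * 4096L);
        if (memcmp(buf, out, sizeof(buf)))
                puts("read back the wrong data");
        close(fd);
        return 0;
}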