Message-ID: <CAHbLzkrstjnEVUzz2==A2Z+CJToOgU6YU2MasdK49o-0-jW2yw@mail.gmail.com>
Date: Thu, 25 Sep 2025 09:23:05 -0700
From: Yang Shi <shy828301@...il.com>
To: Zi Yan <ziy@...dia.com>
Cc: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>, David Hildenbrand <david@...hat.com>, 
	Luis Chamberlain <mcgrof@...nel.org>, 
	syzbot <syzbot+e6367ea2fdab6ed46056@...kaller.appspotmail.com>, 
	akpm@...ux-foundation.org, linmiaohe@...wei.com, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, nao.horiguchi@...il.com, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [mm?] WARNING in memory_failure

On Thu, Sep 25, 2025 at 7:45 AM Zi Yan <ziy@...dia.com> wrote:
>
> On 25 Sep 2025, at 8:02, Pankaj Raghav (Samsung) wrote:
>
> >>>>
> >>>> We might just need (a), since there is no caller of (b) in the kernel,
> >>>> except that split_folio_to_order() is used for testing. There might be
> >>>> future uses when the kernel wants to convert from THP to mTHP, but it
> >>>> seems that we are not there yet.
> >>>>
> >>>
> >>> Even better: then maybe selected interfaces could just fail if the min-order contradicts the request to split to a non-large (order-0) folio.
> >>
> >> Yep. Let’s hear what Luis and Pankaj will say about this.
> >>
> >>>
> >>>>
> >>>>
> >>>> +Luis and Pankaj for their opinions on how LBS is going to use split folio
> >>>> to any order.
> >>>>
> >>>> Hi Luis and Pankaj,
> >>>>
> >>>> It seems that bumping the split folio order from 0 to mapping_min_folio_order()
> >>>> instead of simply failing the split call surprises some callers and causes
> >>>> issues like the one reported in this email. I cannot think of any situation
> >>>> where failing a folio split does not work. If LBS code wants to split, it
> >>>> should supply mapping_min_folio_order(), right? Does such a caller exist?
> >>>>
> >
> > I am not aware of any place in the LBS path where we supply the
> > min_order. truncate_inode_partial_folio() calls try_folio_split(), which
> > takes care of splitting in min_order chunks. So we embedded the
> > min_order in the MM functions that perform the split instead of having
> > the caller pass the min_order. That is probably why this problem is
> > being exposed now: people are surprised to see a large folio even
> > though they asked to split folios to order-0.
> >
> > As you concluded, we will not be breaking anything wrt LBS, as we
> > just refuse to split if the requested order doesn't match the min_order.
> > The only issue I see is that we might be exacerbating ENOMEM errors, as
> > we are not splitting as many folios with this change. But the solution
> > for that is simple: add more RAM to the system ;)
> >
> > Just for clarity, are we talking about changing the behaviour of just the
> > try_to_split_thp_page() function, or of all the split functions in huge_mm.h?
>
> I want to change all the split functions in huge_mm.h and provide
> mapping_min_folio_order() to try_folio_split() in truncate_inode_partial_folio().
>
> Something like below:
>
> 1. no split function will change the given order;
> 2. __folio_split() will no longer emit VM_WARN_ONCE when the provided
> new_order is smaller than mapping_min_folio_order().
>
> In this way, for an LBS folio that cannot be split to order 0, the split
> functions will return -EINVAL to tell the caller that the folio cannot
> be split. The caller is supposed to handle the split failure.
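
A minimal sketch of what a caller looks like under this proposal. The
function below is hypothetical (not part of the patch further down) and
only assumes the existing split_huge_page_to_list_to_order() and
mapping_min_folio_order() helpers; the point is simply that the caller
either requests the mapping's minimum order explicitly or handles the
-EINVAL it gets back.

#include <linux/huge_mm.h>
#include <linux/pagemap.h>

/* Hypothetical caller, shown only to illustrate the proposed semantics. */
static int example_split_pagecache_folio(struct folio *folio)
{
	unsigned int min_order = 0;
	int err;

	/* The folio is assumed to be locked by the caller. */
	if (folio->mapping)
		min_order = mapping_min_folio_order(folio->mapping);

	/* Request the smallest order the mapping allows... */
	err = split_huge_page_to_list_to_order(&folio->page, NULL, min_order);

	/*
	 * ... and handle failure: with the proposed change, asking for an
	 * order below min_order returns -EINVAL instead of silently
	 * splitting to a larger order behind the caller's back.
	 */
	return err;
}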

Other than making folio split more reliable, it seems to me that this
bug report shows memory failure doesn't handle LBS folios properly. For
example, if the block size is <= the order-0 page size (which was always
true before LBS), memory failure expects the large folio to be split to
order-0, and the poisoned order-0 page is then discarded if it is not
dirty. A later access to the block will trigger a major fault.
But with LBS the block size may be greater than the order-0 page size,
so the large folio is actually backed by one single block; memory
failure should then discard the whole large folio instead of one
order-0 page within it. IOW, memory failure should expect to see large
folios.
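
To make the distinction concrete, here is a rough, hypothetical sketch,
not the kernel's actual memory_failure() code: the function name is made
up, and it leans on mapping_min_folio_order(), split_huge_page() and
mapping_evict_folio() as they exist today (the latter is declared in
mm/internal.h, so mm-internal context is assumed).

/*
 * Hypothetical sketch only -- not memory_failure()'s real logic.
 * Assumes the caller holds the folio lock.
 */
static int example_hwpoison_pagecache_folio(struct page *poisoned)
{
	struct folio *folio = page_folio(poisoned);
	struct address_space *mapping = folio_mapping(folio);

	if (!mapping || folio_test_dirty(folio))
		return -EBUSY;		/* cannot silently drop dirty data */

	if (mapping_min_folio_order(mapping) == 0) {
		/*
		 * Block size <= PAGE_SIZE: split to order-0 and discard
		 * only the poisoned base page; clean siblings stay cached.
		 */
		return split_huge_page(poisoned);
	}

	/*
	 * LBS: the folio is backed by a single block larger than a page,
	 * so the whole large folio has to go; a later access to the block
	 * refaults it from storage (a major fault).
	 */
	return mapping_evict_folio(mapping, folio) ? 0 : -EBUSY;
}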

Thanks,
Yang

>
> WDYT?
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index f327d62fc985..e15c3ca07e33 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -387,34 +387,16 @@ int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
>   * Return: 0: split is successful, otherwise split failed.
>   */
>  static inline int try_folio_split(struct folio *folio, struct page *page,
> -               struct list_head *list)
> +               struct list_head *list, unsigned int order)
>  {
> -       int ret = min_order_for_split(folio);
> -
> -       if (ret < 0)
> -               return ret;
> -
> -       if (!non_uniform_split_supported(folio, 0, false))
> +       if (!non_uniform_split_supported(folio, order, false))
>                 return split_huge_page_to_list_to_order(&folio->page, list,
> -                               ret);
> -       return folio_split(folio, ret, page, list);
> +                               order);
> +       return folio_split(folio, order, page, list);
>  }
>  static inline int split_huge_page(struct page *page)
>  {
> -       struct folio *folio = page_folio(page);
> -       int ret = min_order_for_split(folio);
> -
> -       if (ret < 0)
> -               return ret;
> -
> -       /*
> -        * split_huge_page() locks the page before splitting and
> -        * expects the same page that has been split to be locked when
> -        * returned. split_folio(page_folio(page)) cannot be used here
> -        * because it converts the page to folio and passes the head
> -        * page to be split.
> -        */
> -       return split_huge_page_to_list_to_order(page, NULL, ret);
> +       return split_huge_page_to_list_to_order(page, NULL, 0);
>  }
>  void deferred_split_folio(struct folio *folio, bool partially_mapped);
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5acca24bbabb..faf5da459a4c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3653,8 +3653,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>
>                 min_order = mapping_min_folio_order(folio->mapping);
>                 if (new_order < min_order) {
> -                       VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
> -                                    min_order);
>                         ret = -EINVAL;
>                         goto out;
>                 }
> @@ -3986,11 +3984,6 @@ int min_order_for_split(struct folio *folio)
>
>  int split_folio_to_list(struct folio *folio, struct list_head *list)
>  {
> -       int ret = min_order_for_split(folio);
> -
> -       if (ret < 0)
> -               return ret;
> -
> -       return split_huge_page_to_list_to_order(&folio->page, list, ret);
> +       return split_huge_page_to_list_to_order(&folio->page, list, 0);
>  }
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 91eb92a5ce4f..1c15149ae8e9 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>         size_t size = folio_size(folio);
>         unsigned int offset, length;
>         struct page *split_at, *split_at2;
> +       unsigned int min_order;
>
>         if (pos < start)
>                 offset = start - pos;
> @@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>         if (!folio_test_large(folio))
>                 return true;
>
> +       min_order = mapping_min_folio_order(folio->mapping);
>         split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
> -       if (!try_folio_split(folio, split_at, NULL)) {
> +       if (!try_folio_split(folio, split_at, NULL, min_order)) {
>                 /*
>                  * try to split at offset + length to make sure folios within
>                  * the range can be dropped, especially to avoid memory waste
> @@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>                  */
>                 if (folio_test_large(folio2) &&
>                     folio2->mapping == folio->mapping)
> -                       try_folio_split(folio2, split_at2, NULL);
> +                       try_folio_split(folio2, split_at2, NULL, min_order);
>
>                 folio_unlock(folio2);
>  out:
>
>
> Best Regards,
> Yan, Zi
>
