Message-ID: <20231013001203.GA3812@monkey>
Date: Thu, 12 Oct 2023 17:12:03 -0700
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Nathan Chancellor <nathan@...nel.org>
Cc: Usama Arif <usama.arif@...edance.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
muchun.song@...ux.dev, songmuchun@...edance.com,
fam.zheng@...edance.com, liangma@...ngbit.com,
punit.agrawal@...edance.com,
Konrad Dybcio <konrad.dybcio@...aro.org>, llvm@...ts.linux.dev
Subject: Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for
non-gigantic pages
On 10/12/23 07:53, Mike Kravetz wrote:
> On 10/11/23 17:03, Nathan Chancellor wrote:
> > On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > > On 10/09/23 15:56, Usama Arif wrote:
> >
> > I suspect the crash that our continuous integration spotted [1] is the
> > same issue that Konrad is seeing, as I have bisected that failure to
> > bfb41d6b2fe1 in next-20231009. However, neither the first half of your
> > diff (since the second half does not apply at bfb41d6b2fe1) nor the
> > original patch in this thread resolves the issue though, so maybe it is
> > entirely different from Konrad's?
> >
> > For what it's worth, this issue is only visible for me when building for
> > arm64 using LLVM with CONFIG_INIT_STACK_NONE=y, instead of the default
> > CONFIG_INIT_STACK_ALL_ZERO=y (which appears to hide the problem?),
> > making it seem like it could be something with uninitialized memory... I
> > have not been able to reproduce it with GCC, which could also mean
> > something.
>
> Thank you Nathan! That is very helpful.
>
> I will use this information to try and recreate. If I can recreate, I
> should be able to get to root cause.
I could easily recreate the issue using the provided instructions. First
thing I did was add a few printk's to check/verify state. The beginning
of gather_bootmem_prealloc looked like this:

static void __init gather_bootmem_prealloc(void)
{
	LIST_HEAD(folio_list);
	struct huge_bootmem_page *m;
	struct hstate *h, *prev_h = NULL;

	if (list_empty(&huge_boot_pages))
		printk("gather_bootmem_prealloc: huge_boot_pages list empty\n");

	list_for_each_entry(m, &huge_boot_pages, list) {
		struct page *page = virt_to_page(m);
		struct folio *folio = (void *)page;

		printk("gather_bootmem_prealloc: loop entry m %lx\n",
			(unsigned long)m);

The STRANGE thing is that the printk after the list_empty() test would
print, and then we would still enter the 'list_for_each_entry()' loop as if
the list were not empty. This is the cause of the addressing exception: m
pointed to the list head rather than to an entry on the list.
I have attached disassembly of gather_bootmem_prealloc built with
INIT_STACK_NONE and with INIT_STACK_ALL_ZERO. The disassembly listings are
for code without the printks.
This is the first time I have looked at arm assembly, so I may be missing
something. However, in the INIT_STACK_NONE case it looks like we get the
address of huge_boot_pages into a register but do not use it to determine
if we should execute the loop. Code generated with INIT_STACK_ALL_ZERO seems
to show code checking the list before entering the loop.
Can someone with more arm assembly experience take a quick look? Since
huge_boot_pages is a global variable rather than on the stack, I can't
see how INIT_STACK_ALL_ZERO/INIT_STACK_NONE could make a difference.
--
Mike Kravetz
View attachment "disass_INIT_STACK_NONE" of type "text/plain" (9882 bytes)
View attachment "disass_INIT_STACK_ALL_ZERO" of type "text/plain" (10137 bytes)