[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5e4d50d4-978-ce54-e1ae-40f7117dbf3d@google.com>
Date: Fri, 1 Sep 2023 15:48:26 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
cc: Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Bagas Sanjaya <bagasdotme@...il.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
regressions@...ts.linux.dev
Subject: Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
system stopped booting
On Fri, 1 Sep 2023, Mikhail Gavrilov wrote:
> On Fri, Sep 1, 2023 at 2:08 PM Hugh Dickins <hughd@...gle.com> wrote:
> >
> >
> > Sorry about that, please try this instead, adds EXPORT_SYMBOL(pte_unmap).
> >
>
> Thanks, now I have a working kernel builded at commit a349d72fd9ef.
>
> > I've never used stackdepot before, but I've tried this out in good and
> > bad cases, and expect it to work for you, shedding light on where is
> > going wrong - machine should boot up fine, and in dmesg you'll find one
> > stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.
>
> Interesting, I checked twice but I didn't find any entry with
> "pte_map" in the kernel log after applying your patch.
That was very disappointing: I found it hard to explain, but was thinking
of sending you a similar patch, doing the same check on all your 32 CPUs -
maybe the stall being on CPU 0 in your photo was accidental.
But now I think I have the shameful answer (which studying your dmesg,
and the 82328 jiffies at 86 seconds in your photo, did help me towards).
That mm/pagewalk fix I put into 6.5 has a grievous oversight (and a
video of your failing 6.6 bootup would likely have shown a WARN_ON_ONCE
from the underflow in __rcu_read_unlock()).
Please revert the debug patch I sent yesterday (or earlier today), please
try booting with this one on top of a349d72fd9ef; and if that's successful,
then please go back to your original Rawhide tree and apply this on top of
that, to confirm that boots to a working system too - thanks.
With my apologies,
[PATCH] mm/pagewalk: fix bootstopping regression from extra pte_unmap()
[ Commit message yet to be written: it's actually something to go to
6.5 stable, to correct i386 CONFIG_HIGHPTE there - though we know of
no case where it is actually hit. ]
Signed-off-by: Hugh Dickins <hughd@...gle.com>
---
mm/pagewalk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 2022333805d3..9e7d0276c38a 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,7 +58,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte = pte_offset_map(pmd, addr);
if (pte) {
err = walk_pte_range_inner(pte, addr, end, walk);
- if (walk->mm != &init_mm)
+ if (walk->mm != &init_mm && addr < TASK_SIZE)
pte_unmap(pte);
}
} else {
--
2.35.3
Powered by blists - more mailing lists