Message-ID: <CAHVXubhMLgb54_7zV2yFuGPoMKCkUXwozHbDvghc7kQqNLK-JA@mail.gmail.com>
Date: Wed, 17 May 2023 16:55:53 +0200
From: Alexandre Ghiti <alexghiti@...osinc.com>
To: Conor Dooley <conor.dooley@...rochip.com>
Cc: Song Shuai <suagrfillet@...il.com>, robh@...nel.org,
Andrew Jones <ajones@...tanamicro.com>, anup@...infault.org,
palmer@...osinc.com, jeeheng.sia@...rfivetech.com,
leyfoon.tan@...rfivetech.com, mason.huo@...rfivetech.com,
Paul Walmsley <paul.walmsley@...ive.com>,
Guo Ren <guoren@...nel.org>, linux-riscv@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: Bug report: kernel paniced when system hibernates

On Wed, May 17, 2023 at 1:28 PM Conor Dooley <conor.dooley@...rochip.com> wrote:
>
> Hey Alex,
>
> On Wed, May 17, 2023 at 10:58:02AM +0200, Alexandre Ghiti wrote:
> > On Tue, May 16, 2023 at 1:12 PM Alexandre Ghiti <alexghiti@...osinc.com> wrote:
>
> > > On Tue, May 16, 2023 at 11:24 AM Song Shuai <suagrfillet@...il.com> wrote:
> > > I actually removed this flag a few years ago, and I have to admit that
> > > I need to check if that's necessary: the goal of commit 3335068f8721
> > > ("riscv: Use PUD/P4D/PGD pages for the linear mapping") is to expose
> > > the "right" start of DRAM so that we can align virtual and physical
> > > addresses on a 1GB boundary.
> > >
> > > So I have to check whether a nomap region is actually added to
> > > memblock.memory.regions[] or not: if yes, that's perfect, let's add
> > > the nomap attribute to the PMP regions; otherwise, I don't think that
> > > is a good solution.
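
(For what it's worth, here is a completely untested helper to double
check that: as far as I understand, nomap regions are supposed to stay
in memblock.memory and only carry the MEMBLOCK_NOMAP flag. Where to
call it from, e.g. late in setup_arch(), is just an example.)

#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/printk.h>

/*
 * Untested sketch: list the memblock.memory regions that carry the
 * MEMBLOCK_NOMAP flag, to check whether the PMP regions still show up
 * there once they are marked nomap.
 */
static void __init dump_nomap_regions(void)
{
        struct memblock_region *region;

        for_each_mem_region(region) {
                phys_addr_t end = region->base + region->size;

                if (memblock_is_nomap(region))
                        pr_info("nomap region: [%pa-%pa]\n",
                                &region->base, &end);
        }
}
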
> >
> > So here is the current linear mapping without nomap in OpenSBI:
> >
> > ---[ Linear mapping ]---
> > 0xff60000000000000-0xff60000000200000 0x0000000080000000 2M PMD D A G . . W R V
> > 0xff60000000200000-0xff60000000e00000 0x0000000080200000 12M PMD D A G . . . R V
> >
> > And below is the linear mapping with nomap in OpenSBI:
> >
> > ---[ Linear mapping ]---
> > 0xff60000000080000-0xff60000000200000 0x0000000080080000 1536K PTE D A G . . W R V
> > 0xff60000000200000-0xff60000000e00000 0x0000000080200000 12M PMD D A G . . . R V
> >
> > So adding nomap does not misalign virtual and physical addresses; it
> > does prevent the use of a 1GB page for this area though. So that's a
> > solution, we just lose this 1GB page here.
> >
> > But even though that may be the fix, I think we also need to fix this
> > in the kernel, as it would break compatibility with certain versions of
> > OpenSBI *if* we only fix OpenSBI... So here are a few solutions:
> >
> > 1. we can mark all "mmode_resv" nodes in the device tree as nomap
> > before the linear mapping is established (IIUC, those nodes are added
> > by OpenSBI to advertise PMP regions)
> > -> This amounts to the same fix as in OpenSBI, and we lose the 1GB hugepage.
>
> AFAIU, losing the 1 GB hugepage is a regression, which would make this
> not an option, right?
I'm not sure this is a real regression; I'd rather avoid it, but as
mentioned in my first answer, Mike Rapoport showed that it made no
difference performance-wise...
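
FWIW, a rough and completely untested sketch of what 1 could look
like; the helper and the place where it would be hooked into the
/reserved-memory scan are hypothetical, memblock_mark_nomap() is the
existing API:

#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/string.h>

/*
 * Untested sketch for 1: while the /reserved-memory nodes are being
 * scanned, treat OpenSBI's "mmode_resv" nodes as if they carried the
 * no-map property. The caller would pass the node name and the
 * base/size it describes.
 */
static void __init riscv_mark_mmode_resv_nomap(const char *uname,
                                               phys_addr_t base,
                                               phys_addr_t size)
{
        if (!strncmp(uname, "mmode_resv", strlen("mmode_resv")))
                memblock_mark_nomap(base, size);
}
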
>
> > 2. we can tweak the pfn_is_nosave() function to *not* save the pfns
> > corresponding to the PMP regions (rough sketch below)
> > -> We don't lose the 1GB hugepage \o/
> > 3. we can use register_nosave_region() to not save the "mmode_resv"
> > regions (x86 does that
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/arch/x86/kernel/e820.c#L753)
> > -> We don't lose the 1GB hugepage \o/
> > 4. Given JeeHeng's pointer to
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/kernel/power/snapshot.c#L1340,
> > we can mark those pages as non-readable and make the hibernation
> > process not save those pages
> > -> Very late-in-the-day idea, not sure what it's worth, we also
> > lose the 1GB hugepage...
>
> Ditto here re: introducing another regression.
>
> > To me, the best solution is 3: it would prepare for other similar
> > issues later, it is similar to what x86 does, and it allows us to keep
> > 1GB hugepages.
> >
> > I have been thinking, and to me nomap does not provide anything, since
> > the kernel should not access this memory range anyway: if it does, we
> > must fix the kernel.
> >
> > Let me know what you all think, I'll be preparing a PoC of 3 in the meantime!
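
For 3, the skeleton would probably be close to what x86 does in
e820.c (the link above). Completely untested sketch: how we get hold
of each "mmode_resv" range (here a hypothetical per-node callback) is
still to be decided, register_nosave_region() itself is the existing
API:

#include <linux/init.h>
#include <linux/pfn.h>
#include <linux/suspend.h>
#include <linux/types.h>

/*
 * Untested sketch for 3: tell the hibernation core to never save the
 * "mmode_resv" ranges, the same way x86 registers its e820 reserved
 * ranges. The caller would pass the base/size of each reserved node.
 */
static void __init riscv_register_mmode_resv_nosave(phys_addr_t base,
                                                    phys_addr_t size)
{
        register_nosave_region(PFN_DOWN(base), PFN_UP(base + size));
}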
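
And for comparison, 2 would probably boil down to something like this
in pfn_is_nosave(). Also completely untested, and it assumes both that
the PMP regions end up marked nomap in memblock and that memblock data
is kept around at runtime (ARCH_KEEP_MEMBLOCK); otherwise the ranges
would have to be recorded somewhere at boot. The
__nosave_begin/__nosave_end check is the usual one other architectures
already have:

#include <linux/memblock.h>
#include <linux/pfn.h>
#include <asm/page.h>
#include <asm/sections.h>

int pfn_is_nosave(unsigned long pfn)
{
        unsigned long nosave_begin_pfn = PFN_DOWN(__pa_symbol(&__nosave_begin));
        unsigned long nosave_end_pfn = PFN_UP(__pa_symbol(&__nosave_end));

        /* The kernel's own nosave section, as on other architectures. */
        if (pfn >= nosave_begin_pfn && pfn < nosave_end_pfn)
                return 1;

        /* Skip anything the kernel does not map, e.g. PMP-protected RAM. */
        return !memblock_is_map_memory(PFN_PHYS(pfn));
}
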
>
> #3 would probably get my vote too. It seems like you could use it
> dynamically if there were to be another provider of "mmode_resv"
> regions in the future, rather than doing something location-specific.
>
> We should probably document these OpenSBI reserved-memory nodes in a
> dt-binding or w/e, though, if we are going to be relying on them to not
> crash!
Yes, you're right, let's see what Atish and Anup think!

Thanks for your quick answers, Conor and Song, really appreciated!

Alex
>
> Thanks for working on this,
> Conor.
>