Message-ID: <CAAi7L5esz-vxbbP-4ay-cCfc1osXLkvGDx5thijuBXFBQNwiug@mail.gmail.com>
Date: Fri, 26 Sep 2025 14:47:50 +0200
From: Michał Cłapiński <mclapinski@...gle.com>
To: dan.j.williams@...el.com
Cc: Mike Rapoport <rppt@...nel.org>, Ira Weiny <ira.weiny@...el.com>,
Dave Jiang <dave.jiang@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, jane.chu@...cle.com,
Pasha Tatashin <pasha.tatashin@...een.com>, Tyler Hicks <code@...icks.com>,
linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev
Subject: Re: [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
On Wed, Sep 24, 2025 at 3:16 AM <dan.j.williams@...el.com> wrote:
>
> Michał Cłapiński wrote:
> > On Fri, Aug 29, 2025 at 9:57 AM Mike Rapoport <rppt@...nel.org> wrote:
> > >
> > > Hi Ira,
> > >
> > > On Thu, Aug 28, 2025 at 07:47:31PM -0500, Ira Weiny wrote:
> > > > + Michal
> > > >
> > > > Mike Rapoport wrote:
> > > > > From: "Mike Rapoport (Microsoft)" <rppt@...nel.org>
> > > > >
> > > > > There are use cases, for example virtual machine hosts, that create
> > > > > "persistent" memory regions using the memmap= option on x86 or dummy
> > > > > pmem-region device tree nodes on DT based systems.
> > > > >
> > > > > Both these options are inflexible because they create static regions and
> > > > > the layout of the "persistent" memory cannot be adjusted without a reboot,
> > > > > and sometimes they even require a firmware update.
> > > > >
> > > > > Add a ramdax driver that allows creation of DIMM devices on top of
> > > > > E820_TYPE_PRAM regions and devicetree pmem-region nodes.
> > > >
> > > > While I recognize that this driver and the e820 driver are mutually
> > > > exclusive[1][2], I do wonder if the use cases are the same.
> > >
> > > They are mutually exclusive in the sense that they cannot be loaded
> > > together, so I had this in Kconfig in the RFC posting:
> > >
> > > config RAMDAX
> > > 	tristate "Support persistent memory interfaces on RAM carveouts"
> > > 	depends on OF || (X86 && X86_PMEM_LEGACY=n)
> > >
> > > (somehow my rebase lost Makefile and Kconfig changes :( )
> > >
> > > As Pasha said in the other thread [1] the use-cases are different. My goal
> > > is to achieve flexibility in managing carved out "PMEM" regions and
> > > Michal's patches aim to optimize boot time by autoconfiguring multiple PMEM
> > > regions in the kernel without upcalls to ndctl.
> > >
> > > > From a high level, I don't like the idea of adding kernel parameters. So
> > > > if this could solve Michal's problem, I'm inclined to go in this direction.
> > >
> > > I think it could help with optimizing the reboot times. On the first boot
> > > the PMEM is partitioned using ndctl and then the partitioning remains there
> > > so that on subsequent reboots kernel recreates dax devices without upcalls
> > > to userspace.
> >
> > Using this patch, if I want to divide 500GB of memory into 1GB chunks,
> > the last 128kB of every chunk would be taken by the label, right?
> >
> > My patch disables labels, so we can divide the memory into 1GB chunks
> > without any losses and they all remain aligned to the 1GB boundary. I
> > think this is necessary for vmemmap dax optimization.
>
> As Mike says, you would lose 128K at the end, but given the alignment
> constraints that indeed means losing the whole last 1GB.
>
> However, I think that could be solved by just separately vmalloc'ing the
> label space for this. Then instead of kernel parameters to sub-divide a
> region, you just have an initramfs script to do the same.
>
> Does that meet your needs?
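To make the alignment arithmetic in the exchange above concrete, here is a minimal C sketch. It assumes, as the quoted reply suggests, that the 128 KiB label area occupies the last bytes of the whole region, and that every namespace must be exactly 1 GiB and start on a 1 GiB boundary. `count_aligned_chunks()` and `LABEL_SIZE` are hypothetical names for illustration, not nvdimm driver code.

```c
#include <assert.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define GiB (UINT64_C(1) << 30)
/* Label size quoted in the thread; the in-region placement is an assumption. */
#define LABEL_SIZE (128 * KiB)

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Count how many align-sized, align-aligned chunks fit in the region
 * [base, base + size). If label_in_region is nonzero, the last LABEL_SIZE
 * bytes hold the label and cannot back a chunk (hypothetical layout model).
 */
static uint64_t count_aligned_chunks(uint64_t base, uint64_t size,
				     uint64_t align, int label_in_region)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	uint64_t end = base + size;
	if (label_in_region) {
		if (size < LABEL_SIZE)
			return 0;
		end -= LABEL_SIZE;	/* label steals the tail of the region */
	}

	uint64_t start = round_up(base, align);
	if (start >= end)
		return 0;
	return (end - start) / align;	/* partial trailing chunk is lost */
}
```

For a 1 GiB-aligned 500 GiB region, `count_aligned_chunks(0, 500 * GiB, GiB, 1)` yields 499, because the label pushes the last chunk past the usable end. Passing `0` for `label_in_region`, which models keeping the label space outside the region as in the separately vmalloc'ed label idea, yields all 500.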

Sorry, I'm having trouble imagining this.

If I wanted 500 1GB chunks, I would request a region of 500GB plus space
for the label? Or is that a label and info-blocks?
Then on each boot the kernel would check whether there is an actual
label/info-blocks in that space and, if so, recreate my
devices (including the fsdax/devdax type)?
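A minimal sketch of the sizing asked about here, assuming the region base is already 1 GiB-aligned and that the 128 KiB label area is placed after the last chunk. `plan_carveout()`, `chunk_offset()` and `struct carveout_layout` are hypothetical helpers for illustration; info-blocks and any extra alignment the kernel might require at the region end are deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define GiB (UINT64_C(1) << 30)
#define LABEL_SIZE (128 * KiB)	/* label size quoted in the thread */

/* Hypothetical layout: nr_chunks 1 GiB chunks back to back, then the label. */
struct carveout_layout {
	uint64_t region_size;	/* total bytes to reserve for the carveout */
	uint64_t label_offset;	/* offset of the label area from the base */
};

static struct carveout_layout plan_carveout(uint64_t nr_chunks)
{
	struct carveout_layout l;

	/* Guard against overflow of nr_chunks * GiB + LABEL_SIZE. */
	assert(nr_chunks <= (UINT64_MAX - LABEL_SIZE) / GiB);

	l.label_offset = nr_chunks * GiB;
	l.region_size = l.label_offset + LABEL_SIZE;
	return l;
}

/* Offset of chunk i; stays 1 GiB-aligned because the label comes last. */
static uint64_t chunk_offset(uint64_t i)
{
	return i * GiB;
}
```

Under these assumptions, 500 chunks need a region of 500 GiB + 128 KiB with the label starting at offset 500 GiB, so every chunk keeps its 1 GiB alignment.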

One of the requirements for live update is that the kexec reboot has
to be fast. My solution introduced a delay of tens of milliseconds,
since the actual device creation is asynchronous. Manually dividing a
region into thousands of devices from userspace would be very slow, but
I would have to do that only on the first boot, right?