Message-ID: <68d6df3f410de_1052010059@dwillia2-mobl4.notmuch>
Date: Fri, 26 Sep 2025 11:45:19 -0700
From: <dan.j.williams@...el.com>
To: Michał Cłapiński <mclapinski@...gle.com>,
<dan.j.williams@...el.com>
CC: Mike Rapoport <rppt@...nel.org>, Ira Weiny <ira.weiny@...el.com>, "Dave
Jiang" <dave.jiang@...el.com>, Vishal Verma <vishal.l.verma@...el.com>,
<jane.chu@...cle.com>, Pasha Tatashin <pasha.tatashin@...een.com>, "Tyler
Hicks" <code@...icks.com>, <linux-kernel@...r.kernel.org>,
<nvdimm@...ts.linux.dev>
Subject: Re: [PATCH 1/1] nvdimm: allow exposing RAM carveouts as NVDIMM DIMM
devices
Michał Cłapiński wrote:
[..]
> > As Mike says you would lose 128K at the end, but that indeed becomes
> > losing that 1GB given alignment constraints.
> >
> > However, I think that could be solved by just separately vmalloc'ing the
> > label space for this. Then instead of kernel parameters to sub-divide a
> > region, you just have an initramfs script to do the same.
> >
> > Does that meet your needs?
>
> Sorry, I'm having trouble imagining this.
> If I wanted 500 1GB chunks, I would request a region of 500GB+space
> for the label? Or is that a label and info-blocks?
You would specify a memmap= range of 500GB+128K*.
Force attach that range to Mike's RAMDAX driver.
[ modprobe -r nd_e820, don't build nd_e820, or use modprobe policy to block nd_e820 ]
echo ramdax > /sys/bus/platform/devices/e820_pmem/driver_override
echo e820_pmem > /sys/bus/platform/drivers/ramdax/bind
* forget what I said about vmalloc() previously, not needed
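Spelling out the one-time setup above as a script (a sketch only: the "ramdax" driver name comes from Mike's proposal earlier in the thread, and the memmap= start address is setup-specific):

```shell
#!/bin/sh
# Kernel command line reserves 500GB of RAM plus 128K of label space as
# a "persistent" range (address <start> is whatever fits your memory map):
#   memmap=<500G+128K>!<start>

# Keep the legacy nd_e820 driver from claiming the e820 pmem range
# (alternatively: don't build it, or block it via modprobe policy).
modprobe -r nd_e820

# Steer the e820_pmem platform device to the ramdax driver instead of
# whatever would normally match, then bind it explicitly.
echo ramdax > /sys/bus/platform/devices/e820_pmem/driver_override
echo e820_pmem > /sys/bus/platform/drivers/ramdax/bind
```

The driver_override/bind pair is the standard sysfs mechanism for forcing a platform device onto a specific driver.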
> Then on each boot the kernel would check if there is an actual
> label/info-blocks in that space and if yes, it would recreate my
> devices (including the fsdax/devdax type)?
Right, if that range is persistent the kernel would automatically parse
the label space each boot and divide up the 500GB region space into
namespaces.
128K of label space gives you 509 potential namespaces.
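One plausible reconstruction of that 509 figure, assuming 256-byte namespace labels and two 256-byte index blocks at the head of the label area (the exact index-block sizing in libnvdimm depends on the slot count, so treat the overhead constant as an assumption):

```python
# Label-area accounting sketch (assumptions: 256-byte labels,
# two 256-byte index blocks of overhead).
LABEL_AREA = 128 * 1024      # 128K of label space
LABEL_SIZE = 256             # bytes per namespace label
INDEX_OVERHEAD = 2 * 256     # two index blocks (assumed size)

slots = (LABEL_AREA - INDEX_OVERHEAD) // LABEL_SIZE
print(slots)                 # 510 raw label slots
```

Keeping one slot free so a label can be rewritten atomically leaves 509 usable single-label namespaces, which matches the number quoted above.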
> One of the requirements for live update is that the kexec reboot has
> to be fast. My solution introduced a delay of tens of milliseconds
> since the actual device creation is asynchronous. Manually dividing a
> region into thousands of devices from userspace would be very slow but
Wait, 500GB Region / 1GB Namespace = thousands of Namespaces?
> I would have to do that only on the first boot, right?
Yes, the expectation is to only incur that overhead once. It also allows
VMs to look up their capacity by name, so you do not need a separate
mapping of 1GB Namespace blocks to VMs. Just give some VMs bigger
Namespaces than others, by name.
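For example, the per-VM named namespaces could be created once with ndctl (the region and VM names here are hypothetical):

```shell
#!/bin/sh
# First boot only: carve named namespaces out of the region.
# -n/--name stores the name in the label, so it survives reboot/kexec.
ndctl create-namespace -r region0 -m devdax -s 2G -n vm-large
ndctl create-namespace -r region0 -m devdax -s 1G -n vm-small

# Subsequent boots: find a VM's device node by the label name.
ndctl list -N | jq -r '.[] | select(.name == "vm-large") | .chardev'
```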