Message-ID: <6806d2d6f2aed_71fe294ed@dwillia2-xfh.jf.intel.com.notmuch>
Date: Mon, 21 Apr 2025 16:20:55 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Michal Clapinski <mclapinski@...gle.com>, Pasha Tatashin
<pasha.tatashin@...een.com>, Dan Williams <dan.j.williams@...el.com>, "Vishal
Verma" <vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>, "Ira
Weiny" <ira.weiny@...el.com>, Jonathan Corbet <corbet@....net>
CC: <nvdimm@...ts.linux.dev>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Michal Clapinski <mclapinski@...gle.com>
Subject: Re: [PATCH v2 1/1] libnvdimm/e820: Add a new parameter to configure
many regions per e820 entry

Michal Clapinski wrote:
> Currently, the user has to specify each memory region to be used with
> nvdimm via the memmap parameter. Due to the character limit of the
> command line, this makes it impossible to have a lot of pmem devices.
> This new parameter solves this issue by allowing users to divide
> one e820 entry into many nvdimm regions.
>
> This change is needed for the hypervisor live update. VMs' memory will
> be backed by those emulated pmem devices. To support various VM shapes
> I want to create devdax devices at 1GB granularity similar to hugetlb.

This looks fairly straightforward, but if this moves forward I would
explicitly call the parameter something like "split" instead of "pmem"
to align it better with its usage.
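
As a rough illustration of what splitting one e820 entry at 1GB
granularity involves, here is a minimal user-space sketch. It is not the
code from the patch: `struct pmem_region` and `split_range()` are
hypothetical names, and skipping the unaligned head and tail of the
range is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Hypothetical descriptor for one carved-out region; not a kernel type. */
struct pmem_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Split [start, start + size) into chunks of region_size bytes, each
 * aligned to region_size. Unaligned head and tail bytes are skipped
 * (an assumption of this sketch). Returns the number of regions
 * written to out, which is at most max.
 */
static size_t split_range(uint64_t start, uint64_t size, uint64_t region_size,
			  struct pmem_region *out, size_t max)
{
	uint64_t end, cur;
	size_t n = 0;

	assert(out != NULL);

	/* This sketch only handles power-of-two region sizes. */
	if (region_size == 0 || (region_size & (region_size - 1)) != 0)
		return 0;

	end = start + size;
	if (end < start)
		return 0; /* start + size overflowed */

	/* Round the start up to the next region_size boundary. */
	cur = (start + region_size - 1) & ~(region_size - 1);
	if (cur < start)
		return 0; /* rounding up overflowed */

	while (n < max && cur < end && end - cur >= region_size) {
		out[n].start = cur;
		out[n].size = region_size;
		cur += region_size;
		n++;
	}
	return n;
}
```

For a 3GB entry that starts 4KB past the 4GB mark, this yields two
regions, at 5GB and 6GB; the unaligned head and tail are left out.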

However, while this is expedient I wonder if you would be better
served with ACPI table injection to get more control and configuration
options...

> It's also possible to expand this parameter in the future,
> e.g. to specify the type of the device (fsdax/devdax).

...for example, if you injected or customized your BIOS to supply an
ACPI NFIT table you could get to deeper degrees of customization without
wrestling with command lines. Supply an ACPI NFIT that carves up a large
memory-type range into an arbitrary number of regions. In the NFIT there
is a natural place to specify whether the range gets sent to PMEM. See the
call to nvdimm_pmem_region_create() near NFIT_SPA_PM in
acpi_nfit_register_region(), and "simply" pick a new guid to signify
direct routing to device-dax. I say simply, but that implies new ACPI
NFIT driver plumbing for the new mode.

Another overlooked detail about NFIT is that there is an opportunity to
determine cases where the platform might have changed the physical
address map from one boot to the next. In other words, I cringe at the
fragility of memmap=, but I understand that it has the benefit of being
simple. See the "nd_set cookie" concept in
acpi_nfit_init_interleave_set().
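
To make the idea of catching an address-map change between boots
concrete, here is a minimal user-space sketch. It is not how the kernel
derives the nd_set cookie: FNV-1a is a stand-in hash, and
`struct spa_range`, `layout_cookie()`, and `layout_changed()` are
hypothetical names. The assumption is that a cookie computed from the
layout was saved on a previous boot and is compared against the layout
seen on this boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one physical address range. */
struct spa_range {
	uint64_t base;
	uint64_t length;
};

/* Fold one 64-bit value into a running FNV-1a hash, low byte first. */
static uint64_t fnv1a_u64(uint64_t hash, uint64_t value)
{
	int i;

	for (i = 0; i < 8; i++) {
		hash ^= (value >> (8 * i)) & 0xff;
		hash *= 0x100000001b3ULL; /* 64-bit FNV prime */
	}
	return hash;
}

/* Compute a cookie over the ordered list of ranges. */
static uint64_t layout_cookie(const struct spa_range *r, size_t count)
{
	uint64_t hash = 0xcbf29ce484222325ULL; /* 64-bit FNV offset basis */
	size_t i;

	assert(r != NULL || count == 0);

	for (i = 0; i < count; i++) {
		hash = fnv1a_u64(hash, r[i].base);
		hash = fnv1a_u64(hash, r[i].length);
	}
	return hash;
}

/* Nonzero if the current layout no longer matches the saved cookie. */
static int layout_changed(uint64_t saved_cookie,
			  const struct spa_range *r, size_t count)
{
	return layout_cookie(r, count) != saved_cookie;
}
```

A plain memmap= setup has nothing like the saved cookie to compare
against, so a shifted range would go unnoticed.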