Message-ID: <20260205174847.000065a4@huawei.com>
Date: Thu, 5 Feb 2026 17:48:47 +0000
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Gregory Price <gourry@...rry.net>
CC: Ira Weiny <ira.weiny@...el.com>, Dave Jiang <dave.jiang@...el.com>, "Fan
Ni" <fan.ni@...sung.com>, Dan Williams <dan.j.williams@...el.com>, "Davidlohr
Bueso" <dave@...olabs.net>, Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>, <linux-cxl@...r.kernel.org>,
<nvdimm@...ts.linux.dev>, <linux-kernel@...r.kernel.org>, Li Ming
<ming.li@...omail.com>, Alireza Sanaee <alireza.sanaee@...wei.com>
Subject: Re: [PATCH v9 00/19] DCD: Add support for Dynamic Capacity Devices
(DCD)
> > I'm not clear if sysram could be used for virtio, or even needed. I'm
> > still figuring out how virtio of simple memory devices is a gain.
> >
>
> Jonathan mentioned that he thinks it would be possible to just bring it
> online as a private-node and inform the consumer of this. I think
> that's probably reasonable.
Firstly, VM == application. If we have, say, a DB that wants to do
everything itself, it would use the same interface as a VM to get the
whole memory on offer. (I'm still trying to get that Application
Specific Memory term adopted ;)
This discussion would be better if we didn't assume anything to do
with virtio - that's just one option (and right now for CXL memory
probably not the sensible one, as it's missing too many things we get
for free by just emulating CXL devices - e.g. all the stuff you are
describing here for the host is just as valid in the guest). We have
a path to get that emulation and should have the big missing piece
posted shortly (DCD backed by 'things - this discussion' that turn up
after VM boot).
The real topic is memory for a VM, and we need a way to tie it to a
memory backend in QEMU, so that whatever the fabric manager provided
for that VM is given to that VM and not used for anything else.
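As a concrete sketch of what I mean (invocation untested; device
path, size and node id are all made up), the plumbing we already have
is a file-backed memory backend fed to a dedicated guest node:

  qemu-system-x86_64 ... \
      -object memory-backend-file,id=vm-dcd0,mem-path=/dev/dax0.0,size=16G,share=on \
      -numa node,nodeid=1,memdev=vm-dcd0

The missing step is the one before that: getting from the tag to
which device (or other handle) is the one meant for this VM.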
If it's for a specific VM then it's tagged, as otherwise how else
would we know the intent? (Let's ignore random other out-of-band
paths.)
Layering-wise we can surface as many backing sources as we like at
runtime via one or more emulated DCD devices (to give perf
information etc.). They each show up in the guest as a contiguous
(maybe tagged) single extent, and then we apply whatever comes out of
the rest of this discussion on top of that.
So all we care about is how the host presents it. A bunch of things
might work for this:
1. Just put it in a NUMA node that requires specific selection to
allocate from. This is nice because it just looks like normal memory
and we can apply any type of front end on top of that. Not good if we
have a lot of these coming and going. (Rough sketch of the allocation
side below.)
2. Provide it as something with an fd we can mmap(). I was fine with
DAX for this, but if it's normal RAM just for a VM, anything that
gives me a handle I can mmap() is fine. We just need a way to know
which one is ours (so the tag). It's pretty similar for the shared
cases: we just need a handle to mmap(). In that case the tag goes
straight up to the guest OS (we've just unwound the extent ordering
in the host and presented it as a contiguous single extent). (Sketch
below as well.)
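To make (1) and (2) concrete, two untested user-space sketches
(node id and device path are made up):

For (1), the consumer pins its allocation to whichever node the
tagged capacity was onlined into, e.g. with mbind():

#include <numaif.h>     /* mbind(), MPOL_BIND - link with -lnuma */
#include <sys/mman.h>
#include <stddef.h>

#define DCD_NODE 2      /* made up: node the tagged capacity landed in */

static void *alloc_from_dcd_node(size_t len)
{
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return NULL;

        /* Bind to the DCD node only - no fallback to general
         * purpose memory. */
        unsigned long nodemask = 1UL << DCD_NODE;
        if (mbind(p, len, MPOL_BIND, &nodemask,
                  sizeof(nodemask) * 8, 0)) {
                munmap(p, len);
                return NULL;
        }
        return p;
}

For (2), assuming the tag resolved to (made up) /dev/dax0.0:

#include <fcntl.h>
#include <sys/mman.h>
#include <stddef.h>
#include <unistd.h>

static void *map_dcd_capacity(size_t len)
{
        int fd = open("/dev/dax0.0", O_RDWR);
        if (fd < 0)
                return NULL;

        /* Device DAX wants MAP_SHARED and len aligned to the
         * device alignment (often 2M). */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);      /* mapping stays valid after close */
        return p == MAP_FAILED ? NULL : p;
}

Either way, the lookup from tag to node / handle is the bit we still
need to settle in this discussion.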
The assumption here is that we always provide to the VM all the
capacity that was tagged for it. Things may get more entertaining if
we have a bunch of capacity that was tagged to provide extra space
for a set of VMs (e.g. we overcommit on top of the DCD extents) - to
me that's a job for another day.
So I'm not really envisioning anything special for the VM case; it's
just a dedicated allocation of memory for a user who knows how to get
it. We will want a way to get perf info though, so we can provide
that in the VM. Maybe we can figure that out from the CXL HW backing
it without needing anything special in what is being discussed here.
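If the capacity is in its own node, the host may already have what we
need there: when the kernel has derived access coordinates for the
node (from HMAT, or from CDAT for CXL), it exports them in sysfs, so
a VMM could just read and forward those. Untested sketch, node id
made up again:

#include <stdio.h>

/* Made up: tagged capacity onlined as node 2. Bandwidth is in MB/s,
 * latency in ns, when the kernel has filled these in. */
static long dcd_node_initiator_attr(const char *attr)
{
        char path[96];
        long val = -1;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/node/node2/access0/initiators/%s",
                 attr);
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (fscanf(f, "%ld", &val) != 1)
                val = -1;
        fclose(f);
        return val;
}

/* e.g. dcd_node_initiator_attr("read_bandwidth"),
 *      dcd_node_initiator_attr("read_latency") */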
Jonathan
>
> ~Gregory