Message-ID: <6983888e76bcc_58e211005e@iweiny-mobl.notmuch>
Date: Wed, 4 Feb 2026 11:57:34 -0600
From: Ira Weiny <ira.weiny@...el.com>
To: Gregory Price <gourry@...rry.net>, Ira Weiny <ira.weiny@...el.com>
CC: Dave Jiang <dave.jiang@...el.com>, Fan Ni <fan.ni@...sung.com>, "Jonathan
 Cameron" <Jonathan.Cameron@...wei.com>, Dan Williams
	<dan.j.williams@...el.com>, Davidlohr Bueso <dave@...olabs.net>, "Alison
 Schofield" <alison.schofield@...el.com>, Vishal Verma
	<vishal.l.verma@...el.com>, <linux-cxl@...r.kernel.org>,
	<nvdimm@...ts.linux.dev>, <linux-kernel@...r.kernel.org>, Li Ming
	<ming.li@...omail.com>
Subject: Re: [PATCH v9 00/19] DCD: Add support for Dynamic Capacity Devices
 (DCD)

Gregory Price wrote:
> On Tue, Feb 03, 2026 at 04:04:23PM -0600, Ira Weiny wrote:
> > Gregory Price wrote:
>
> ... snipping this to the top ...
> > Again I don't like the idea of needing new drivers for new policies.  That
> > goes against how things should work in the kernel.
> 
> If you define "How should virtio consume an extent" and "How should
> FAMFS consume an extent" as "Policy" I can see your argument, and we
> should address this.

TL;DR: I just don't want to see an explosion of 'drivers' for various
'policies'.  I think your use of the word 'policy' triggered me.

> 
> I view "All things shall route through DAX" as "A policy" that
> dictates cxl-driven changes to dax - including new dax drivers
> (see: famfs new dax mechanism).
> 
> So we're already there.  Might as well reduce the complexity (as
> explained below) and cut out dax where it makes sense rather than
> force everyone to eat DAX (for potentially negative value).
> 
> ---
> 
> > > has been a concern, in favor of a per-region-driver policy on how to
> > > manage hot-add/remove events.
> > 
> > I think a concern would be that each region driver is implementing a
> > 'policy' which requires new drivers for new policies.
> > 
> 
> This is fair, we don't want infinite drivers - and many use cases
> (we imagine) will end up using DAX - I'm not arguing to get rid of the
> dax driver.
> 
> There are at least 3 or 4 use-cases I've seen so far
> 
> - dax (dev and fs): can share a driver w/ DAXDRV_ selection

Legacy...  check!

> 
> - sysram : preferably doing direct hotplug - not via dax
>            private-ram may re-use this cleanly with some config bits

Having read this entire email first, I think what I had in mind was
bundling a lot of this in here.  Put knobs here to control 'policy' rather
than adding to this list for each new policy.

> 
> - virtio : may not even want to expose objects to userland
>            may prefer to simply directly interact with a VMM

Even if directly interacting with the VMM, there have to be controls
exposed to user space to manage this.  I'm not a virtio expert so...  Ok,
let's just say there is another flow here.  Don't call it a policy though.

> 	   dax may present a security issue if reconfig'd to device

I don't understand this comment.

> 
> - type-2 : may have wildly different patterns and preferences
>            may also end up somewhat generalized

I think this is all going to be handled in the specific drivers of the
specific devices.  There is no policy here other than 'special' for the
device and we can't control that.

> 
> I think trying to pump all of these through dax and into userland by
> default is a mistake - if only because it drives more complexity.

I don't want to preserve DAX.  I don't.

So I think this list is fine.

> 
> We should get form from function.
> 
> Example: for sysram - dax_kmem is just glue, the hotplug logic should
>          live in cxl and operate directly on extents.  It's simpler and
> 	 doesn't add a bunch of needless dependencies.

Agreed.

> 
> Consider a hot-unplug request
> 
> Current setup
> ----
> FM -> Host
>    1) Unplug Extent A
> Host
>    2) cxl: hotunplug(dax_map[A])
>    3) dax: Does this cover the entire dax? (no->reject, yes->unplug())
>       - might fail due to dax-reasons
>       - might fail due to normal hot-unplug reasons
>    4) unbind dax
>    5) return extent
> 
> Dropping Dax in favor of sysram doing direct hotplug
> ----
> FM -> Host
>    1) Unplug Extent A 
> Host
>    2) hotunplug(extents_map[A])
>       - might fail because of normal hot-unplug reasons
>    3) return extent

Agreed.
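
A toy sketch of the two unplug flows quoted above, in Python rather than
kernel C.  Every name here (Extent, release_via_dax, release_direct, the
"busy" set) is hypothetical and only models the decision points; none of it
is an existing kernel or CXL interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    start: int   # illustrative offset, not a real HPA
    length: int


def release_via_dax(dax_dev_extents, extent, busy):
    """Current flow: the release only works if the extent is the whole dax device."""
    if set(dax_dev_extents) != {extent}:
        return "rejected: extent does not cover the entire dax device"
    if extent in busy:
        return "failed: normal hot-unplug failure"
    return "released"


def release_direct(extents_map, extent, busy):
    """Direct flow: look the extent up and hot-unplug it, nothing in between."""
    if extent not in extents_map:
        return "rejected: unknown extent"
    if extent in busy:
        return "failed: normal hot-unplug failure"
    del extents_map[extent]
    return "released"
```

With two extents backing one dax device, the first function rejects a
release of either extent, while the second can release them one at a time.
The only failure left in the direct flow is the ordinary "memory is busy"
case.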

> 
> It's just simpler and gives you the option of complete sparseness
> (untagged extents) or tracking related extents (tagged extents).

Just add the knobs for the tags and yeah...  the policy of how to handle
the extents can then be controlled by user space.

> 
> This pattern may not carry over the same with dax or virtio uses.

I don't fully understand the virtio case.  So I'll defer this.  But I feel
like this is not so much of a new policy as a different path which is, as
you said above, potentially not in user space at all.

> 
> > I did not like the 'implicit' nature of the association of dax device with
> > extent.  But it maintained backwards compatibility with non-sparse
> > regions...
> > 
> > My vision for tags was that eventually dax device creation could have a
> > tag specified prior and would only allocate from extents with that tag.
> >
> 
> yeah I think it's pretty clear the dax case wants a daxN.M/uuid of some
> kind (we can argue whether it needs to be exposed to userland, but
> after some conversations about FAMFS this sounds useful).
> 
> > I'm not following this.  If set(A) arrives can another set(A) arrive
> > later?
> > 
> > How long does the kernel wait for all the 'A's to arrive?  Or must they be
> > in a ...  'more bit set' set of extents.
> > 
> 
> Set(A) = extents that arrive together with the more bit set
> 
> So let's say you get two sets that arrive with the same tag (A)
> Set(A) + Set(A)'
> 
> Set(A)' would get rejected because Set(A) has already arrived.
> Otherwise, accepting Set(A)' implies sparseness of Set(A).
> 
> Having a tag map to a region is pointless - the HPA maps extent to
> region.  So there's no other use for a tag in the sysram case.
> 
> On the flip side - assuming you want to try to allow Set(A)+Set(A)'
> 
> How is userland expected to know when all extents have arrived if
> hotplug cannot occur until all the extents have arrived, and the only
> place to put those extents is DAX?  Seems needlessly complex.

Ok I think we need to sync up on the driver here.

For FAMFS/famdax they can expect the more bit and all that jazz.  I can't
stop that.

But for sysram.  No.  It is easy enough to assign a tag to the region and
any extent which shows up without that tag (be it NULL tag or tag A) gets
rejected.  All valid tagged extents get hot plugged.

Simple.  Easy policy for user space to control.
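
A minimal sketch of that tag rule, in Python for brevity.  It assumes the
region tag is a uuid that user space sets (or leaves unset), and that an
untagged region accepts only untagged extents; that second point is one
reading of this thread, not settled behavior.  The names accept_extent and
filter_extents are made up for illustration.

```python
import uuid
from typing import List, Optional, Tuple

Tag = Optional[uuid.UUID]  # None stands in for the NULL tag


def accept_extent(region_tag: Tag, extent_tag: Tag) -> bool:
    """An extent is accepted only if its tag matches the region's tag exactly."""
    return extent_tag == region_tag


def filter_extents(region_tag: Tag, extents: List[Tuple[int, Tag]]):
    """Split (start, tag) pairs into extents to hotplug and extents to reject."""
    hotplug, reject = [], []
    for start, tag in extents:
        (hotplug if accept_extent(region_tag, tag) else reject).append(start)
    return hotplug, reject
```

For a region tagged A, an extent tagged A is hotplugged while a NULL-tagged
extent or one tagged B is rejected.  No "wait for the whole set" step is
needed for sysram under this rule.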

> 
> > Regardless IMO if user space was monitoring the extents with tag A they
> > can decide if and when all those extents have arrived and can build on top
> > of that.
> > 
> 
> This assumes userland has something to build on top of, and moreover
> that this something will be DAX.
> 
> - I agree for a filesystem-consumption pattern.
> - I disagree for hotplug - dax is pointless glue.
> - I don't know if DAX is right-fit for other use cases. (it might just
>   want to pass the raw IORESOURCE region to the VMM, for example).
> 
> > Are we expecting to have tags and non-tagged extents on the same DCD
> > region?
> > 
> > I'm ok not supporting that.  But just to be clear about what you are
> > suggesting.
> > 
> 
> Probably not.  And in fact I think that should be one configuration bit
> (either you support tags or you don't - reject the other state).

Not bit.  Just a non-null uuid set.

> 
> But I can imagine a driver wanting to support either (exclusive-or)

Yes.  Set the uuid.

> 
> > Would the cxl_sysram region driver be attached to the DCD partition?  Then
> > it would have some DCD functionality built in...  I guess make a common
> > extent processing lib for the 2 drivers?
> > 
> 
> Same driver - allow it to bind PARTMODE_RAM or PARTMODE_DC.

ok good.

> 
> A RAM region hotplugs exactly once: at bind/unbind
> A DC region hotplugs at runtime.

Yes for every extent as they are seen.

> 
> Same code, DC just adds the log monitoring stuff.

Yep.
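
Roughly, the "same driver, two partition modes" idea could look like the
toy model below.  PartMode, SysramRegion and its methods are hypothetical
names for this sketch only; PARTMODE_RAM and PARTMODE_DC are the only
identifiers taken from the discussion.

```python
from enum import Enum


class PartMode(Enum):
    RAM = "PARTMODE_RAM"
    DC = "PARTMODE_DC"


class SysramRegion:
    def __init__(self, mode: PartMode, size: int):
        self.mode = mode
        self.size = size
        self.online = []  # (start, length) ranges currently hotplugged

    def bind(self):
        # A RAM region hotplugs exactly once, at bind time.
        if self.mode is PartMode.RAM:
            self.online.append((0, self.size))
        # A DC region starts empty and waits for extent events.

    def extent_added(self, start: int, length: int):
        # Only DC regions monitor the event log for new extents.
        if self.mode is not PartMode.DC:
            raise ValueError("runtime extents are only valid for DC regions")
        self.online.append((start, length))

    def extent_released(self, start: int, length: int):
        if self.mode is not PartMode.DC:
            raise ValueError("runtime extents are only valid for DC regions")
        self.online.remove((start, length))

    def unbind(self):
        # Both modes unplug whatever is still online at unbind.
        self.online.clear()
```

The hotplug bookkeeping is shared; the DC mode only adds the two extent
event handlers on top.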

> 
> > I feel like that is a lot of policy being built into the kernel.  Where
> > having the DCD region driver simply tell user space 'Hey there is a new
> > extent here' and then having user space online that as sysram makes the
> > policy decision in user space.
> > 
> > Segueing into the N_PRIVATE work.  Couldn't we assign that memory to a
> > NUMA node with N_PRIVATE only memory via userspace...  Then it is onlined
> > in a way that any app which is allocating from that node would get that
> > memory.  And keep it out of kernel space?
> > 
> > But keep all that policy in user space when an extent appears.  Not baked
> > into a particular driver.
> > 
> 
> I would need to think this over a bit more, I'm not quite seeing how
> what you are suggesting would work.

I think you set it out above.  I thought the sysram driver would have a
control for N_MEMORY_PRIVATE vs N_MEMORY which could control that policy
during hotplug.  Maybe I'm hallucinating.

> 
> N_MEMORY_PRIVATE implies there is some special feature of the device
> that should be taken into account when managing the memory - but that
> you want to re-use (some of) the existing mm/ infrastructure for basic
> operations (page_alloc, reclaim, migration, etc).
> 
> There's an argument that some such nodes shouldn't even be visible to
> userspace (of what use is knowing a node is there if mempolicy commands
> are rejected or ignored if you try to bind to it?)
> 
> But also, setting N_MEMORY_PRIVATE vs N_MEMORY would explicitly be an
> mm/memory_hotplug.c operation - so there's a pretty long path from
> userland to "Setting N_MEMORY_PRIVATE" that goes through the drivers.
> 
> You can't set N_MEMORY_PRIVATE before going online (has to be done
> during the hotplug process, otherwise you get nasty race conditions).
> 
> > > But I think this resolves a lot of the disparate disagreements on "what
> > > to do with tags" and how to manage sparseness - just split the policy
> > > into each individual use-case's respective driver.
> > 
> > I think what I'm worried about is where that policy resides.
> >
> > I think it is best to have a DCD region driver which simply exposes
> > extents and allows user space to control how those extents are used.  I
> > think some of what you have above works like that but I want to be careful
> > baking in policy.
> > 
> 
> I guess summarizing the sysram case: The policy seems simple enough to
> not warrant over-complicating the infrastructure for the sake of making
> dax "The One Interface To Rule Them All".
> 
> All userland wants to do for sysram is hot(un)plug.  Why bother with
> dax at all?

I did not want dax.  Was not advocating for dax.  Just did not want to
build a bunch of new 'drivers' for each new policy.

Summary: it is fine to add new knobs to the sysram driver for new policy
controls.  It is _not_ ok to have to put in a new driver.

I'm not clear if sysram could be used for virtio, or even needed.  I'm
still figuring out how virtio of simple memory devices is a gain.

Ira
