[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67b5421cd91d2_2d2c294ad@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 18 Feb 2025 18:29:48 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Alejandro Lucero Palau <alucerop@....com>, Dan Williams
<dan.j.williams@...el.com>, <linux-cxl@...r.kernel.org>,
<netdev@...r.kernel.org>, <edward.cree@....com>, <davem@...emloft.net>,
<kuba@...nel.org>, <pabeni@...hat.com>, <edumazet@...gle.com>,
<dave.jiang@...el.com>
Subject: Re: [PATCH v10 01/26] cxl: make memdev creation type agnostic
Alejandro Lucero Palau wrote:
>
> On 2/6/25 19:37, Dan Williams wrote:
> > alucerop@ wrote:
> >> From: Alejandro Lucero <alucerop@....com>
> >>
> >> In preparation for Type2 support, change memdev creation making
> >> type based on argument.
> >>
> >> Integrate initialization of dvsec and serial fields in the related
> >> cxl_dev_state within same function creating the memdev.
> >>
> >> Move the code from mbox file to memdev file.
> >>
> >> Add new header files with type2 required definitions for memdev
> >> state creation.
> >>
> >> Signed-off-by: Alejandro Lucero <alucerop@....com>
> >> ---
> >> drivers/cxl/core/mbox.c | 20 --------------------
> >> drivers/cxl/core/memdev.c | 23 +++++++++++++++++++++++
> >> drivers/cxl/cxlmem.h | 18 +++---------------
> >> drivers/cxl/cxlpci.h | 17 +----------------
> >> drivers/cxl/pci.c | 16 +++++++++-------
> >> include/cxl/cxl.h | 26 ++++++++++++++++++++++++++
> >> include/cxl/pci.h | 23 +++++++++++++++++++++++
> >> 7 files changed, 85 insertions(+), 58 deletions(-)
> >> create mode 100644 include/cxl/cxl.h
> >> create mode 100644 include/cxl/pci.h
> >>
> >> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> >> index 4d22bb731177..96155b8af535 100644
> >> --- a/drivers/cxl/core/mbox.c
> >> +++ b/drivers/cxl/core/mbox.c
> >> @@ -1435,26 +1435,6 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
> >> }
> >> EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
> >>
> >> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> >> -{
> >> - struct cxl_memdev_state *mds;
> >> -
> >> - mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
> >> - if (!mds) {
> >> - dev_err(dev, "No memory available\n");
> >> - return ERR_PTR(-ENOMEM);
> >> - }
> >> -
> >> - mutex_init(&mds->event.log_lock);
> >> - mds->cxlds.dev = dev;
> >> - mds->cxlds.reg_map.host = dev;
> >> - mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> >> - mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
> >> -
> >> - return mds;
> >> -}
> >> -EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, "CXL");
> >> -
> >> void __init cxl_mbox_init(void)
> >> {
> >> struct dentry *mbox_debugfs;
> >> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> >> index 63c6c681125d..456d505f1bc8 100644
> >> --- a/drivers/cxl/core/memdev.c
> >> +++ b/drivers/cxl/core/memdev.c
> >> @@ -632,6 +632,29 @@ static void detach_memdev(struct work_struct *work)
> >>
> >> static struct lock_class_key cxl_memdev_key;
> >>
> >> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
> >> + u16 dvsec, enum cxl_devtype type)
> >> +{
> >> + struct cxl_memdev_state *mds;
> >> +
> >> + mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
> >> + if (!mds) {
> >> + dev_err(dev, "No memory available\n");
> >> + return ERR_PTR(-ENOMEM);
> >> + }
> >> +
> >> + mutex_init(&mds->event.log_lock);
> >> + mds->cxlds.dev = dev;
> >> + mds->cxlds.reg_map.host = dev;
> >> + mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> >> + mds->cxlds.cxl_dvsec = dvsec;
> >> + mds->cxlds.serial = serial;
> >> + mds->cxlds.type = type;
> >> +
> >> + return mds;
> >> +}
> >> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, "CXL");
> > I was envisioning that accelerators only consider 'struct cxl_dev_state'
> > and that 'struct cxl_memdev_state' is exclusively for
> > CXL_DEVTYPE_CLASSMEM memory expander use case.
>
>
> That was the original idea and what I have followed since the RFC, but
> since the patchset has gone through some assumptions which turned wrong,
> I seized the "revolution" for changing this as well.
>
>
> A type2 is a memdev, and what makes it different is the exposure, so I
> can not see why an accel driver, at least a Type2, should not use a
> cxl_memdev_state struct. This simplifies the type2 support and after
> all, a Type2 could require the exact same things like a type3, like
> mbox, perf, poison, ... .
I disagree, I think it avoids the discipline of maintaining Accelerator
CXL.mem infrastructure alongside the sometimes super-set sometimes
disjoint-set of generic CXL memory expander support.
Specifically, the reason I think the implementation is worse off reusing
cxl_memdev_state for accelerators is because accelerators are not
subject to the same requirements as "memory device" expanders that emit
the class code from CXL 3.1 8.1.12.1 "PCI Header - Class Code Register
(Offset 09h)".
The whole point of the CXL_DEVTYPE_CLASSMEM vs CXL_DEVTYPE_DEVMEM
distinction was for cases where accelerators are not mandated to support
the same functionality as a generic expander.
It is not until patch12 that this set notices that to_cxl_memdev_state()
has been returning NULL for accelerator created 'cxl_memdev_state'
instances. However, patch12 is confused because to_cxl_memdev_state()
has no reason to exist if it can be assumed that 'struct
cxl_memdev_state' always wraps 'struct cxl_dev_state'.
The fact that it requires thought and care to decide how accelerators
can share code paths with the generic memory class device case is a
*feature*.
So either 'enum cxl_devtype' needs to be removed altogether (would need
a strong argument that is currently absent from this set), or we need to
carefully think about the optional functionality that an accelerator may
want to reuse from expanders. As it stands, most of cxl_memdev_state
makes little sense for an accelerator:
> struct cxl_memdev_state {
> struct cxl_dev_state cxlds;
> size_t lsa_size;
Why would an accelerator ever worry about PMEM?
> char firmware_version[0x10];
Certainly accelerators have their own firmware versioning and update
flows independent of the CXL standard?
> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
> DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
Mailbox is not mandatory and Memory device mandatory commands are not
mandatory for accelerators?
> u64 total_bytes;
> u64 volatile_only_bytes;
> u64 persistent_only_bytes;
> u64 partition_align_bytes;
> u64 active_volatile_bytes;
> u64 active_persistent_bytes;
> u64 next_volatile_bytes;
> u64 next_persistent_bytes;
Already had a long discussion about accelerator DPA space enumeration.
> struct cxl_event_state event;
> struct cxl_poison_state poison;
These might be candidates for accelerator reuse, but I would suggest
promoting them out of 'struct cxl_memdev_state' to an optional
capability of 'struct cxl_dev_state'.
> struct cxl_security_state security;
PMEM Security is 2-degrees of optionality away from what an accelerator
would ever need to consider.
> struct cxl_fw_state fw;
Not even sure that cxl_memdev_state needs the ops passed to
firmware_upload_register() make little use of 'struct cxl_memdev_state'
outside of using it to look up the 'struct cxl_mailbox'.
> };
> > Something roughly like
> > the below. Note, this borrows from the fwctl_alloc_device() example
> > which captures the spirit of registering a c4ore object wrapped by an end
> > driver provided structure).
> >
> > #define cxl_dev_state_create(parent, serial, dvsec, type, drv_struct, member) \
> > ({ \
> > static_assert(__same_type(struct cxl_dev_state, \
> > ((drv_struct *)NULL)->member)); \
> > static_assert(offsetof(drv_struct, member) == 0); \
> > (drv_struct *)_cxl_dev_state_create(parent, serial, dvsec, \
> > type, sizeof(drv_struct)); \
> > })
>
>
> If you prefer the accel driver keeping a struct embedding the core cxl
> object, that is fine. I can not see a reason for not doing it, although
> I can not see a reason either for imposing this.
The cxl_pci driver would use it to align on a single way to wrap its class device
driver context around 'struct cxl_dev_state'. So it is more about
setting common expectations across cxl_pci and accelerator drivers for
how they wrap 'struct cxl_dev_state'.
[..]
> > struct cxl_memdev_state *cxl_memdev_state_create(parent, serial, dvsec)
> > {
> > struct cxl_memdev_state *mds = cxl_dev_state_create(
> > parent, serial, dvsec, CXL_DEVTYPE_CLASSMEM,
> > struct cxl_memdev_state, cxlds);
> >
> > if (IS_ERR(mds))
> > return mds;
> >
> > mutex_init(&mds->event.log_lock);
> > mds->cxlds.dev = dev;
> > mds->cxlds.reg_map.host = dev;
> > mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> > mds->cxlds.cxl_dvsec = dvsec;
> > mds->cxlds.serial = serial;
> > mds->cxlds.type = type;
> >
> > return mds;
> > }
> >
> > If an accelerator wants to share infrastructure that is currently housed
> > in 'struct cxl_memdev_state', then that functionality should first move
> > to 'struct cxl_dev_state'.
>
> If you see the full patchset, you will realize the accel driver does not
> use the cxl objects except for doing further initialization with them.
> In other words, we keep the idea of avoiding the accel driver
> manipulating those structs freely, and having an API for accel drivers.
> Using cxl_memdev_struct now implies to change some core functions for
> using that struct as the argument what should not be a problem.
To date, 'struct cxl_memdev_state' has been a dumping ground for random
context that the class driver needs. The consequences of that dumping
have been low as the only potential burden would be self-contained to
the only user of 'struct cxl_memdev_state', cxl_pci. The creation of
'struct cxl_dev_state' was motivated by the observation that the arrival
of accelerators ends that honeymoon period. The implementation needs to
be conscientious about not spreading cruft amongst multiple disparate
accelerator drivers and their use cases.
The cxl_pci class device should be able to change the cxl_memdev_state
structure at will without worry or care for understanding accelerator
use cases.
> We will need to think about this when type2 cache support comes, which
> will mean type1 support as well. But it is hard to think about what will
> be required then, because it is not clear yet how we should implement
> that support. So for the impending need of having Type2 support for
> CXL.mem, I really think this is all what we need by now.
I want to avoid the liability of accelerator drivers silently growing
attachment to functionality in 'struct cxl_memdev_state', and architect
a shared data structure / library interface for those pieces to reuse.
Powered by blists - more mailing lists