[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YtB0loLvlKbgcA43@iweiny-desk3>
Date: Thu, 14 Jul 2022 12:55:02 -0700
From: Ira Weiny <ira.weiny@...el.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: Bjorn Helgaas <bhelgaas@...gle.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Lukas Wunner <lukas@...ner.de>,
"Alison Schofield" <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Dave Jiang <dave.jiang@...el.com>,
Ben Widawsky <bwidawsk@...nel.org>,
<linux-kernel@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
<linux-pci@...r.kernel.org>
Subject: Re: [PATCH V13 6/9] cxl/port: Read CDAT table
On Thu, Jul 14, 2022 at 08:38:29AM -0700, Dan Williams wrote:
> ira.weiny@ wrote:
> > From: Ira Weiny <ira.weiny@...el.com>
[snip]
> > ---
> > drivers/cxl/cdat.h | 100 ++++++++++++++++++++++++
> > drivers/cxl/core/pci.c | 170 +++++++++++++++++++++++++++++++++++++++++
> > drivers/cxl/cxl.h | 5 ++
> > drivers/cxl/cxlpci.h | 1 +
> > drivers/cxl/port.c | 51 +++++++++++++
> > 5 files changed, 327 insertions(+)
> > create mode 100644 drivers/cxl/cdat.h
>
> The bin_attr additions to CXL sysfs need to be added to
> Documentation/ABI/testing/sysfs-bus-cxl.
Done. But with a very simple text:
What: /sys/bus/cxl/devices/endpointX/CDAT/cdat
Date: July, 2022
KernelVersion: v5.19
Contact: linux-cxl@...r.kernel.org
Description:
(RO) Raw binary CDAT data.
Is this enough?
[snip]
> > +
> > +/* Device Scoped Memory Side Cache Information Structure */
> > +#define CDAT_DSMSCIS_DW1_HANDLE 0x000000ff
> > +#define CDAT_DSMSCIS_MEMORY_SIDE_CACHE_SIZE(entry) \
> > + ((u64)((entry)[3]) << 32 | (entry)[2])
>
> This looks sketchy for 2 reasons, there is no type safety on @entry, so
> it could be anything, and it is referenced twice which is a bug waiting
> to happen if this or any of the other macros that copy this pattern pass
> a statement argument with side-effects like:
> CDAT_DSMSCIS_MEMORY_SIDE_CACHE_SIZE(entry++).
>
> > +#define CDAT_DSMSCIS_DW4_MEMORY_SIDE_CACHE_ATTRS 0xffffffff
> > +
> > +/* Device Scoped Initiator Structure */
> > +#define CDAT_DSIS_DW1_FLAGS 0x000000ff
> > +#define CDAT_DSIS_DW1_HANDLE 0x0000ff00
> > +
> > +/* Device Scoped EFI Memory Type Structure */
> > +#define CDAT_DSEMTS_DW1_HANDLE 0x000000ff
> > +#define CDAT_DSEMTS_DW1_EFI_MEMORY_TYPE_ATTR 0x0000ff00
> > +#define CDAT_DSEMTS_DPA_OFFSET(entry) ((u64)((entry)[3]) << 32 | (entry)[2])
> > +#define CDAT_DSEMTS_DPA_LENGTH(entry) ((u64)((entry)[5]) << 32 | (entry)[4])
>
> More sketchy double array derefernces against an unspecified type. No
> need to invent a new way to parse ACPI-like table data. CDAT parsing
> should end up looking a lot like: drivers/acpi/numa/hmat.c. I.e. I would
> expect the helpers in drivers/acpi/tables.c are repurposed to parse a
> passed in CDAT data buffer rather than using acpi_get_table()
> internally.
>
> Lets drop patch 9 for now and all of these definitions. Leave all the
> parsing to userspace via the bin_attr. Then we can circle back and build
> up the CDAT parsing code when its ready to consume the data for QTG
> assignment, and reuse the ACPI parsing code.
>
> > +
> > +/* Switch Scoped Latency and Bandwidth Information Structure */
> > +#define CDAT_SSLBIS_DW1_DATA_TYPE 0x000000ff
> > +#define CDAT_SSLBIS_BASE_UNIT(entry) ((u64)((entry)[3]) << 32 | (entry)[2])
> > +#define CDAT_SSLBIS_ENTRY_PORT_X(entry, i) ((entry)[4 + (i) * 2] & 0x0000ffff)
> > +#define CDAT_SSLBIS_ENTRY_PORT_Y(entry, i) (((entry)[4 + (i) * 2] & 0xffff0000) >> 16)
> > +#define CDAT_SSLBIS_ENTRY_LAT_OR_BW(entry, i) ((entry)[4 + (i) * 2 + 1] & 0x0000ffff)
>
> Yeah, all of these unused CDAT defintions can be deleted from this
> patch.
>
Done and patch 9 dropped.
> > +
> > +#define CXL_DOE_PROTOCOL_TABLE_ACCESS 2
> > +
> > +/**
> > + * struct cxl_cdat - CXL CDAT data
> > + *
> > + * @table: cache of CDAT table
> > + * @length: length of cached CDAT table
> > + */
> > +struct cxl_cdat {
> > + void *table;
> > + size_t length;
> > +};
> > +
> > +#endif /* !__CXL_CDAT_H__ */
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > index c4c99ff7b55e..9232b806d051 100644
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -4,10 +4,12 @@
> > #include <linux/device.h>
> > #include <linux/delay.h>
> > #include <linux/pci.h>
> > +#include <linux/pci-doe.h>
> > #include <cxlpci.h>
> > #include <cxlmem.h>
> > #include <cxl.h>
> > #include "core.h"
> > +#include "cdat.h"
> >
> > /**
> > * DOC: cxl core pci
> > @@ -458,3 +460,171 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> > return 0;
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, CXL);
> > +
> > +static struct pci_doe_mb *find_cdat_mb(struct device *uport)
>
> small naming nit, probably more places than this, but
> s/find_cdat_mb/find_cdat_doe/ to make it clearly distinct from the
> drivers/cxl/core/mbox.c infrastructure. I.e. I think "mailbox" is
> implied by "doe".
Good idea, mailbox (and mb) is overloaded.
>
> > +{
> > + struct cxl_memdev *cxlmd;
> > + struct cxl_dev_state *cxlds;
> > + unsigned long index;
> > + void *entry;
> > +
> > + if (!is_cxl_memdev(uport))
> > + return NULL;
>
> If this only supports endpoints just call this from the part of
> cxl_port_probe() that does:
>
> if (is_cxl_endpoint(port)) {
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> ...
>
> ...and let the person who adds CDAT support for switches refactor it
> later. I.e. find_cdat_doe() can just take a @cxlmd arg directly.
>
I was trying to make this generic for all ports originally. But in the end we
punted and said end points only since we don't have the plumbing for PCI port
driver yet. Won't I have to move this back when that comes online?
>
> It's
> enough that this patch located the cdat cached buffer on 'struct
> cxl_port' and the 'CDAT' sysfs attribute under
> /sys/bus/cxl/device/cxl_portX.
I'm not parsing this sentence?
.../cxl_portX/...
does not exist?
root fedora /sys/bus/cxl/drivers
12:46:39 > ll
total 0
drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_mem
drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_nvdimm
drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_nvdimm_bridge
drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_port
endpointX points back to the same place as /sys/bus/cxl/devices/endpointX
root fedora /sys/bus/cxl/drivers/cxl_port
12:47:03 > ll
total 0
...
lrwxrwxrwx 1 root root 0 Jul 14 12:44 endpoint3 -> ../../../../devices/platform/ACPI0017:00/root0/port1/endpoint3
...
root fedora /sys/bus/cxl/devices
12:47:59 > ll
total 0
...
lrwxrwxrwx 1 root root 0 Jul 14 12:44 endpoint3 -> ../../../devices/platform/ACPI0017:00/root0/port1/endpoint3
So what is different from what I'm doing?
>
> > +
> > + cxlmd = to_cxl_memdev(uport);
> > + cxlds = cxlmd->cxlds;
> > +
> > + xa_for_each(&cxlds->doe_mbs, index, entry) {
> > + struct pci_doe_mb *cur = entry;
> > +
> > + if (pci_doe_supports_prot(cur, PCI_DVSEC_VENDOR_ID_CXL,
> > + CXL_DOE_PROTOCOL_TABLE_ACCESS))
> > + return cur;
>
> This has me thinking the doe_mbs xarray is overkill if all the other
> DOEs are just enumerated and then ignored. When / if more DOEs need to
> be handled then we can think about instantiating all of them and keeping
> them aruond. Otherwise, just do this pci_doe_supports_prot() check in
> cxl_pci and throw away the rest. Then cxlds would just carry a single
> @cdat_doe attribute for this first use of DOE capabilities in the
> kernel.
I was hoping that code could be lifted to the PCI side later. For now CXL
could handle all DOE's.
At this point CXL devices are the only one using DOE in the kernel. I know
there are other PCI uses coming but for any 1 device only one mailbox object
should be created or we are going to have the kernel drivers conflicting with
each other.
So I'd rather leave this series with CXL controlling the DOE mailboxes. I
think later we will need to lift the mailbox objects to PCI and CXL drivers
will need to query into that side as needed. There was a version like this
before (Probably Jonathan's) and I think the aux bus moved it all over here.
<sigh> Sorry getting pretty confused myself at this point.
The code change you suggest is easy to do but I think it would be better to
land this and then lift all of the mailboxes to PCI in a follow on series.
>
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +#define CDAT_DOE_REQ(entry_handle) \
> > + (FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE, \
> > + CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) | \
> > + FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE, \
> > + CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) | \
> > + FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle)))
> > +
> > +static void cxl_doe_task_complete(struct pci_doe_task *task)
> > +{
> > + complete(task->private);
> > +}
> > +
> > +static int cxl_cdat_get_length(struct device *dev,
> > + struct pci_doe_mb *cdat_mb,
> > + size_t *length)
> > +{
> > + u32 cdat_request_pl = CDAT_DOE_REQ(0);
> > + u32 cdat_response_pl[32];
> > + DECLARE_COMPLETION_ONSTACK(c);
> > + struct pci_doe_task task = {
> > + .prot.vid = PCI_DVSEC_VENDOR_ID_CXL,
> > + .prot.type = CXL_DOE_PROTOCOL_TABLE_ACCESS,
> > + .request_pl = &cdat_request_pl,
> > + .request_pl_sz = sizeof(cdat_request_pl),
> > + .response_pl = cdat_response_pl,
> > + .response_pl_sz = sizeof(cdat_response_pl),
> > + .complete = cxl_doe_task_complete,
> > + .private = &c,
> > + };
>
> I think this wants to be macro'ized to something like:
>
> #define DECLARE_PCI_DOE_TASK(vid, type, req, rsp, priv)
I was thinking the same thing after you wanted the task init here too.
Can I follow on with that work? I think the following members need to be
wrapped and hidden from the caller API.
struct pci_doe_task {
...
/* No need for the user to initialize these fields */
struct work_struct work;
struct pci_doe_mb *doe_mb;
};
Reworking that structure goes hand in hand with said macro.
[snip]
> >
> > /**
> > * DOC: cxl objects
> > @@ -267,6 +268,8 @@ struct cxl_nvdimm {
> > * @component_reg_phys: component register capability base address (optional)
> > * @dead: last ep has been removed, force port re-creation
> > * @depth: How deep this port is relative to the root. depth 0 is the root.
> > + * @cdat: Cached CDAT data
> > + * @cdat_sup: Is CDAT supported
>
> That does not provide more information than the member name, how about?
>
> "should a CDAT attribute be published in sysfs"
Sounds good.
But re-reading this 'cdat_available' seems like a better name all around.
>
> > */
> > struct cxl_port {
> > struct device dev;
> > @@ -278,6 +281,8 @@ struct cxl_port {
> > resource_size_t component_reg_phys;
> > bool dead;
> > unsigned int depth;
> > + struct cxl_cdat cdat;
> > + bool cdat_sup;
>
> Just spell out "supported", no need to save characters on this used once
> attribute.
ok.
cdat_available?
>
> > };
> >
> > /**
> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index fce1c11729c2..eec597dbe763 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -74,4 +74,5 @@ static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> > int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> > struct cxl_dev_state;
> > int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
> > +void read_cdat_data(struct cxl_port *port);
> > #endif /* __CXL_PCI_H__ */
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index 3cf308f114c4..1ec34a159657 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -49,6 +49,9 @@ static int cxl_port_probe(struct device *dev)
> > if (IS_ERR(cxlhdm))
> > return PTR_ERR(cxlhdm);
> >
> > + /* Cache the data early to ensure is_visible() works */
> > + read_cdat_data(port);
> > +
> > if (is_cxl_endpoint(port)) {
> > struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> > struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > @@ -78,10 +81,58 @@ static int cxl_port_probe(struct device *dev)
> > return 0;
> > }
> >
> > +static ssize_t cdat_read(struct file *filp, struct kobject *kobj,
> > + struct bin_attribute *bin_attr, char *buf,
> > + loff_t offset, size_t count)
> > +{
> > + struct device *dev = kobj_to_dev(kobj);
> > + struct cxl_port *port = to_cxl_port(dev);
>
> Just for completeness you can have an:
>
> if (!port->cdat_supported)
> return -ENXIO;
>
> ...to document expectations / backstop the is_bin_visible().
Sure.
Ira
Powered by blists - more mailing lists