[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62d07a37d25ed_16fb9729491@dwillia2-xfh.jf.intel.com.notmuch>
Date: Thu, 14 Jul 2022 13:19:03 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Ira Weiny <ira.weiny@...el.com>,
Dan Williams <dan.j.williams@...el.com>
CC: Bjorn Helgaas <bhelgaas@...gle.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Lukas Wunner <lukas@...ner.de>,
"Alison Schofield" <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Dave Jiang <dave.jiang@...el.com>,
Ben Widawsky <bwidawsk@...nel.org>,
<linux-kernel@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
<linux-pci@...r.kernel.org>
Subject: Re: [PATCH V13 6/9] cxl/port: Read CDAT table
Ira Weiny wrote:
> On Thu, Jul 14, 2022 at 08:38:29AM -0700, Dan Williams wrote:
> > ira.weiny@ wrote:
> > > From: Ira Weiny <ira.weiny@...el.com>
>
> [snip]
>
> > > ---
> > > drivers/cxl/cdat.h | 100 ++++++++++++++++++++++++
> > > drivers/cxl/core/pci.c | 170 +++++++++++++++++++++++++++++++++++++++++
> > > drivers/cxl/cxl.h | 5 ++
> > > drivers/cxl/cxlpci.h | 1 +
> > > drivers/cxl/port.c | 51 +++++++++++++
> > > 5 files changed, 327 insertions(+)
> > > create mode 100644 drivers/cxl/cdat.h
> >
> > The bin_attr additions to CXL sysfs need to be added to
> > Documentation/ABI/testing/sysfs-bus-cxl.
>
> Done. But with a very simple text:
>
> What: /sys/bus/cxl/devices/endpointX/CDAT/cdat
> Date: July, 2022
> KernelVersion: v5.19
> Contact: linux-cxl@...r.kernel.org
> Description:
> (RO) Raw binary CDAT data.
>
> Is this enough?
It should talk about what it means when this is absent, present and
0-sized, present and non-zero sized.
>
> [snip]
>
> > > +
> > > +/* Device Scoped Memory Side Cache Information Structure */
> > > +#define CDAT_DSMSCIS_DW1_HANDLE 0x000000ff
> > > +#define CDAT_DSMSCIS_MEMORY_SIDE_CACHE_SIZE(entry) \
> > > + ((u64)((entry)[3]) << 32 | (entry)[2])
> >
> > This looks sketchy for 2 reasons, there is no type safety on @entry, so
> > it could be anything, and it is referenced twice which is a bug waiting
> > to happen if this or any of the other macros that copy this pattern pass
> > a statement argument with side-effects like:
> > CDAT_DSMSCIS_MEMORY_SIDE_CACHE_SIZE(entry++).
> >
> > > +#define CDAT_DSMSCIS_DW4_MEMORY_SIDE_CACHE_ATTRS 0xffffffff
> > > +
> > > +/* Device Scoped Initiator Structure */
> > > +#define CDAT_DSIS_DW1_FLAGS 0x000000ff
> > > +#define CDAT_DSIS_DW1_HANDLE 0x0000ff00
> > > +
> > > +/* Device Scoped EFI Memory Type Structure */
> > > +#define CDAT_DSEMTS_DW1_HANDLE 0x000000ff
> > > +#define CDAT_DSEMTS_DW1_EFI_MEMORY_TYPE_ATTR 0x0000ff00
> > > +#define CDAT_DSEMTS_DPA_OFFSET(entry) ((u64)((entry)[3]) << 32 | (entry)[2])
> > > +#define CDAT_DSEMTS_DPA_LENGTH(entry) ((u64)((entry)[5]) << 32 | (entry)[4])
> >
> > More sketchy double array derefernces against an unspecified type. No
> > need to invent a new way to parse ACPI-like table data. CDAT parsing
> > should end up looking a lot like: drivers/acpi/numa/hmat.c. I.e. I would
> > expect the helpers in drivers/acpi/tables.c are repurposed to parse a
> > passed in CDAT data buffer rather than using acpi_get_table()
> > internally.
> >
> > Lets drop patch 9 for now and all of these definitions. Leave all the
> > parsing to userspace via the bin_attr. Then we can circle back and build
> > up the CDAT parsing code when its ready to consume the data for QTG
> > assignment, and reuse the ACPI parsing code.
> >
> > > +
> > > +/* Switch Scoped Latency and Bandwidth Information Structure */
> > > +#define CDAT_SSLBIS_DW1_DATA_TYPE 0x000000ff
> > > +#define CDAT_SSLBIS_BASE_UNIT(entry) ((u64)((entry)[3]) << 32 | (entry)[2])
> > > +#define CDAT_SSLBIS_ENTRY_PORT_X(entry, i) ((entry)[4 + (i) * 2] & 0x0000ffff)
> > > +#define CDAT_SSLBIS_ENTRY_PORT_Y(entry, i) (((entry)[4 + (i) * 2] & 0xffff0000) >> 16)
> > > +#define CDAT_SSLBIS_ENTRY_LAT_OR_BW(entry, i) ((entry)[4 + (i) * 2 + 1] & 0x0000ffff)
> >
> > Yeah, all of these unused CDAT defintions can be deleted from this
> > patch.
> >
>
> Done and patch 9 dropped.
>
> > > +
> > > +#define CXL_DOE_PROTOCOL_TABLE_ACCESS 2
> > > +
> > > +/**
> > > + * struct cxl_cdat - CXL CDAT data
> > > + *
> > > + * @table: cache of CDAT table
> > > + * @length: length of cached CDAT table
> > > + */
> > > +struct cxl_cdat {
> > > + void *table;
> > > + size_t length;
> > > +};
> > > +
> > > +#endif /* !__CXL_CDAT_H__ */
> > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > > index c4c99ff7b55e..9232b806d051 100644
> > > --- a/drivers/cxl/core/pci.c
> > > +++ b/drivers/cxl/core/pci.c
> > > @@ -4,10 +4,12 @@
> > > #include <linux/device.h>
> > > #include <linux/delay.h>
> > > #include <linux/pci.h>
> > > +#include <linux/pci-doe.h>
> > > #include <cxlpci.h>
> > > #include <cxlmem.h>
> > > #include <cxl.h>
> > > #include "core.h"
> > > +#include "cdat.h"
> > >
> > > /**
> > > * DOC: cxl core pci
> > > @@ -458,3 +460,171 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> > > return 0;
> > > }
> > > EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, CXL);
> > > +
> > > +static struct pci_doe_mb *find_cdat_mb(struct device *uport)
> >
> > small naming nit, probably more places than this, but
> > s/find_cdat_mb/find_cdat_doe/ to make it clearly distinct from the
> > drivers/cxl/core/mbox.c infrastructure. I.e. I think "mailbox" is
> > implied by "doe".
>
> Good idea, mailbox (and mb) is overloaded.
>
> >
> > > +{
> > > + struct cxl_memdev *cxlmd;
> > > + struct cxl_dev_state *cxlds;
> > > + unsigned long index;
> > > + void *entry;
> > > +
> > > + if (!is_cxl_memdev(uport))
> > > + return NULL;
> >
> > If this only supports endpoints just call this from the part of
> > cxl_port_probe() that does:
> >
> > if (is_cxl_endpoint(port)) {
> > struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> > ...
> >
> > ...and let the person who adds CDAT support for switches refactor it
> > later. I.e. find_cdat_doe() can just take a @cxlmd arg directly.
> >
>
> I was trying to make this generic for all ports originally. But in the end we
> punted and said end points only since we don't have the plumbing for PCI port
> driver yet. Won't I have to move this back when that comes online?
Yes, but:
if (!is_cxl_memdev(uport))
return NULL;
...in this path has no reason to exist in v5.19. The need to consider
switch-port-CDAT is at least a full kernel dev-cycle away.
> > It's
> > enough that this patch located the cdat cached buffer on 'struct
> > cxl_port' and the 'CDAT' sysfs attribute under
> > /sys/bus/cxl/device/cxl_portX.
>
> I'm not parsing this sentence?
>
> .../cxl_portX/...
>
> does not exist?
No, I'm saying that the sysfs location is fine, and find_cdat_doe() need
not consider switch-port-CDAT yet.
>
> root fedora /sys/bus/cxl/drivers
> 12:46:39 > ll
> total 0
> drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_mem
> drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_nvdimm
> drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_nvdimm_bridge
> drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_port
>
> endpointX points back to the same place as /sys/bus/cxl/devices/endpointX
>
> root fedora /sys/bus/cxl/drivers/cxl_port
> 12:47:03 > ll
> total 0
> ...
> lrwxrwxrwx 1 root root 0 Jul 14 12:44 endpoint3 -> ../../../../devices/platform/ACPI0017:00/root0/port1/endpoint3
> ...
>
> root fedora /sys/bus/cxl/devices
> 12:47:59 > ll
> total 0
> ...
> lrwxrwxrwx 1 root root 0 Jul 14 12:44 endpoint3 -> ../../../devices/platform/ACPI0017:00/root0/port1/endpoint3
>
> So what is different from what I'm doing?
Nothing wrong here, sysfs CDAT location is fine.
>
> >
> > > +
> > > + cxlmd = to_cxl_memdev(uport);
> > > + cxlds = cxlmd->cxlds;
> > > +
> > > + xa_for_each(&cxlds->doe_mbs, index, entry) {
> > > + struct pci_doe_mb *cur = entry;
> > > +
> > > + if (pci_doe_supports_prot(cur, PCI_DVSEC_VENDOR_ID_CXL,
> > > + CXL_DOE_PROTOCOL_TABLE_ACCESS))
> > > + return cur;
> >
> > This has me thinking the doe_mbs xarray is overkill if all the other
> > DOEs are just enumerated and then ignored. When / if more DOEs need to
> > be handled then we can think about instantiating all of them and keeping
> > them aruond. Otherwise, just do this pci_doe_supports_prot() check in
> > cxl_pci and throw away the rest. Then cxlds would just carry a single
> > @cdat_doe attribute for this first use of DOE capabilities in the
> > kernel.
>
> I was hoping that code could be lifted to the PCI side later. For now CXL
> could handle all DOE's.
It's only a couple of code to fetch the CDAT DOE out of all the DOEs.
> At this point CXL devices are the only one using DOE in the kernel. I know
> there are other PCI uses coming but for any 1 device only one mailbox object
> should be created or we are going to have the kernel drivers conflicting with
> each other.
In what scenario would a PCI driver conflict with another PCI driver for
the same device?
In the CXL case cxl_pci finds the one DOE it cares about and throws away
the rest and nothing else in the kernel will ever conflict with that.
Don't try to solve future problems until they become present problems.
> So I'd rather leave this series with CXL controlling the DOE mailboxes.
Sounds good, I'm not asking you to change that, just asking for this few
lines of filtering to move to cxl_pci.
> I think later we will need to lift the mailbox objects to PCI and CXL
> drivers will need to query into that side as needed. There was a
> version like this before (Probably Jonathan's) and I think the aux bus
> moved it all over here. <sigh> Sorry getting pretty confused myself
> at this point.
Sure cross that bridge when we get to it.
> The code change you suggest is easy to do but I think it would be better to
> land this and then lift all of the mailboxes to PCI in a follow on series.
That follow on move is not prevented by just the small move of this
filter to cxl_pci, right?
> > > +#define CDAT_DOE_REQ(entry_handle) \
> > > + (FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE, \
> > > + CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) | \
> > > + FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE, \
> > > + CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) | \
> > > + FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle)))
> > > +
> > > +static void cxl_doe_task_complete(struct pci_doe_task *task)
> > > +{
> > > + complete(task->private);
> > > +}
> > > +
> > > +static int cxl_cdat_get_length(struct device *dev,
> > > + struct pci_doe_mb *cdat_mb,
> > > + size_t *length)
> > > +{
> > > + u32 cdat_request_pl = CDAT_DOE_REQ(0);
> > > + u32 cdat_response_pl[32];
> > > + DECLARE_COMPLETION_ONSTACK(c);
> > > + struct pci_doe_task task = {
> > > + .prot.vid = PCI_DVSEC_VENDOR_ID_CXL,
> > > + .prot.type = CXL_DOE_PROTOCOL_TABLE_ACCESS,
> > > + .request_pl = &cdat_request_pl,
> > > + .request_pl_sz = sizeof(cdat_request_pl),
> > > + .response_pl = cdat_response_pl,
> > > + .response_pl_sz = sizeof(cdat_response_pl),
> > > + .complete = cxl_doe_task_complete,
> > > + .private = &c,
> > > + };
> >
> > I think this wants to be macro'ized to something like:
> >
> > #define DECLARE_PCI_DOE_TASK(vid, type, req, rsp, priv)
>
> I was thinking the same thing after you wanted the task init here too.
>
> Can I follow on with that work? I think the following members need to be
> wrapped and hidden from the caller API.
Since this patch already needs a respin you could just do the simple
DECLARE_PCI_DOE_TASK() now and figure out the hiding of @work later.
I.e. keep the INIT_WORK() in submit for now.
>
> struct pci_doe_task {
> ...
> /* No need for the user to initialize these fields */
> struct work_struct work;
> struct pci_doe_mb *doe_mb;
> };
>
> Reworking that structure goes hand in hand with said macro.
>
> [snip]
>
> > >
> > > /**
> > > * DOC: cxl objects
> > > @@ -267,6 +268,8 @@ struct cxl_nvdimm {
> > > * @component_reg_phys: component register capability base address (optional)
> > > * @dead: last ep has been removed, force port re-creation
> > > * @depth: How deep this port is relative to the root. depth 0 is the root.
> > > + * @cdat: Cached CDAT data
> > > + * @cdat_sup: Is CDAT supported
> >
> > That does not provide more information than the member name, how about?
> >
> > "should a CDAT attribute be published in sysfs"
>
> Sounds good.
>
> But re-reading this 'cdat_available' seems like a better name all around.
Ok.
>
> >
> > > */
> > > struct cxl_port {
> > > struct device dev;
> > > @@ -278,6 +281,8 @@ struct cxl_port {
> > > resource_size_t component_reg_phys;
> > > bool dead;
> > > unsigned int depth;
> > > + struct cxl_cdat cdat;
> > > + bool cdat_sup;
> >
> > Just spell out "supported", no need to save characters on this used once
> > attribute.
>
> ok.
>
> cdat_available?
Sure.
Powered by blists - more mailing lists