[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y4pyYvbaSF5DpFGG@iweiny-desk3>
Date: Fri, 2 Dec 2022 13:47:14 -0800
From: Ira Weiny <ira.weiny@...el.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: Steven Rostedt <rostedt@...dmis.org>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
"Ben Widawsky" <bwidawsk@...nel.org>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Davidlohr Bueso <dave@...olabs.net>,
"Dave Jiang" <dave.jiang@...el.com>,
<linux-kernel@...r.kernel.org>, <linux-cxl@...r.kernel.org>
Subject: Re: [PATCH V2 02/11] cxl/mem: Implement Get Event Records command
On Thu, Dec 01, 2022 at 05:39:12PM -0800, Dan Williams wrote:
> ira.weiny@ wrote:
> > From: Ira Weiny <ira.weiny@...el.com>
> >
[snip]
> >
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/cxl.h>
> > +
> > #include "core.h"
> >
> > static bool cxl_raw_allow_all;
> > @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> > CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
> > #endif
> > CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> > + CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> > CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> > CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> > CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
>
> Similar to this patch:
>
> https://lore.kernel.org/linux-cxl/166993221008.1995348.11651567302609703175.stgit@dwillia2-xfh.jf.intel.com/
>
> CXL_MEM_COMMAND_ID_GET_EVENT_RECORD, should be added to the "always
> kernel" / cxlds->exclusive_cmds mask.
Done for all the commands. I'll rebase as well before sending this out.
>
> > @@ -704,6 +708,106 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >
> > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > + enum cxl_event_log_type type)
> > +{
> > + struct cxl_get_event_payload *payload;
> > + u16 nr_rec;
> > +
> > + mutex_lock(&cxlds->event_buf_lock);
> > +
> > + payload = cxlds->event_buf;
> > +
> > + do {
> > + u8 log_type = type;
> > + int rc;
> > +
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > + &log_type, sizeof(log_type),
> > + payload, cxlds->payload_size);
> > + if (rc) {
> > + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > + cxl_event_log_type_str(type), rc);
> > + goto unlock_buffer;
> > + }
> > +
> > + nr_rec = le16_to_cpu(payload->record_count);
> > + if (trace_cxl_generic_event_enabled()) {
>
> This feels like a premature micro-optimization as none of this code is
> fast path and it is dwarfed by the cost of executing the mailbox
> command. I started with trying to reduce the 80 column collision
> pressure, but then stepped back even further and asked, why?
Because Steven told me to. :-( I should have been smarter than that.
>
> > + int i;
> > +
> > + for (i = 0; i < nr_rec; i++)
> > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > + type,
> > + &payload->records[i]);
>
> As far as I can tell trace_cxl_generic_event() always expects a
> device-name as its first argument. So why not enforce that with
> type-safety? I.e. I think trace_cxl_generic_event() should take a
> "struct device *", not a string unless it is really the case that any
> old string will do as the first argument to the trace event. Otherwise
> the trace point can do "__string(dev_name, dev_name(dev))", and mandate
> that callers pass devices.
>From a trace point view 'any old string' will do. There was nothing else the
trace needed from struct device so I skipped it.
[snip]
> > +
> > +/**
> > + * cxl_mem_get_event_records - Get Event Records from the device
> > + * @cxlds: The device data for the operation
> > + *
> > + * Retrieve all event records available on the device and report them as trace
> > + * events.
> > + *
> > + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> > + */
> > +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> > +{
> > + u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > +
> > + dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> > +
> > + if (!cxlds->event_buf) {
> > + cxlds->event_buf = alloc_event_buf(cxlds);
> > + if (WARN_ON_ONCE(!cxlds->event_buf))
> > + return;
> > + }
>
> What's the point of having an event_buf_lock if event_buf is reallocated
> every call?
This is only called on start up.
>
> Just allocate event_buf once at the beginning of time during init if the
> device supports event log retrieval, and fail the driver load if that
> allocation fails. No runtime WARN() for memory allocation.
It was. I'll make that more clear in the next series.
>
> I notice this patch does not clear events, I trust that comes later in
> the series, but I think it belongs here to make this patch a complete
> standalone thought.
Squashed. But it does make for a large patch. Which I'm not a fan of for
review. Lucky that now we have a lot of review on the parts.
>
> > + if (status & CXLDEV_EVENT_STATUS_INFO)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> > + if (status & CXLDEV_EVENT_STATUS_WARN)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> > + if (status & CXLDEV_EVENT_STATUS_FAIL)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> > + if (status & CXLDEV_EVENT_STATUS_FATAL)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
>
> This retrieval order should be flipped. If there is a FATAL pending I
> expect a monitor wants that first and would be happy to parse the INFO
> later. I would go so far as to say that if the INFO logger is looping
> and a FATAL comes in the driver should get that out first before going
> back for more INFO logs. That would mean executing Clear Events and
> looping through the logs by priority until all the status bits fall
> silent inside cxl_mem_get_records_log().
I'll flip them. And determine if this is really what we want to do for the
irq.
The issue with the irq handling calling a single function which checks all
status is that we may end up with some odd interrupts doing nothing depending
on racing etc.
A buffer per log would eliminate this to some extent if the message numbers are
not shared. I don't doubt that vendors are unlikely to burn more than 1
message number so the irq may indeed be shared and serialized anyway.
For simplicity I'll throw them all together.
>
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> > +
> > /**
> > * cxl_mem_get_partition_info - Get partition info
> > * @cxlds: The device data for the operation
> > @@ -846,6 +950,7 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> > }
> >
> > mutex_init(&cxlds->mbox_mutex);
> > + mutex_init(&cxlds->event_buf_lock);
> > cxlds->dev = dev;
> >
> > return cxlds;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index f680450f0b16..d4baae74cd97 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -132,6 +132,13 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
> > #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
> > #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
> >
> > +/* CXL 3.0 8.2.8.3.1 Event Status Register */
> > +#define CXLDEV_DEV_EVENT_STATUS_OFFSET 0x00
> > +#define CXLDEV_EVENT_STATUS_INFO BIT(0)
> > +#define CXLDEV_EVENT_STATUS_WARN BIT(1)
> > +#define CXLDEV_EVENT_STATUS_FAIL BIT(2)
> > +#define CXLDEV_EVENT_STATUS_FATAL BIT(3)
> > +
> > /* CXL 2.0 8.2.8.4 Mailbox Registers */
> > #define CXLDEV_MBOX_CAPS_OFFSET 0x00
> > #define CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index cd35f43fedd4..55d57f5a64bc 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -4,6 +4,7 @@
> > #define __CXL_MEM_H__
> > #include <uapi/linux/cxl_mem.h>
> > #include <linux/cdev.h>
> > +#include <linux/uuid.h>
> > #include "cxl.h"
> >
> > /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> > @@ -250,12 +251,16 @@ struct cxl_dev_state {
> >
> > bool msi_enabled;
> >
> > + struct cxl_get_event_payload *event_buf;
> > + struct mutex event_buf_lock;
> > +
>
> Missing kdoc.
Got it from Jonathan.
>
> > int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > };
> >
> > enum cxl_opcode {
> > CXL_MBOX_OP_INVALID = 0x0000,
> > CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> > + CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
> > CXL_MBOX_OP_GET_FW_INFO = 0x0200,
> > CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> > CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> > @@ -325,6 +330,72 @@ struct cxl_mbox_identify {
> > u8 qos_telemetry_caps;
> > } __packed;
> >
> > +/*
> > + * Common Event Record Format
> > + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > + */
> > +struct cxl_event_record_hdr {
> > + uuid_t id;
> > + u8 length;
> > + u8 flags[3];
> > + __le16 handle;
> > + __le16 related_handle;
> > + __le64 timestamp;
> > + u8 maint_op_class;
> > + u8 reserved[0xf];
>
> Nit, but lets not copy the spec here and just make all the field sizes
> decimal. It even saves a character to write 15 instead of 0xf, and @flags
> is also decimal.
Ok.
>
> > +} __packed;
> > +
> > +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> > +struct cxl_event_record_raw {
> > + struct cxl_event_record_hdr hdr;
> > + u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
> > +} __packed;
> > +
> > +/*
> > + * Get Event Records output payload
> > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> > + */
> > +#define CXL_GET_EVENT_FLAG_OVERFLOW BIT(0)
> > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> > +struct cxl_get_event_payload {
> > + u8 flags;
> > + u8 reserved1;
> > + __le16 overflow_err_count;
> > + __le64 first_overflow_timestamp;
> > + __le64 last_overflow_timestamp;
> > + __le16 record_count;
> > + u8 reserved2[0xa];
>
> Same nit.
Done.
[snip]
> > +
> > +/* This part must be outside protection */
> > +#undef TRACE_INCLUDE_FILE
> > +#define TRACE_INCLUDE_FILE cxl
> > +#include <trace/define_trace.h>
> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > index c71021a2a9ed..70459be5bdd4 100644
> > --- a/include/uapi/linux/cxl_mem.h
> > +++ b/include/uapi/linux/cxl_mem.h
> > @@ -24,6 +24,7 @@
> > ___C(IDENTIFY, "Identify Command"), \
> > ___C(RAW, "Raw device command"), \
> > ___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
> > + ___C(GET_EVENT_RECORD, "Get Event Record"), \
> > ___C(GET_FW_INFO, "Get FW Info"), \
> > ___C(GET_PARTITION_INFO, "Get Partition Information"), \
> > ___C(GET_LSA, "Get Label Storage Area"), \
>
> Yikes, no, this is an enum. New commands need to come at the end
> otherwise different kernels will disagree about the command numbering.
Ooops sorry about that. I somehow thought these were tied to the command
values.
Thanks, Changed for all the commands,
> Likely needs a comment here alerting future devs about the ABI breakage
> danger here.
Additional patch added.
Ira
Powered by blists - more mailing lists