lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87frcna1mq.fsf@AUSNATLYNCH.amd.com>
Date: Mon, 15 Sep 2025 15:42:37 -0500
From: Nathan Lynch <nathan.lynch@....com>
To: Jonathan Cameron <jonathan.cameron@...wei.com>, "Nathan Lynch via B4
 Relay" <devnull+nathan.lynch.amd.com@...nel.org>
CC: Vinod Koul <vkoul@...nel.org>, Wei Huang <wei.huang2@....com>, "Mario
 Limonciello" <mario.limonciello@....com>, Bjorn Helgaas
	<bhelgaas@...gle.com>, <linux-pci@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <dmaengine@...r.kernel.org>
Subject: Re: [PATCH RFC 06/13] dmaengine: sdxi: Add error reporting support

Jonathan Cameron <jonathan.cameron@...wei.com> writes:

> On Fri, 05 Sep 2025 13:48:29 -0500
> Nathan Lynch via B4 Relay <devnull+nathan.lynch.amd.com@...nel.org> wrote:
>
>> From: Nathan Lynch <nathan.lynch@....com>
>> 
>> SDXI implementations provide software with detailed information about
>> error conditions using a per-device ring buffer in system memory. When
>> an error condition is signaled via interrupt, the driver retrieves any
>> pending error log entries and reports them to the kernel log.
>> 
>> Co-developed-by: Wei Huang <wei.huang2@....com>
>> Signed-off-by: Wei Huang <wei.huang2@....com>
>> Signed-off-by: Nathan Lynch <nathan.lynch@....com>
> Hi,
> A few more comments inline. Kind of similar stuff around
> having both register definitions for unpacking and the structure
> definitions in patch 2.
>
> Thanks,
>
> Jonathan
>> ---
>>  drivers/dma/sdxi/error.c | 340 +++++++++++++++++++++++++++++++++++++++++++++++
>>  drivers/dma/sdxi/error.h |  16 +++
>>  2 files changed, 356 insertions(+)
>> 
>> diff --git a/drivers/dma/sdxi/error.c b/drivers/dma/sdxi/error.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..c5e33f5989250352f6b081a3049b3b1f972c85a6
>> --- /dev/null
>> +++ b/drivers/dma/sdxi/error.c
>
>> +/* The "unpacked" counterpart to ERRLOG_HD_ENT. */
>> +struct errlog_entry {
>> +	u64 dsc_index;
>> +	u16 cxt_num;
>> +	u16 err_class;
>> +	u16 type;
>> +	u8 step;
>> +	u8 buf;
>> +	u8 sub_step;
>> +	u8 re;
>> +	bool vl;
>> +	bool cv;
>> +	bool div;
>> +	bool bv;
>> +};
>> +
>> +#define ERRLOG_ENTRY_FIELD(hi_, lo_, name_)				\
>> +	PACKED_FIELD(hi_, lo_, struct errlog_entry, name_)
>> +#define ERRLOG_ENTRY_FLAG(nr_, name_) \
>> +	ERRLOG_ENTRY_FIELD(nr_, nr_, name_)
>> +
>> +/* Refer to "Error Log Header Entry (ERRLOG_HD_ENT)" */
>> +static const struct packed_field_u16 errlog_hd_ent_fields[] = {
>> +	ERRLOG_ENTRY_FLAG(0, vl),
>> +	ERRLOG_ENTRY_FIELD(13, 8, step),
>> +	ERRLOG_ENTRY_FIELD(26, 16, type),
>> +	ERRLOG_ENTRY_FLAG(32, cv),
>> +	ERRLOG_ENTRY_FLAG(33, div),
>> +	ERRLOG_ENTRY_FLAG(34, bv),
>> +	ERRLOG_ENTRY_FIELD(38, 36, buf),
>> +	ERRLOG_ENTRY_FIELD(43, 40, sub_step),
>> +	ERRLOG_ENTRY_FIELD(46, 44, re),
>> +	ERRLOG_ENTRY_FIELD(63, 48, cxt_num),
>> +	ERRLOG_ENTRY_FIELD(127, 64, dsc_index),
>> +	ERRLOG_ENTRY_FIELD(367, 352, err_class),
>
> The association between the fields here and struct sdxi_err_log_hd_ent
> to me should be via some defines in patch 2 for the various fields
> embedded in misc0 etc.
>
>> +};
>
>> +static void sdxi_print_err(struct sdxi_dev *sdxi, u64 err_rd)
>> +{
>> +	struct errlog_entry ent;
>> +	size_t index;
>> +
>> +	index = err_rd % ERROR_LOG_ENTRIES;
>> +
>> +	unpack_fields(&sdxi->err_log[index], sizeof(sdxi->err_log[0]),
>> +		      &ent, errlog_hd_ent_fields, SDXI_PACKING_QUIRKS);
>> +
>> +	if (!ent.vl) {
>> +		dev_err_ratelimited(sdxi_to_dev(sdxi),
>> +				    "Ignoring error log entry with vl=0\n");
>> +		return;
>> +	}
>> +
>> +	if (ent.type != OP_TYPE_ERRLOG) {
>> +		dev_err_ratelimited(sdxi_to_dev(sdxi),
>> +				    "Ignoring error log entry with type=%#x\n",
>> +				    ent.type);
>> +		return;
>> +	}
>> +
>> +	sdxi_err(sdxi, "error log entry[%zu], MMIO_ERR_RD=%#llx:\n",
>> +		 index, err_rd);
>> +	sdxi_err(sdxi, "  re: %#x (%s)\n", ent.re, reaction_str(ent.re));
>> +	sdxi_err(sdxi, "  step: %#x (%s)\n", ent.step, step_str(ent.step));
>> +	sdxi_err(sdxi, "  sub_step: %#x (%s)\n",
>> +		 ent.sub_step, sub_step_str(ent.sub_step));
>> +	sdxi_err(sdxi, "  cv: %u div: %u bv: %u\n", ent.cv, ent.div, ent.bv);
>> +	if (ent.bv)
>> +		sdxi_err(sdxi, "  buf: %u\n", ent.buf);
>> +	if (ent.cv)
>> +		sdxi_err(sdxi, "  cxt_num: %#x\n", ent.cxt_num);
>> +	if (ent.div)
>> +		sdxi_err(sdxi, "  dsc_index: %#llx\n", ent.dsc_index);
>> +	sdxi_err(sdxi, "  err_class: %#x\n", ent.err_class);
> Consider using tracepoints for error logging rather than large splats
> in the log.

Agreed, context-level errors (which will be user-triggerable once there
is an ABI exposed to user space) should not be dumped to the kernel log
by default.

Some function-level errors may be appropriate to print.

> I'd then just fill the tracepoint in directly rather than have an
> unpacking step.

Yes, I can do that.


>
>> +}
>
>> +/* Refer to "Error Log Initialization" */
>> +int sdxi_error_init(struct sdxi_dev *sdxi)
>> +{
>> +	u64 reg;
>> +	int err;
>> +
>> +	/* 1. Clear MMIO_ERR_CFG. Error interrupts are inhibited until step 6. */
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_CFG, 0);
>> +
>> +	/* 2. Clear MMIO_ERR_STS. The flags in this register are RW1C. */
>> +	reg = FIELD_PREP(SDXI_MMIO_ERR_STS_STS_BIT, 1) |
>> +	      FIELD_PREP(SDXI_MMIO_ERR_STS_OVF_BIT, 1) |
>> +	      FIELD_PREP(SDXI_MMIO_ERR_STS_ERR_BIT, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_STS, reg);
>> +
>> +	/* 3. Allocate memory for the error log ring buffer, initialize to zero. */
>> +	sdxi->err_log = dma_alloc_coherent(sdxi_to_dev(sdxi), ERROR_LOG_SZ,
>> +					   &sdxi->err_log_dma, GFP_KERNEL);
>> +	if (!sdxi->err_log)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * 4. Set MMIO_ERR_CTL.intr_en to 1 if interrupts on
>> +	 * context-level errors are desired.
>> +	 */
>> +	reg = sdxi_read64(sdxi, SDXI_MMIO_ERR_CTL);
>> +	FIELD_MODIFY(SDXI_MMIO_ERR_CTL_EN, &reg, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_CTL, reg);
>> +
>> +	/*
>> +	 * The spec is not explicit about when to do this, but this
>> +	 * seems like the right time: enable interrupt on
>> +	 * function-level transition to error state.
>> +	 */
>> +	reg = sdxi_read64(sdxi, SDXI_MMIO_CTL0);
>> +	FIELD_MODIFY(SDXI_MMIO_CTL0_FN_ERR_INTR_EN, &reg, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_CTL0, reg);
>> +
>> +	/* 5. Clear MMIO_ERR_WRT and MMIO_ERR_RD. */
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_WRT, 0);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_RD, 0);
>> +
>> +	/*
>> +	 * Error interrupts can be generated once MMIO_ERR_CFG.en is
>> +	 * set in step 6, so set up the handler now.
>> +	 */
>> +	err = request_threaded_irq(sdxi->error_irq, NULL, sdxi_irq_thread,
>> +				   IRQF_TRIGGER_NONE, "SDXI error", sdxi);
>> +	if (err)
>> +		goto free_errlog;
>> +
>> +	/* 6. Program MMIO_ERR_CFG. */
>
> I'm guessing these are numbers steps in some bit of the spec?
> If not some of these comments like this one provide no value.  We can
> see what is being written from the code!  Perhaps add a very specific
> spec reference if you want to show why the numbering is here.

Perhaps it's understated, but at the beginning of this function:

  /* Refer to "Error Log Initialization" */
  int sdxi_error_init(struct sdxi_dev *sdxi)

The numbered steps in the function correspond to the numbered steps in
that part of the spec.

I could make the comment something like:

/*
 * The numbered steps below correspond to the sequence outlined in 3.4.2
 * "Error Log Initialization".
 */

though I'm unsure how stable the section numbering in the SDXI spec will
be over time.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ