linux-kernel - Re: [PATCH 4/6] x86/microcode/intel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1aee0888-b87b-443c-84fa-3bc000cbebcf@intel.com>
Date: Tue, 18 Feb 2025 12:16:44 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: "Chang S. Bae" <chang.seok.bae@...el.com>, linux-kernel@...r.kernel.org
Cc: x86@...nel.org, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com
Subject: Re: [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic

Remember that the subject prefixes are logical areas of the code, not
filenames.

Bad:

	x86/microcode/intel_staging

Good:

	x86/microcode/intel


On 12/10/24 17:42, Chang S. Bae wrote:
> The staging firmware operates through a protocol via the MMIO interface.
> The protocol defines a serialized sequence that begins by clearing the
> hardware with an abort request. It then proceeds through iterative
> process of sending data, initiating transactions, waiting for processing,
> and reading responses.

I'm not sure how much value this adds. It's rehashing a few things that
are or should be blatantly obvious from the code.

For instance, mentioning that it's an "MMIO interface" is kinda obvious
from the ioremap() and variable named "mmio_base".

> To facilitate this interaction, follow the outlined protocol. Refactor
> the waiting code to manage loop breaks more effectively. Data transfer
> involves a next level of detail to handle the mailbox format. While
> defining helpers, leave them empty for now.

I'm not sure what's being refactored here.


> --- /dev/null
> +++ b/arch/x86/kernel/cpu/microcode/intel_staging.c
> @@ -0,0 +1,100 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#define pr_fmt(fmt) "microcode: " fmt
> +#include <linux/delay.h>
> +#include <linux/io.h>
> +
> +#include "internal.h"
> +
> +#define MBOX_REG_NUM		4
> +#define MBOX_REG_SIZE		sizeof(u32)
> +
> +#define MBOX_CONTROL_OFFSET	0x0
> +#define MBOX_STATUS_OFFSET	0x4
> +
> +#define MASK_MBOX_CTRL_ABORT	BIT(0)
> +
> +#define MASK_MBOX_STATUS_ERROR	BIT(2)
> +#define MASK_MBOX_STATUS_READY	BIT(31)
> +
> +#define MBOX_XACTION_LEN	PAGE_SIZE
> +#define MBOX_XACTION_MAX(imgsz)	((imgsz) * 2)

The MBOX_XACTION_MAX() concept needs to be explained _somewhere_. The
concept as I remember it is:

	Each image is broken up into pieces which are at most
	MBOX_XACTION_LEN in length. So a 10*MBOX_XACTION_LEN
	will need at least 10 actions. The hardware on the other side of
	the mbox has very few resources. It may not be able to cache the
	entire image and may ask for the same part of the image more
	than once. This means that it may take more than 10 actions to
	send a 10-piece image. Allow a 10-piece image to try 20 times to
	send the whole thing.

Can we comment that here or in the loop?

That's a rather verbose comment, but it is a kinda complicated thing. I
remember trying to talk the hardware guys out of this, but they were
rather insistent that it's required. But in the end it does make sense
to me that the (relatively) svelte device on the other end of this
mailbox might not have megabytes of spare storage and it's a relatively
simple thing to have the CPU send some data more than once.

> +#define MBOX_XACTION_TIMEOUT	(10 * MSEC_PER_SEC)

I find these a lot more self-explanatory when they gave units on them.

#define MBOX_XACTION_TIMEOUT_MS	(10 * MSEC_PER_SEC)

plus:

	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++)
		msleep(1);

makes it a bit more obvious why there's an msleep(1) if the timeout is
obviously in milliseconds in the first place.

> +#define STAGING_OFFSET_END	0xffffffff

Should this get an explicit type?

> +static inline void abort_xaction(void __iomem *base)
> +{
> +	writel(MASK_MBOX_CTRL_ABORT, base + MBOX_CONTROL_OFFSET);
> +}
> +
> +static void request_xaction(void __iomem *base, u32 *chunk, unsigned int chunksize)
> +{
> +	pr_debug_once("Need to implement staging mailbox loading code.\n");
> +}

Can we comment this a little more please?

/*
 * Callers should use this after a request_xaction() to handle
 * explicit errors or timeouts when the hardware does not respond.
 *
 * Returns UCODE_OK on success.
 */
> +static enum ucode_state wait_for_xaction(void __iomem *base)
> +{
> +	u32 timeout, status;
> +

	/* Give the hardware time to perform the action: */

> +	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT; timeout++) {
> +		msleep(1);
> +		status = readl(base + MBOX_STATUS_OFFSET);

		/* Break out early if the hardware is ready: */
> +		if (status & MASK_MBOX_STATUS_READY)
> +			break;
> +	}
> +
> +	status = readl(base + MBOX_STATUS_OFFSET);

	/* The hardware signaled an explicit error: */
> +	if (status & MASK_MBOX_STATUS_ERROR)
> +		return UCODE_ERROR;

	/*
	 * Hardware neither responded to the action nor
	 * signaled an error. It must be out to lunch.
	 */
> +	if (!(status & MASK_MBOX_STATUS_READY))
> +		return UCODE_TIMEOUT;
> +
> +	return UCODE_OK;
> +}
> +
> +static enum ucode_state read_xaction_response(void __iomem *base, unsigned int *offset)
> +{
> +	pr_debug_once("Need to implement staging response handler.\n");
> +	return UCODE_ERROR;
> +}
> +
> +static inline unsigned int get_chunksize(unsigned int totalsize, unsigned int offset)
> +{
> +	WARN_ON_ONCE(totalsize < offset);
> +	return min(MBOX_XACTION_LEN, totalsize - offset);
> +}

I honestly forgot what this is trying to do. Is it trying to break up a
"totalsize" action into pieces that are at most MBOX_XACTION_LEN bytes
in size? But it is also trying to ensure that if it has 10 bytes left
that it doesn't try to request MBOX_XACTION_LEN?

Also, "chunk_size", please. "chunksize" is a bit less readable.

> +enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize)

"totalsize" is the length of the data at 'ucode_ptr', right? Would the
connection be clearer if we had:

	void *ucode_ptr,
	int ucode_len_bytes);

? Or even "ucode_len"?

> +{

Could we rename "pa" to "mmio_pa", please? It makes it much more clear
that it's not the "pa" of ucode_ptr or something.

> +	unsigned int xaction_bytes = 0, offset = 0, chunksize;
> +	void __iomem *mmio_base;
> +	enum ucode_state state;
> +
> +	mmio_base = ioremap(pa, MBOX_REG_NUM * MBOX_REG_SIZE);
> +	if (WARN_ON_ONCE(!mmio_base))
> +		return UCODE_ERROR;
> +
> +	abort_xaction(mmio_base);

Why is this aborting first? Why doesn't it have to wait for the abort to
complete?

> +	while (offset != STAGING_OFFSET_END) {
> +		chunksize = get_chunksize(totalsize, offset);
> +		if (xaction_bytes + chunksize > MBOX_XACTION_MAX(totalsize)) {
> +			state = UCODE_TIMEOUT;
> +			break;
> +		}

So, "xaction_bytes" is the number of bytes that we tried to send. If it
exceeds MBOX_XACTION_MAX(), then too many retries occurred. That's a
timeout.

Right?

I don't think that's obvious from the code.

> +
> +		request_xaction(mmio_base, ucode_ptr + offset, chunksize);
> +		state = wait_for_xaction(mmio_base);
> +		if (state != UCODE_OK)
> +			break;
> +
> +		xaction_bytes += chunksize;
> +		state = read_xaction_response(mmio_base, &offset);
> +		if (state != UCODE_OK)
> +			break;
> +	}

So, a dumb me would look at this loop and expect it to look like this:

	while (need_moar_data) {
		if (too_many_retries())
			break;

		send_data();
	}

But it doesn't. There's a two-step process to send data, make sure the
device got the data, and then read a separate response.  It _seems_ like
it's double-checking the response.

Could we comment it something more like this to make that more clear?

	while (need_moar_data) {
		if (too_many_retries())
			break;


		/* Send the data: */
		ret = hw_send_data(offset);
		if (ret)
			break;

		/*
		 * Ask the hardware which piece of the image it needs 	
		 * next. The same piece may be sent more than once.
		 */
		ret = hw_get_next_offset(&offset);
		if (ret)
			break;
	}