linux-kernel - RE: [PATCH V3 2/8] cxl/mem: Wire up event interrupts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6393ad74877d9_579c129448@dwillia2-xfh.jf.intel.com.notmuch>
Date:   Fri, 9 Dec 2022 13:49:40 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     <ira.weiny@...el.com>, Dan Williams <dan.j.williams@...el.com>
CC:     Davidlohr Bueso <dave@...olabs.net>,
        Bjorn Helgaas <helgaas@...nel.org>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Alison Schofield <alison.schofield@...el.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        "Dave Jiang" <dave.jiang@...el.com>,
        <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
        <linux-acpi@...r.kernel.org>, <linux-cxl@...r.kernel.org>
Subject: RE: [PATCH V3 2/8] cxl/mem: Wire up event interrupts

ira.weiny@ wrote:
> From: Davidlohr Bueso <dave@...olabs.net>
> 
> Currently the only CXL features targeted for irq support require their
> message numbers to be within the first 16 entries.  The device may
> however support less than 16 entries depending on the support it
> provides.
> 
> Attempt to allocate these 16 irq vectors.  If the device supports less
> then the PCI infrastructure will allocate that number.  Upon successful
> allocation, users can plug in their respective isr at any point
> thereafter.
> 
> CXL device events are signaled via interrupts.  Each event log may have
> a different interrupt message number.  These message numbers are
> reported in the Get Event Interrupt Policy mailbox command.
> 
> Add interrupt support for event logs.  Interrupts are allocated as
> shared interrupts.  Therefore, all or some event logs can share the same
> message number.
> 
> In addition all logs are queried on any interrupt in order of the most
> to least severe based on the status register.
> 
> Cc: Bjorn Helgaas <helgaas@...nel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> Co-developed-by: Ira Weiny <ira.weiny@...el.com>
> Signed-off-by: Ira Weiny <ira.weiny@...el.com>
> Signed-off-by: Davidlohr Bueso <dave@...olabs.net>
> 
> ---
> Changes from V2:
> 	General clean up
> 	Use cxl_log_id to ensure each irq is unique even if the message numbers are not
> 	Jonathan/Dan
> 		Only set up irq vector when OSC indicates OS control
> 	Dan
> 		Loop reading while status indicates there are more
> 			events.
> 		Use new cxl_internal_send_cmd()
> 		Squash MSI/MSIx base patch from Davidlohr
> 		Remove uapi defines altogether
> 		Remove use of msi_enabled
> 	Remove the use of cxl_event_log_type_str()
> 	Pick up tag
> 
> Changes from V1:
> 	Remove unneeded evt_int_policy from struct cxl_dev_state
> 	defer Dynamic Capacity support
> 	Dave Jiang
> 		s/irq/rc
> 		use IRQ_NONE to signal the irq was not for us.
> 	Jonathan
> 		use msi_enabled rather than nr_irq_vec
> 		On failure explicitly set CXL_INT_NONE
> 		Add comment for Get Event Interrupt Policy
> 		use devm_request_threaded_irq()
> 		Use individual handler/thread functions for each of the
> 		logs rather than struct cxl_event_irq_id.
> 
> Changes from RFC v2
> 	Adjust to new irq 16 vector allocation
> 	Jonathan
> 		Remove CXL_INT_RES
> 	Use irq threads to ensure mailbox commands are executed outside irq context
> 	Adjust for optional Dynamic Capacity log
> ---
>  drivers/cxl/core/mbox.c | 42 +++++++++++++++++++
>  drivers/cxl/cxlmem.h    | 28 +++++++++++++
>  drivers/cxl/cxlpci.h    |  6 +++
>  drivers/cxl/pci.c       | 90 ++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 165 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 815da3aac081..2b25691a9b09 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -854,6 +854,48 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds, u32 status)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
>  
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds,
> +			     struct cxl_event_interrupt_policy *policy)
> +{
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int rc;
> +
> +	policy->info_settings = CXL_INT_MSI_MSIX;
> +	policy->warn_settings = CXL_INT_MSI_MSIX;
> +	policy->failure_settings = CXL_INT_MSI_MSIX;
> +	policy->fatal_settings = CXL_INT_MSI_MSIX;

For Robustness Principle "be conservative in what is sent" purposes I
would do the Get Events first to make sure that nothing is steered to
the Firmware VDM, and warn the user that their BIOS gave the OS CXL
Error Control, but did not shutdown event interrupts.

I.e. if the event interrupts are still steered to BIOS then BIOS may
think it still has control of the event logs and trouble ensues.

> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_SET_EVT_INT_POLICY,
> +		.payload_in = policy,
> +		.size_in = sizeof(*policy),
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +	if (rc < 0) {
> +		dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> +			rc);
> +		return rc;
> +	}
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_GET_EVT_INT_POLICY,
> +		.payload_out = policy,
> +		.size_out = sizeof(*policy),
> +	};
> +
> +	/* Retrieve interrupt message numbers */
> +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +	if (rc < 0) {
> +		dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
> +			rc);
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);

A question, why is this function in the core and not in cxl_pci? For
cxl_test mocking purposes? Otherwise seems ok to keep this in the same
file as its only caller.

> +
>  /**
>   * cxl_mem_get_partition_info - Get partition info
>   * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index dd9aa3dd738e..350cb460e7fc 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -194,6 +194,30 @@ struct cxl_endpoint_dvsec_info {
>  	struct range dvsec_range[2];
>  };
>  
> +/**
> + * Event Interrupt Policy
> + *
> + * CXL rev 3.0 section 8.2.9.2.4; Table 8-52
> + */
> +enum cxl_event_int_mode {
> +	CXL_INT_NONE		= 0x00,
> +	CXL_INT_MSI_MSIX	= 0x01,
> +	CXL_INT_FW		= 0x02
> +};
> +#define CXL_EVENT_INT_MODE_MASK 0x3
> +#define CXL_EVENT_INT_MSGNUM(setting) (((setting) & 0xf0) >> 4)
> +struct cxl_event_interrupt_policy {
> +	u8 info_settings;
> +	u8 warn_settings;
> +	u8 failure_settings;
> +	u8 fatal_settings;
> +} __packed;
> +
> +static inline bool cxl_evt_int_is_msi(u8 setting)
> +{
> +	return CXL_INT_MSI_MSIX == (setting & CXL_EVENT_INT_MODE_MASK);
> +}
> +
>  /**
>   * struct cxl_event_state - Event log driver state
>   *
> @@ -288,6 +312,8 @@ enum cxl_opcode {
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
>  	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>  	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
> +	CXL_MBOX_OP_GET_EVT_INT_POLICY	= 0x0102,
> +	CXL_MBOX_OP_SET_EVT_INT_POLICY	= 0x0103,
>  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -525,6 +551,8 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>  void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>  void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>  void cxl_mem_get_event_records(struct cxl_dev_state *cxlds, u32 status);
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds,
> +			     struct cxl_event_interrupt_policy *policy);
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);
>  void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 77dbdb980b12..4aaadf17a985 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -53,6 +53,12 @@
>  #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
>  #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
>  
> +/*
> + * NOTE: Currently all the functions which are enabled for CXL require their
> + * vectors to be in the first 16.  Use this as the max.
> + */
> +#define CXL_PCI_REQUIRED_VECTORS 16
> +
>  /* Register Block Identifier (RBI) */
>  enum cxl_regloc_type {
>  	CXL_REGLOC_RBI_EMPTY = 0,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 86c84611a168..c84922a287ec 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -452,6 +452,90 @@ static void cxl_clear_event_logs(struct cxl_dev_state *cxlds)
>  	cxl_mem_get_event_records(cxlds, CXLDEV_EVENT_STATUS_ALL);
>  }
>  
> +static void cxl_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	int nvecs;
> +
> +	/*
> +	 * pci_alloc_irq_vectors() handles calling pci_free_irq_vectors()
> +	 * automatically despite not being called pcim_*.  See
> +	 * pci_setup_msi_context().
> +	 */

I think a more important comment is why the flags are limited to MSIX
and MSI, that's a non-obvious CXL spec constraint.

> +	nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,

Since I have some other fixups below I'll go ahead and quibble with the
name. The 'requirement' is 1 vector, so
s/CXL_PCI_REQUIRED_VECTORS/CXL_PCI_DEFAULT_VECTORS/ or something like
that. As it stands today there are diminishing returns to ask for more
than that amount.

In the future, if the code knows better that a specific device could
benefit from more than the default, then it can arrange to override
this. Absent that, today there is no reason to try to ask for more.

> +				      PCI_IRQ_MSIX | PCI_IRQ_MSI);
> +	if (nvecs < 1)
> +		dev_dbg(dev, "Failed to alloc irq vectors: %d\n", nvecs);

Just fail the driver load if this happens. There is something wrong if a
PCI driver cannot even allocate 1 vector.

> +}
> +
> +struct cxl_dev_id {
> +	struct cxl_dev_state *cxlds;
> +};
> +
> +static irqreturn_t cxl_event_thread(int irq, void *id)
> +{
> +	struct cxl_dev_id *dev_id = id;
> +	struct cxl_dev_state *cxlds = dev_id->cxlds;
> +	u32 status;
> +
> +	/*
> +	 * CXL 3.0 8.2.8.3.1: The lower 32 bits are the status;
> +	 * ignore the reserved upper 32 bits
> +	 */
> +	status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +	while (status) {
> +		cxl_mem_get_event_records(cxlds, status);
> +		cond_resched();
> +		status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +	}
> +	return IRQ_HANDLED;
> +}
> +
> +static int cxl_req_event_irq(struct cxl_dev_state *cxlds, u8 setting)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct cxl_dev_id *dev_id;
> +	int irq;
> +
> +	if (!cxl_evt_int_is_msi(setting))
> +		return -ENXIO;
> +
> +	/* dev_id must be globally unique and must contain the cxlds */
> +	dev_id = devm_kmalloc(dev, sizeof(*dev_id), GFP_KERNEL);

Yes, the id is simple and fully initialized below, but this is not a
fast path and the rest of the driver uses devm_kzalloc() even if it
fully inits the result. So its a consistency thing and maybe a "save the
future person who adds another field without initializing it some
hassle" thing.

> +	if (!dev_id)
> +		return -ENOMEM;
> +	dev_id->cxlds = cxlds;
> +
> +	irq =  pci_irq_vector(pdev, CXL_EVENT_INT_MSGNUM(setting));
> +	if (irq < 0)
> +		return irq;
> +
> +	return devm_request_threaded_irq(dev, irq, NULL, cxl_event_thread,
> +					 IRQF_SHARED, NULL, dev_id);
> +}
> +
> +static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_event_interrupt_policy policy;
> +
> +	if (cxl_event_config_msgnums(cxlds, &policy))
> +		return;

If not all interrupts can be steered to the OS probably best to abort
the entire event setup.

Otherwise if you can steer all to the OS, if any of the below fails that
should be a driver load failure. I certainly do not want to debug
someone's system that randomly failed alternating log type interrupts.

> +
> +	if (cxl_req_event_irq(cxlds, policy.info_settings))
> +		dev_err(cxlds->dev, "Failed to get interrupt for event Info log\n");
> +
> +	if (cxl_req_event_irq(cxlds, policy.warn_settings))
> +		dev_err(cxlds->dev, "Failed to get interrupt for event Warn log\n");
> +
> +	if (cxl_req_event_irq(cxlds, policy.failure_settings))
> +		dev_err(cxlds->dev, "Failed to get interrupt for event Failure log\n");
> +
> +	if (cxl_req_event_irq(cxlds, policy.fatal_settings))
> +		dev_err(cxlds->dev, "Failed to get interrupt for event Fatal log\n");
> +}
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> @@ -526,14 +610,18 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		return rc;
>  
> +	cxl_alloc_irq_vectors(cxlds);

Just pass the pdev directly here, no other part of cxlds is needed.

> +
>  	cxlmd = devm_cxl_add_memdev(cxlds);
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
>  	if (host_bridge->native_cxl_error) {
>  		cxl_mem_alloc_event_buf(cxlds);
> -		if (cxlds->event.buf)
> +		if (cxlds->event.buf) {
> +			cxl_event_irqsetup(cxlds);
>  			cxl_clear_event_logs(cxlds);
> +		}
>  	}
>  
>  	if (cxlds->regs.ras) {
> -- 
> 2.37.2
>