lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aQMsVUCBDF7ZUSK-@wunner.de>
Date: Thu, 30 Oct 2025 10:13:57 +0100
From: Lukas Wunner <lukas@...ner.de>
To: Tony Hutter <hutter2@...l.gov>
Cc: Bjorn Helgaas <helgaas@...nel.org>, corey@...yard.net,
	alok.a.tiwari@...cle.com, mariusz.tkaczyk@...ux.intel.com,
	minyard@....org, linux-pci@...r.kernel.org,
	openipmi-developer@...ts.sourceforge.net,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v6] Introduce Cray ClusterStor E1000 NVMe slot LED driver

On Wed, Oct 08, 2025 at 04:48:22PM -0700, Tony Hutter wrote:
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -231,6 +231,27 @@ Description:
>  		    - scXX contains the device subclass;
>  		    - iXX contains the device class programming interface.
>  
> +What:		/sys/bus/pci/slots/.../attention
> +Date:		February 2025

Please update the "Date" timestamp to the month of submission to the list.

> +Contact:	linux-pci@...r.kernel.org
> +Description:
> +		The attention attribute is used to read or write the attention
> +		status for an enclosure slot.  This is often used to set the
> +		slot LED value on a NVMe storage enclosure.
> +
> +		Common values:
> +		0 = OFF
> +		1 = ON
> +		2 = blink (ampere, ibmphp, pciehp, rpaphp, shpchp)
> +
> +		Using the pciehp_craye1k extensions:
> +		0 = fault LED OFF, locate LED OFF
> +		1 = fault LED ON,  locate LED OFF
> +		2 = fault LED OFF, locate LED ON
> +		3 = fault LED ON,  locate LED ON

An end user might not be able to understand all these driver names,
so I'd omit "(ampere, ibmphp, pciehp, rpaphp, shpchp)",
and replace "pciehp_craye1k" with "Cray ClusterStor E1000".

> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -73,6 +73,13 @@ static int init_slot(struct controller *ctrl)
>  		ops->get_attention_status = pciehp_get_raw_indicator_status;
>  		ops->set_attention_status = pciehp_set_raw_indicator_status;
>  	}
> +#ifdef CONFIG_HOTPLUG_PCI_PCIE_CRAY_E1000
> +	if (is_craye1k_slot(ctrl)) {
> +		/* slots 1-24 on Cray E1000s are controlled differently */
> +		ops->get_attention_status = craye1k_get_attention_status;
> +		ops->set_attention_status = craye1k_set_attention_status;
> +	}
> +#endif

Per section 21 of Documentation/process/coding-style.rst,
please avoid the use of #ifdef and instead add an #else section
to pciehp.h with an inline stub for is_craye1k_slot() which returns false
and #define craye1k_{get,set}_attention_status to NULL.

Do these hotplug slots on the Cray E1000 have the Attention Indicator
Present bit set in the Slot Capabilities register?  If they do not,
please use "if else" instead of "if".  Right now you're overwriting
the default {get,set}_attention_status pointers, which only makes
sense if ATTN_LED(ctrl) is true.

When is craye1k_new_smi() called?  Is it guaranteed to be called
before the first invocation of craye1k_{get,set}_attention_status()?

I'm asking because craye1k_{get,set}_attention_status() do not
contain any checks whether the craye1k_global struct has been
populated.  You would need such checks if craye1k_new_smi() may
run later than the "attention" attribute appearing in sysfs.

Actually craye1k_new_smi() may fail, e.g. because of kzalloc()
returning NULL, so I think you'll need additional checks in any case.
However if craye1k_new_smi() is guaranteed to run before init_slot(),
you could check for an uninitialized craye1k_global struct already in
is_craye1k_slot().

Note that just checking "craye1k_global == NULL" is insufficient as
craye1k_new_smi() might run concurrently to
craye1k_{get,set}_attention_status().

I note that is_craye1k_slot() calls dmi_match(), so you should add
"depends on DMI" to the Kconfig entry you're adding.

> @@ -376,8 +383,16 @@ int __init pcie_hp_init(void)
>  
>  	retval = pcie_port_service_register(&hpdriver_portdrv);
>  	pr_debug("pcie_port_service_register = %d\n", retval);
> -	if (retval)
> +	if (retval) {
>  		pr_debug("Failure to register service\n");
> +		return retval;
> +	}
> +
> +#ifdef CONFIG_HOTPLUG_PCI_PCIE_CRAY_E1000
> +	retval = craye1k_init();
> +	if (retval)
> +		pr_debug("Failure to register Cray E1000 extensions");
> +#endif

Another #ifdef that should be eliminated.

You also need to annotate craye1k_init() with __init.

> +++ b/drivers/pci/hotplug/pciehp_craye1k.c
[...]
> +#include <linux/debugfs.h>
> +#include <linux/delay.h>
> +#include <linux/errno.h>
> +#include <linux/dmi.h>
> +#include <linux/ipmi.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
> +#include <linux/random.h>
> +#include "pciehp.h"

dmi.h isn't inserted in alphabetical order.

> +	/* debugfs stats */
> +	u64 check_primary;
> +	u64 check_primary_failed;
> +	u64 was_already_primary;
> +	u64 was_not_already_primary;
> +	u64 set_primary;
> +	u64 set_initial_primary_failed;
> +	u64 set_primary_failed;
> +	u64 set_led_locate_failed;
> +	u64 set_led_fault_failed;
> +	u64 set_led_readback_failed;
> +	u64 set_led_failed;
> +	u64 get_led_failed;
> +	u64 completion_timeout;
> +	u64 wrong_msgid;
> +	u64 request_failed;

I'm wondering if having all this debug code in mainline is useful?
Is the IPMI interface this brittle?  Was this just necessary
for initial development and may not be useful going forward?

> +static void craye1k_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> +{
> +	struct craye1k *craye1k = user_msg_data;
> +
> +	if (msg->msgid != craye1k->tx_msg_id) {
> +		craye1k->wrong_msgid++;
> +		if (craye1k->print_errors) {
> +			dev_warn_ratelimited(craye1k->dev, "rx msgid %d != %d",
> +					     (int)msg->msgid,
> +					     (int)craye1k->tx_msg_id);

Why use casts instead of %ld in the format string?

> +static void craye1k_smi_gone(int iface)
> +{
> +	pr_warn("craye1k: Got unexpected smi_gone, iface=%d", iface);
> +}

Why not kfree() the craye1k struct here, tear down debugfs etc?

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ