netdev - Re: [PATCH net-next 11/12] bnxt_en: Expose threshold temperatures through hwmon

Message-ID: <c6f3a05e-f75c-4051-8892-1c2dee2804b0@roeck-us.net>
Date: Tue, 15 Aug 2023 08:05:00 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Michael Chan <michael.chan@...adcom.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
	kuba@...nel.org, pabeni@...hat.com, gospo@...adcom.com,
	Kalesh AP <kalesh-anakkur.purayil@...adcom.com>,
	Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org
Subject: Re: [PATCH net-next 11/12] bnxt_en: Expose threshold temperatures
 through hwmon

On Mon, Aug 14, 2023 at 09:56:57PM -0700, Michael Chan wrote:
> From: Kalesh AP <kalesh-anakkur.purayil@...adcom.com>
> 
> HWRM_TEMP_MONITOR_QUERY response now indicates various
> threshold temperatures. Expose these threshold temperatures
> through the hwmon sysfs.
> Also, provide temp1_max_alarm through which the user can check
> whether the threshold temperature has been reached or not.
> 
> Example:
> cat /sys/class/hwmon/hwmon3/temp1_input
> 75000
> cat /sys/class/hwmon/hwmon3/temp1_max
> 105000
> cat /sys/class/hwmon/hwmon3/temp1_max_alarm
> 0
> 
> Cc: Jean Delvare <jdelvare@...e.com>
> Cc: Guenter Roeck <linux@...ck-us.net>
> Cc: linux-hwmon@...r.kernel.org
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@...adcom.com>
> Signed-off-by: Michael Chan <michael.chan@...adcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  7 ++
>  .../net/ethernet/broadcom/bnxt/bnxt_hwmon.c   | 71 +++++++++++++++++--
>  2 files changed, 73 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index 84cbcfa61bc1..43a07d84f815 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -2013,6 +2013,7 @@ struct bnxt {
>  	#define BNXT_FW_CAP_RING_MONITOR		BIT_ULL(30)
>  	#define BNXT_FW_CAP_DBG_QCAPS			BIT_ULL(31)
>  	#define BNXT_FW_CAP_PTP				BIT_ULL(32)
> +	#define BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED	BIT_ULL(33)
>  
>  	u32			fw_dbg_cap;
>  
> @@ -2185,7 +2186,13 @@ struct bnxt {
>  	struct bnxt_tc_info	*tc_info;
>  	struct list_head	tc_indr_block_list;
>  	struct dentry		*debugfs_pdev;
> +#ifdef CONFIG_BNXT_HWMON
>  	struct device		*hwmon_dev;
> +	u8			warn_thresh_temp;
> +	u8			crit_thresh_temp;
> +	u8			fatal_thresh_temp;
> +	u8			shutdown_thresh_temp;
> +#endif
>  	enum board_idx		board_idx;
>  };
>  
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c
> index 20381b7b1d78..f5affac1169a 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c
> @@ -34,6 +34,15 @@ static int bnxt_hwrm_temp_query(struct bnxt *bp, u8 *temp)
>  
>  	if (temp)
>  		*temp = resp->temp;
> +
> +	if (resp->flags & TEMP_MONITOR_QUERY_RESP_FLAGS_THRESHOLD_VALUES_AVAILABLE) {
> +		if (!temp)
> +			bp->fw_cap |= BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED;

The if statement seems unnecessary. If the flag was not set
during initialization, the limit attributes won't be visible anyway,
so it makes no difference whether it is set now or not.

> +		bp->warn_thresh_temp = resp->warn_threshold;
> +		bp->crit_thresh_temp = resp->critical_threshold;
> +		bp->fatal_thresh_temp = resp->fatal_threshold;
> +		bp->shutdown_thresh_temp = resp->shutdown_threshold;

Are those temperatures expected to change at runtime? If not, it might
make sense to execute the entire if block only when temp == NULL, to
avoid needlessly reassigning them every time the temperature is read.
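Taken together, the two comments above amount to caching the thresholds only on the init-time call (temp == NULL). A minimal userspace sketch of that, assuming the thresholds really are static at runtime (the question is not answered in this message). The structs are hypothetical stand-ins for the HWRM response and struct bnxt, handle_temp_resp() is an illustrative name rather than a driver function, and the flag bit value is assumed; only the capability bit (BIT_ULL(33)) comes from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: THRESHOLD_VALUES_AVAILABLE's value is assumed;
 * bit 33 mirrors BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED in the patch. */
#define THRESHOLD_VALUES_AVAILABLE	0x1u
#define FW_CAP_THRESHOLD_TEMP_SUPPORTED	(1ULL << 33)

struct temp_query_resp {		/* stand-in for the HWRM response */
	uint8_t temp;
	uint8_t flags;
	uint8_t warn_threshold;
	uint8_t critical_threshold;
	uint8_t fatal_threshold;
	uint8_t shutdown_threshold;
};

struct hwmon_state {			/* stand-in for the fields in struct bnxt */
	uint64_t fw_cap;
	uint8_t warn_thresh_temp;
	uint8_t crit_thresh_temp;
	uint8_t fatal_thresh_temp;
	uint8_t shutdown_thresh_temp;
};

/* Post-processing of a successful query. Runtime reads (temp != NULL)
 * only report the temperature; the init-time call (temp == NULL) sets
 * the capability and caches the thresholds exactly once. */
static void handle_temp_resp(struct hwmon_state *bp,
			     const struct temp_query_resp *resp, uint8_t *temp)
{
	assert(bp && resp);

	if (temp)
		*temp = resp->temp;

	if (!temp && (resp->flags & THRESHOLD_VALUES_AVAILABLE)) {
		bp->fw_cap |= FW_CAP_THRESHOLD_TEMP_SUPPORTED;
		bp->warn_thresh_temp = resp->warn_threshold;
		bp->crit_thresh_temp = resp->critical_threshold;
		bp->fatal_thresh_temp = resp->fatal_threshold;
		bp->shutdown_thresh_temp = resp->shutdown_threshold;
	}
}
```

With this shape the inner `if (!temp)` around the capability bit disappears, and later temperature reads never touch the cached limits.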

> +	}
>  err:
>  	hwrm_req_drop(bp, req);
>  	return rc;
> @@ -42,12 +51,30 @@ static int bnxt_hwrm_temp_query(struct bnxt *bp, u8 *temp)
>  static umode_t bnxt_hwmon_is_visible(const void *_data, enum hwmon_sensor_types type,
>  				     u32 attr, int channel)
>  {
> +	const struct bnxt *bp = _data;
> +
>  	if (type != hwmon_temp)
>  		return 0;
>  
>  	switch (attr) {
>  	case hwmon_temp_input:
>  		return 0444;
> +	case hwmon_temp_lcrit:
> +	case hwmon_temp_crit:
> +	case hwmon_temp_emergency:
> +	case hwmon_temp_lcrit_alarm:
> +	case hwmon_temp_crit_alarm:
> +	case hwmon_temp_emergency_alarm:
> +		if (~bp->fw_cap & BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED)

Seems to me that
		if (!(bp->fw_cap & BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED))
would be much easier to understand.
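The two forms of the visibility test give the same truth value; the suggested one simply states the intent ("capability not present") directly. A small standalone check, assuming fw_cap is a 64-bit field as the BIT_ULL() flags suggest; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FW_CAP_THRESHOLD_TEMP_SUPPORTED (1ULL << 33)	/* bit from the patch */

/* Form used in the patch: non-zero when the bit is clear in fw_cap. */
static bool missing_cap_tilde(uint64_t fw_cap)
{
	return ~fw_cap & FW_CAP_THRESHOLD_TEMP_SUPPORTED;
}

/* Form suggested in the review: reads as "capability not present". */
static bool missing_cap_not(uint64_t fw_cap)
{
	return !(fw_cap & FW_CAP_THRESHOLD_TEMP_SUPPORTED);
}

/* Both helpers must agree for any capability word. */
static void check_forms_agree(uint64_t fw_cap)
{
	assert(missing_cap_tilde(fw_cap) == missing_cap_not(fw_cap));
}
```

Returning bool matters for the first form: conversion to bool tests for non-zero, whereas storing `~fw_cap & FLAG` in a 32-bit int would, on common ABIs, lose bit 33. The `!(...)` form yields 0 or 1 and has no such trap.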

> +			return 0;
> +		return 0444;
> +	/* Max temperature setting in NVM is optional */
> +	case hwmon_temp_max:
> +	case hwmon_temp_max_alarm:
> +		if (~bp->fw_cap & BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED ||
> +		    !bp->shutdown_thresh_temp)
> +			return 0;

Wrong use of the 'max' attribute. More on that below.

> +		return 0444;
>  	default:
>  		return 0;
>  	}
> @@ -66,6 +93,38 @@ static int bnxt_hwmon_read(struct device *dev, enum hwmon_sensor_types type, u32
>  		if (!rc)
>  			*val = temp * 1000;
>  		return rc;
> +	case hwmon_temp_lcrit:
> +		*val = bp->warn_thresh_temp * 1000;
> +		return 0;
> +	case hwmon_temp_crit:
> +		*val = bp->crit_thresh_temp * 1000;
> +		return 0;
> +	case hwmon_temp_emergency:
> +		*val = bp->fatal_thresh_temp * 1000;
> +		return 0;
> +	case hwmon_temp_max:
> +		*val = bp->shutdown_thresh_temp * 1000;
> +		return 0;
> +	case hwmon_temp_lcrit_alarm:
> +		rc = bnxt_hwrm_temp_query(bp, &temp);
> +		if (!rc)
> +			*val = temp >= bp->warn_thresh_temp;

That is wrong. lcrit is the _lower_ critical temperature, i.e. the
temperature is critically low. This is not a "high temperature"
alarm.

> +		return rc;
> +	case hwmon_temp_crit_alarm:
> +		rc = bnxt_hwrm_temp_query(bp, &temp);
> +		if (!rc)
> +			*val = temp >= bp->crit_thresh_temp;
> +		return rc;
> +	case hwmon_temp_emergency_alarm:
> +		rc = bnxt_hwrm_temp_query(bp, &temp);
> +		if (!rc)
> +			*val = temp >= bp->fatal_thresh_temp;
> +		return rc;
> +	case hwmon_temp_max_alarm:
> +		rc = bnxt_hwrm_temp_query(bp, &temp);
> +		if (!rc)
> +			*val = temp >= bp->shutdown_thresh_temp;

Hmm, that isn't really the purpose of alarm attributes. The expectation
would be that the chip sets alarm flags and the driver reports them.
I guess there is some value in having it, so I won't object.

Anyway, the ordering is wrong. max_alarm should be the lowest
alarm level, followed by crit and emergency. So
		max_alarm -> temp >= bp->warn_thresh_temp
		crit_alarm -> temp >= bp->crit_thresh_temp
		emergency_alarm -> temp >= bp->fatal_thresh_temp
				or temp >= bp->shutdown_thresh_temp

There are only three levels of upper temperature alarms.
Abusing lcrit as a fourth upper alarm is most definitely wrong.
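A minimal sketch of the ordering described above, written as userspace C with a hypothetical enum standing in for the hwmon attribute constants. It assumes the limit attributes follow the same mapping as their alarms (temp1_max reports the warning threshold, and so on), picks the fatal threshold for emergency (the shutdown threshold is the alternative named above), and leaves lcrit unused.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the hwmon temperature attributes involved. */
enum temp_attr {
	TEMP_MAX, TEMP_CRIT, TEMP_EMERGENCY,
	TEMP_MAX_ALARM, TEMP_CRIT_ALARM, TEMP_EMERGENCY_ALARM,
};

struct thresholds {			/* cached firmware thresholds, degrees C */
	uint8_t warn, crit, fatal, shutdown;
};

/* Three upper levels only: max (warn) < crit < emergency (fatal).
 * lcrit is a lower limit and is deliberately not mapped here. */
static uint8_t level_for(const struct thresholds *th, enum temp_attr attr)
{
	switch (attr) {
	case TEMP_MAX:
	case TEMP_MAX_ALARM:
		return th->warn;
	case TEMP_CRIT:
	case TEMP_CRIT_ALARM:
		return th->crit;
	case TEMP_EMERGENCY:
	case TEMP_EMERGENCY_ALARM:
	default:			/* th->shutdown would also be acceptable */
		return th->fatal;
	}
}

/* Limits are reported in millidegrees; alarms compare the current reading. */
static int temp_read(const struct thresholds *th, enum temp_attr attr,
		     uint8_t temp, long *val)
{
	assert(th && val);

	switch (attr) {
	case TEMP_MAX:
	case TEMP_CRIT:
	case TEMP_EMERGENCY:
		*val = level_for(th, attr) * 1000L;
		return 0;
	case TEMP_MAX_ALARM:
	case TEMP_CRIT_ALARM:
	case TEMP_EMERGENCY_ALARM:
		*val = temp >= level_for(th, attr);
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

With illustrative warn/crit/fatal thresholds of 95/100/105 degrees C, a reading of 96 raises only max_alarm, and crit_alarm and emergency_alarm follow as the temperature keeps rising.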

> +		return rc;
>  	default:
>  		return -EOPNOTSUPP;
>  	}
> @@ -73,7 +132,11 @@ static int bnxt_hwmon_read(struct device *dev, enum hwmon_sensor_types type, u32
>  
>  static const struct hwmon_channel_info *bnxt_hwmon_info[] = {
>  	HWMON_CHANNEL_INFO(temp,
> -			   HWMON_T_INPUT),
> +			   HWMON_T_INPUT |
> +			   HWMON_T_MAX | HWMON_T_LCRIT |
> +			   HWMON_T_CRIT | HWMON_T_EMERGENCY |
> +			   HWMON_T_CRIT_ALARM | HWMON_T_LCRIT_ALARM |
> +			   HWMON_T_MAX_ALARM | HWMON_T_EMERGENCY_ALARM),
>  	NULL
>  };
>  
> @@ -97,13 +160,11 @@ void bnxt_hwmon_uninit(struct bnxt *bp)
>  
>  void bnxt_hwmon_init(struct bnxt *bp)
>  {
> -	struct hwrm_temp_monitor_query_input *req;
>  	struct pci_dev *pdev = bp->pdev;
>  	int rc;
>  
> -	rc = hwrm_req_init(bp, req, HWRM_TEMP_MONITOR_QUERY);
> -	if (!rc)
> -		rc = hwrm_req_send_silent(bp, req);
> +	/* temp1_xxx is only sensor, ensure not registered if it will fail */
> +	rc = bnxt_hwrm_temp_query(bp, NULL);

Ah, that is the reason for the check in bnxt_hwrm_temp_query().
The check in that function should really be added here, not in the
previous patch.

>  	if (rc == -EACCES || rc == -EOPNOTSUPP) {
>  		bnxt_hwmon_uninit(bp);
>  		return;
> -- 
> 2.30.1
>