[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c6f3a05e-f75c-4051-8892-1c2dee2804b0@roeck-us.net>
Date: Tue, 15 Aug 2023 08:05:00 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Michael Chan <michael.chan@...adcom.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, gospo@...adcom.com,
Kalesh AP <kalesh-anakkur.purayil@...adcom.com>,
Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org
Subject: Re: [PATCH net-next 11/12] bnxt_en: Expose threshold temperatures
through hwmon
On Mon, Aug 14, 2023 at 09:56:57PM -0700, Michael Chan wrote:
> From: Kalesh AP <kalesh-anakkur.purayil@...adcom.com>
>
> HWRM_TEMP_MONITOR_QUERY response now indicates various
> threshold temperatures. Expose these threshold temperatures
> through the hwmon sysfs.
> Also, provide temp1_max_alarm through which the user can check
> whether the threshold temperature has been reached or not.
>
> Example:
> cat /sys/class/hwmon/hwmon3/temp1_input
> 75000
> cat /sys/class/hwmon/hwmon3/temp1_max
> 105000
> cat /sys/class/hwmon/hwmon3/temp1_max_alarm
> 0
>
> Cc: Jean Delvare <jdelvare@...e.com>
> Cc: Guenter Roeck <linux@...ck-us.net>
> Cc: linux-hwmon@...r.kernel.org
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@...adcom.com>
> Signed-off-by: Michael Chan <michael.chan@...adcom.com>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 7 ++
> .../net/ethernet/broadcom/bnxt/bnxt_hwmon.c | 71 +++++++++++++++++--
> 2 files changed, 73 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index 84cbcfa61bc1..43a07d84f815 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -2013,6 +2013,7 @@ struct bnxt {
> #define BNXT_FW_CAP_RING_MONITOR BIT_ULL(30)
> #define BNXT_FW_CAP_DBG_QCAPS BIT_ULL(31)
> #define BNXT_FW_CAP_PTP BIT_ULL(32)
> + #define BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED BIT_ULL(33)
>
> u32 fw_dbg_cap;
>
> @@ -2185,7 +2186,13 @@ struct bnxt {
> struct bnxt_tc_info *tc_info;
> struct list_head tc_indr_block_list;
> struct dentry *debugfs_pdev;
> +#ifdef CONFIG_BNXT_HWMON
> struct device *hwmon_dev;
> + u8 warn_thresh_temp;
> + u8 crit_thresh_temp;
> + u8 fatal_thresh_temp;
> + u8 shutdown_thresh_temp;
> +#endif
> enum board_idx board_idx;
> };
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c
> index 20381b7b1d78..f5affac1169a 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwmon.c
> @@ -34,6 +34,15 @@ static int bnxt_hwrm_temp_query(struct bnxt *bp, u8 *temp)
>
> if (temp)
> *temp = resp->temp;
> +
> + if (resp->flags & TEMP_MONITOR_QUERY_RESP_FLAGS_THRESHOLD_VALUES_AVAILABLE) {
> + if (!temp)
> + bp->fw_cap |= BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED;
The if statement seems unnecessary. If the flag was not set
during initialization, the limit attributes won't be visible anyway,
so it doesn't make a difference if it is set now or not.
> + bp->warn_thresh_temp = resp->warn_threshold;
> + bp->crit_thresh_temp = resp->critical_threshold;
> + bp->fatal_thresh_temp = resp->fatal_threshold;
> + bp->shutdown_thresh_temp = resp->shutdown_threshold;
Are those temperatures expected to change during runtime ? If not it might
make sense to only execute the entire if condition if temp == NULL to
avoid unnecessary reassignments whenever the temperature is read.
> + }
> err:
> hwrm_req_drop(bp, req);
> return rc;
> @@ -42,12 +51,30 @@ static int bnxt_hwrm_temp_query(struct bnxt *bp, u8 *temp)
> static umode_t bnxt_hwmon_is_visible(const void *_data, enum hwmon_sensor_types type,
> u32 attr, int channel)
> {
> + const struct bnxt *bp = _data;
> +
> if (type != hwmon_temp)
> return 0;
>
> switch (attr) {
> case hwmon_temp_input:
> return 0444;
> + case hwmon_temp_lcrit:
> + case hwmon_temp_crit:
> + case hwmon_temp_emergency:
> + case hwmon_temp_lcrit_alarm:
> + case hwmon_temp_crit_alarm:
> + case hwmon_temp_emergency_alarm:
> + if (~bp->fw_cap & BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED)
Seems to me that
if (!(bp->fw_cap & BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED))
would be much easier to understand.
> + return 0;
> + return 0444;
> + /* Max temperature setting in NVM is optional */
> + case hwmon_temp_max:
> + case hwmon_temp_max_alarm:
> + if (~bp->fw_cap & BNXT_FW_CAP_THRESHOLD_TEMP_SUPPORTED ||
> + !bp->shutdown_thresh_temp)
> + return 0;
Wrong use of the 'max' attribute. More on that below.
> + return 0444;
> default:
> return 0;
> }
> @@ -66,6 +93,38 @@ static int bnxt_hwmon_read(struct device *dev, enum hwmon_sensor_types type, u32
> if (!rc)
> *val = temp * 1000;
> return rc;
> + case hwmon_temp_lcrit:
> + *val = bp->warn_thresh_temp * 1000;
> + return 0;
> + case hwmon_temp_crit:
> + *val = bp->crit_thresh_temp * 1000;
> + return 0;
> + case hwmon_temp_emergency:
> + *val = bp->fatal_thresh_temp * 1000;
> + return 0;
> + case hwmon_temp_max:
> + *val = bp->shutdown_thresh_temp * 1000;
> + return 0;
> + case hwmon_temp_lcrit_alarm:
> + rc = bnxt_hwrm_temp_query(bp, &temp);
> + if (!rc)
> + *val = temp >= bp->warn_thresh_temp;
That is wrong. lcrit is the _lower_ critical temperature, ie the
temperature is critically low. This is not a "high temperature"
alarm.
> + return rc;
> + case hwmon_temp_crit_alarm:
> + rc = bnxt_hwrm_temp_query(bp, &temp);
> + if (!rc)
> + *val = temp >= bp->crit_thresh_temp;
> + return rc;
> + case hwmon_temp_emergency_alarm:
> + rc = bnxt_hwrm_temp_query(bp, &temp);
> + if (!rc)
> + *val = temp >= bp->fatal_thresh_temp;
> + return rc;
> + case hwmon_temp_max_alarm:
> + rc = bnxt_hwrm_temp_query(bp, &temp);
> + if (!rc)
> + *val = temp >= bp->shutdown_thresh_temp;
Hmm, that isn't really the purpose of alarm attributes. The expectation
would be that the chip sets alarm flags and the driver reports it.
I guess there is some value in having it, so I won't object.
Anyway, the ordering is wrong. max_alarm should be the lowest
alarm level, followed by crit and emergency. So
max_alarm -> temp >= bp->warn_thresh_temp
crit_alarm -> temp >= bp->crit_thresh_temp
emergency_alarm -> temp >= bp->fatal_thresh_temp
or temp >= bp->shutdown_thresh_temp
There are only three levels of upper temperature alarms.
Abusing lcrit as 4th upper alarm is most definitely wrong.
> + return rc;
> default:
> return -EOPNOTSUPP;
> }
> @@ -73,7 +132,11 @@ static int bnxt_hwmon_read(struct device *dev, enum hwmon_sensor_types type, u32
>
> static const struct hwmon_channel_info *bnxt_hwmon_info[] = {
> HWMON_CHANNEL_INFO(temp,
> - HWMON_T_INPUT),
> + HWMON_T_INPUT |
> + HWMON_T_MAX | HWMON_T_LCRIT |
> + HWMON_T_CRIT | HWMON_T_EMERGENCY |
> + HWMON_T_CRIT_ALARM | HWMON_T_LCRIT_ALARM |
> + HWMON_T_MAX_ALARM | HWMON_T_EMERGENCY_ALARM),
> NULL
> };
>
> @@ -97,13 +160,11 @@ void bnxt_hwmon_uninit(struct bnxt *bp)
>
> void bnxt_hwmon_init(struct bnxt *bp)
> {
> - struct hwrm_temp_monitor_query_input *req;
> struct pci_dev *pdev = bp->pdev;
> int rc;
>
> - rc = hwrm_req_init(bp, req, HWRM_TEMP_MONITOR_QUERY);
> - if (!rc)
> - rc = hwrm_req_send_silent(bp, req);
> + /* temp1_xxx is only sensor, ensure not registered if it will fail */
> + rc = bnxt_hwrm_temp_query(bp, NULL);
Ah, that is the reason for the check in bnxt_hwrm_temp_query().
The check in that function should really be added here, not in the
previous patch.
> if (rc == -EACCES || rc == -EOPNOTSUPP) {
> bnxt_hwmon_uninit(bp);
> return;
> --
> 2.30.1
>
Powered by blists - more mailing lists