Message-ID: <6c71f6da-ca62-40e6-b046-b23083b80ae0@molgen.mpg.de>
Date: Wed, 4 Dec 2024 13:54:47 +0100
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Konrad Knitter <konrad.knitter@...el.com>
Cc: intel-wired-lan@...ts.osuosl.org, anthony.l.nguyen@...el.com,
przemyslaw.kitszel@...el.com, netdev@...r.kernel.org, kuba@...nel.org,
pabeni@...hat.com, edumazet@...gle.com, davem@...emloft.net,
andrew+netdev@...n.ch, Sharon Haroni <sharon.haroni@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: fw and port health status
[Cc: -Brett, -Nicholas (550 #5.1.0 Address rejected.)]
Am 04.12.24 um 13:34 schrieb Paul Menzel:
> Dear Konrad,
>
>
> Thank you for your patch. It’d be great if you made the commit message
> summary/title a statement by adding a verb (in imperative mood). Maybe:
>
> ice: Add support for fw and port health status
>
>
> Am 04.12.24 um 13:27 schrieb Konrad Knitter:
>> Firmware generates events for global events or port specific events.
>>
>> Driver shall subscribe for health status events from firmware on supported
>> FW versions >= 1.7.6.
>
> Please add a blank line between paragraphs, or do not break the line
> just because a new sentence starts.
>
>> Driver shall expose those under specific health reporter, two new
>> reporters are introduced:
>> - FW health reporter shall represent global events (problems with the
>> image, recovery mode);
>> - Port health reporter shall represent port-specific events (module
>> failure).
>>
>> Firmware only reports problems when those are detected, it does not store
>> active fault list.
>> Driver will hold only last global and last port-specific event.
>> Driver will report all events via devlink health report,
>> so in case of multiple events of the same source they can be reviewed
>> using devlink autodump feature.
>>
>> $ devlink health
>>
>> pci/0000:b1:00.3:
>> reporter fw
>> state healthy error 0 recover 0 auto_dump true
>> reporter port
>> state error error 1 recover 0 last_dump_date 2024-03-17
>> last_dump_time 09:29:29 auto_dump true
>>
>> $ devlink health diagnose pci/0000:b1:00.3 reporter port
>>
>> Syndrome: 262
>> Description: Module is not present.
>> Possible Solution: Check that the module is inserted correctly.
>> Port Number: 0
>>
>> Tested on Intel Corporation Ethernet Controller E810-C for SFP
>
> Thank you for adding the above information.
>
>> Co-developed-by: Sharon Haroni <sharon.haroni@...el.com>
>> Signed-off-by: Sharon Haroni <sharon.haroni@...el.com>
>> Co-developed-by: Nicholas Nunley <nicholas.d.nunley@...el.com>
>> Signed-off-by: Nicholas Nunley <nicholas.d.nunley@...el.com>
>> Co-developed-by: Brett Creeley <brett.creeley@...el.com>
>> Signed-off-by: Brett Creeley <brett.creeley@...el.com>
>> Signed-off-by: Konrad Knitter <konrad.knitter@...el.com>
>> ---
>> v2:
>> - Removal of __VA_OPT__ usage. Style fixes.
>> Depends-on: https://lore.kernel.org/netdev/20240930133724.610512-1-przemyslaw.kitszel@...el.com/T/
>> ---
>> .../net/ethernet/intel/ice/devlink/health.c | 253 +++++++++++++++++-
>> .../net/ethernet/intel/ice/devlink/health.h | 14 +-
>> .../net/ethernet/intel/ice/ice_adminq_cmd.h | 87 ++++++
>> drivers/net/ethernet/intel/ice/ice_common.c | 38 +++
>> drivers/net/ethernet/intel/ice/ice_common.h | 2 +
>> drivers/net/ethernet/intel/ice/ice_main.c | 3 +
>> drivers/net/ethernet/intel/ice/ice_type.h | 5 +
>> 7 files changed, 400 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/devlink/health.c b/drivers/net/ethernet/intel/ice/devlink/health.c
>> index c7a8b8c9e1ca..c5a16879c916 100644
>> --- a/drivers/net/ethernet/intel/ice/devlink/health.c
>> +++ b/drivers/net/ethernet/intel/ice/devlink/health.c
>> @@ -1,13 +1,251 @@
>> // SPDX-License-Identifier: GPL-2.0
>> /* Copyright (c) 2024, Intel Corporation. */
>> -#include "health.h"
>> #include "ice.h"
>> +#include "ice_adminq_cmd.h" /* for enum ice_aqc_health_status_elem */
>> +#include "health.h"
>> #include "ice_ethtool_common.h"
>> #define ICE_DEVLINK_FMSG_PUT_FIELD(fmsg, obj, name) \
>> devlink_fmsg_put(fmsg, #name, (obj)->name)
>> +#define ICE_HEALTH_STATUS_DATA_SIZE 2
>> +
>> +struct ice_health_status {
>> + enum ice_aqc_health_status code;
>> + const char *description;
>> + const char *solution;
>> + const char *data_label[ICE_HEALTH_STATUS_DATA_SIZE];
>> +};
>> +
>> +/*
>> + * In addition to the health status codes provided below, the firmware might
>> + * generate Health Status Codes that are not pertinent to the end-user.
>> + * For instance, Health Code 0x1002 is triggered when the command fails.
>> + * Such codes should be disregarded by the end-user.
>> + * The below lookup requires to be sorted by code.
>> + */
>> +
>> +static const char *const ice_common_port_solutions =
>> + "Check your cable connection. Change or replace the module or cable. Manually set speed and duplex.";
>> +static const char *const ice_port_number_label = "Port Number";
>> +static const char *const ice_update_nvm_solution = "Update to the latest NVM image.";
>> +
>> +static const struct ice_health_status ice_health_status_lookup[] = {
>> + {ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_STRICT, "An unsupported module was detected",
>> + ice_common_port_solutions, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_MOD_TYPE, "Module type is not supported.",
>> + "Change or replace the module or cable.", {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_MOD_QUAL, "Module is not qualified.",
>> + ice_common_port_solutions, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_MOD_COMM,
>> + "Device cannot communicate with the module.",
>> + "Check your cable connection. Change or replace the module or cable. Manually set speed and duplex.",
>> + {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_MOD_CONFLICT, "Unresolved module conflict.",
>> + "Manually set speed/duplex or change the port option. If the problem persists, use a cable/module that is found in the supported modules and cables list for this device.",
>> + {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_MOD_NOT_PRESENT, "Module is not present.",
>> + "Check that the module is inserted correctly. If the problem persists, use a cable/module that is found in the supported modules and cables list for this device.",
>> + {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_INFO_MOD_UNDERUTILIZED, "Underutilized module.",
>> + "Change or replace the module or cable. Change the port option",
>> + {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_LENIENT, "An unsupported module was detected",
>> + ice_common_port_solutions, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_INVALID_LINK_CFG, "Invalid link configuration.",
>> + NULL, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_PORT_ACCESS, "Port hardware access error.",
>
> Sometimes there are dots/periods at the end, and sometimes there are
> none. It’d be great if it were consistent.
>
>> + ice_update_nvm_solution, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_PORT_UNREACHABLE, "A port is unreachable.",
>> + "Change the port option. Update to the latest NVM image."},
>> + {ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_MOD_LIMITED, "Port speed is limited due to module.",
>> + "Change the module or configure the port option to match the current module speed. Change the port option.",
>> + {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_PARALLEL_FAULT,
>> + "All configured link modes were attempted but failed to establish link. The device will restart the process to establish link.",
>> + "Check link partner connection and configuration.",
>> + {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_PHY_LIMITED,
>> + "Port speed is limited by PHY capabilities.",
>> + "Change the module to align to port option.", {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_NETLIST_TOPO, "LOM topology netlist is corrupted.",
>> + ice_update_nvm_solution, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_NETLIST, "Unrecoverable netlist error.",
>> + ice_update_nvm_solution, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_TOPO_CONFLICT, "Port topology conflict.",
>> + "Change the port option. Update to the latest NVM image."},
>> + {ICE_AQC_HEALTH_STATUS_ERR_LINK_HW_ACCESS, "Unrecoverable hardware access error.",
>> + ice_update_nvm_solution, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_LINK_RUNTIME, "Unrecoverable runtime error.",
>> + ice_update_nvm_solution, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_DNL_INIT, "Link management engine failed to initialize.",
>> + ice_update_nvm_solution, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_PHY_FW_LOAD,
>> + "Failed to load the firmware image in the external PHY.",
>> + ice_update_nvm_solution, {ice_port_number_label}},
>> + {ICE_AQC_HEALTH_STATUS_INFO_RECOVERY, "The device is in firmware recovery mode.",
>> + ice_update_nvm_solution, {"Extended Error"}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_FLASH_ACCESS, "The flash chip cannot be accessed.",
>> + "If issue persists, call customer support.", {"Access Type"}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_NVM_AUTH, "NVM authentication failed.",
>> + ice_update_nvm_solution},
>> + {ICE_AQC_HEALTH_STATUS_ERR_OROM_AUTH, "Option ROM authentication failed",
>> + ice_update_nvm_solution},
>> + {ICE_AQC_HEALTH_STATUS_ERR_DDP_AUTH, "DDP package authentication failed.",
>> + "Update to latest base driver and DDP package."},
>> + {ICE_AQC_HEALTH_STATUS_ERR_NVM_COMPAT, "NVM image is incompatible.",
>> + ice_update_nvm_solution},
>> + {ICE_AQC_HEALTH_STATUS_ERR_OROM_COMPAT, "Option ROM is incompatible.",
>> + ice_update_nvm_solution, {"Expected PCI Device ID", "Expected Module ID"}},
>> + {ICE_AQC_HEALTH_STATUS_ERR_DCB_MIB,
>> + "Supplied MIB file is invalid. DCB reverted to default configuration.",
>> + "Disable FW-LLDP and check DCBx system configuration.",
>> + {ice_port_number_label, "MIB ID"}},
>> +};
>> +
>> +static int ice_health_status_lookup_compare(const void *a, const void *b)
>> +{
>> + return ((struct ice_health_status *)a)->code - ((struct ice_health_status *)b)->code;
>> +}
>> +
>> +static const struct ice_health_status *ice_get_health_status(u16 code)
>> +{
>> + struct ice_health_status key = { .code = code };
>> +
>> + return bsearch(&key, ice_health_status_lookup, ARRAY_SIZE(ice_health_status_lookup),
>> + sizeof(struct ice_health_status), ice_health_status_lookup_compare);
>> +}
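
The comment above the table says the lookup must stay sorted by code, because bsearch() assumes ascending order. A minimal userspace sketch of that pattern follows. It uses the C library bsearch() rather than the kernel one, `struct health_status` and the function names are hypothetical stand-ins, and only four of the codes from the patch are listed. The comparator keeps the const qualifier and compares explicitly instead of subtracting, which is a suggestion on my side and not what the patch does.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ice_health_status. */
struct health_status {
	unsigned int code;
	const char *description;
};

/* Must stay sorted by code: bsearch() assumes ascending order. */
static const struct health_status health_status_lookup[] = {
	{ 0x106, "Module is not present." },
	{ 0x10B, "Invalid link configuration." },
	{ 0x500, "The device is in firmware recovery mode." },
	{ 0x502, "NVM authentication failed." },
};

/* Const-correct comparator returning <0, 0 or >0. */
static int health_status_cmp(const void *a, const void *b)
{
	const struct health_status *x = a;
	const struct health_status *y = b;

	return (x->code > y->code) - (x->code < y->code);
}

/* Returns the matching entry, or NULL for codes missing from the table. */
static const struct health_status *get_health_status(unsigned int code)
{
	struct health_status key = { .code = code };

	return bsearch(&key, health_status_lookup,
		       sizeof(health_status_lookup) / sizeof(health_status_lookup[0]),
		       sizeof(health_status_lookup[0]), health_status_cmp);
}
```

A code that is not listed, such as 0x1002, yields NULL, which matches the debug-only path in the patch for internal codes.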
>> +
>> +static void ice_describe_status_code(struct devlink_fmsg *fmsg,
>> + struct ice_aqc_health_status_elem *hse)
>> +{
>> + static const char *const aux_label[] = { "Aux Data 1", "Aux Data 2" };
>> + const struct ice_health_status *health_code;
>> + u32 internal_data[2];
>> + u16 status_code;
>> +
>> + status_code = le16_to_cpu(hse->health_status_code);
>> +
>> + devlink_fmsg_put(fmsg, "Syndrome", status_code);
>> + if (status_code) {
>> + internal_data[0] = le32_to_cpu(hse->internal_data1);
>> + internal_data[1] = le32_to_cpu(hse->internal_data2);
>> +
>> + health_code = ice_get_health_status(status_code);
>> + if (!health_code)
>> + return;
>> +
>> + devlink_fmsg_string_pair_put(fmsg, "Description", health_code->description);
>> + if (health_code->solution)
>> + devlink_fmsg_string_pair_put(fmsg, "Possible Solution",
>> + health_code->solution);
>> +
>> + for (int i = 0; i < ICE_HEALTH_STATUS_DATA_SIZE; i++) {
>
> Use size_t?
>
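
To make the suggestion concrete, here is a small userspace sketch of the same kind of loop with a size_t index. The helper name `count_defined` is hypothetical, and the sentinel value is copied from ICE_AQC_HEALTH_STATUS_UNDEFINED_DATA in this patch. It only counts the data words that would be printed, so it is an illustration and not a replacement for the code under review.

```c
#include <stddef.h>
#include <stdint.h>

#define UNDEFINED_DATA 0xDEADBEEFu	/* sentinel value taken from the patch */

/* Counts the data words that are not marked as undefined. */
static size_t count_defined(const uint32_t *data, size_t n)
{
	size_t defined = 0;

	/* size_t matches the type of sizeof-derived bounds. */
	for (size_t i = 0; i < n; i++) {
		if (data[i] != UNDEFINED_DATA)
			defined++;
	}

	return defined;
}
```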
>> + if (internal_data[i] != ICE_AQC_HEALTH_STATUS_UNDEFINED_DATA)
>> + devlink_fmsg_u32_pair_put(fmsg,
>> + health_code->data_label[i] ?
>> + health_code->data_label[i] :
>> + aux_label[i],
>> + internal_data[i]);
>> + }
>> + }
>> +}
>> +
>> +static int
>> +ice_port_reporter_dump(struct devlink_health_reporter *reporter, struct devlink_fmsg *fmsg,
>> + void *priv_ctx, struct netlink_ext_ack __always_unused *extack)
>> +{
>> + struct ice_pf *pf = devlink_health_reporter_priv(reporter);
>> +
>> + ice_describe_status_code(fmsg, &pf->health_reporters.port_status);
>> + return 0;
>> +}
>> +
>> +static int
>> +ice_fw_reporter_dump(struct devlink_health_reporter *reporter, struct devlink_fmsg *fmsg,
>> + void *priv_ctx, struct netlink_ext_ack *extack)
>> +{
>> + struct ice_pf *pf = devlink_health_reporter_priv(reporter);
>> +
>> + ice_describe_status_code(fmsg, &pf->health_reporters.fw_status);
>> + return 0;
>> +}
>> +
>> +static void ice_config_health_events(struct ice_pf *pf, bool enable)
>> +{
>> + u8 enable_bits = 0;
>> + int ret;
>> +
>> + if (enable)
>> + enable_bits = ICE_AQC_HEALTH_STATUS_SET_PF_SPECIFIC_MASK |
>> + ICE_AQC_HEALTH_STATUS_SET_GLOBAL_MASK;
>> +
>> + ret = ice_aq_set_health_status_cfg(&pf->hw, enable_bits);
>> + if (ret)
>> + dev_err(ice_pf_to_dev(pf), "Failed to %s firmware health events, err %d aq_err %s\n",
>> + str_enable_disable(enable), ret,
>> + ice_aq_str(pf->hw.adminq.sq_last_status));
>> +}
>> +
>> +/**
>> + * ice_process_health_status_event - Process the health status event from FW
>> + * @pf: pointer to the PF structure
>> + * @event: event structure containing the Health Status Event opcode
>> + *
>> + * Decode the Health Status Events and print the associated messages
>> + */
>> +void ice_process_health_status_event(struct ice_pf *pf, struct ice_rq_event_info *event)
>> +{
>> + const struct ice_aqc_health_status_elem *health_info;
>> + u16 count;
>
> Why fix the length?
>
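
What I have in mind is roughly the following: the value read from the 16-bit field is held in a plain unsigned int, and the element-count check of this function stays as it is. This is a userspace sketch under assumptions. `struct health_status_elem` mirrors the layout of struct ice_aqc_health_status_elem from this patch with host-order integer types, and `count_fits` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct ice_aqc_health_status_elem (12 bytes). */
struct health_status_elem {
	uint16_t health_status_code;
	uint16_t event_source;
	uint32_t internal_data1;
	uint32_t internal_data2;
};

/* True when `count` elements fit into a buffer of `buf_len` bytes. */
static bool count_fits(unsigned int count, size_t buf_len)
{
	return count <= buf_len / sizeof(struct health_status_elem);
}
```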
>> +
>> + health_info = (struct ice_aqc_health_status_elem *)event->msg_buf;
>> + count = le16_to_cpu(event->desc.params.get_health_status.health_status_count);
>> +
>> + if (count > (event->buf_len / sizeof(*health_info))) {
>> + dev_err(ice_pf_to_dev(pf), "Received a health status event with invalid element count\n");
>> + return;
>> + }
>> +
>> + for (int i = 0; i < count; i++) {
>> + const struct ice_health_status *health_code;
>> + u16 status_code;
>> +
>> + status_code = le16_to_cpu(health_info->health_status_code);
>> + health_code = ice_get_health_status(status_code);
>> +
>> + if (health_code) {
>> + switch (health_info->event_source) {
>> + case ICE_AQC_HEALTH_STATUS_GLOBAL:
>> + pf->health_reporters.fw_status = *health_info;
>> + devlink_health_report(pf->health_reporters.fw,
>> + "FW syndrome reported", NULL);
>> + break;
>> + case ICE_AQC_HEALTH_STATUS_PF:
>> + case ICE_AQC_HEALTH_STATUS_PORT:
>> + pf->health_reporters.port_status = *health_info;
>> + devlink_health_report(pf->health_reporters.port,
>> + "Port syndrome reported", NULL);
>> + break;
>> + default:
>> + dev_err(ice_pf_to_dev(pf), "Health code with unknown source\n");
>> + }
>> + } else {
>> + u32 data1, data2;
>> + u16 source;
>> +
>> + source = le16_to_cpu(health_info->event_source);
>> + data1 = le32_to_cpu(health_info->internal_data1);
>> + data2 = le32_to_cpu(health_info->internal_data2);
>> + dev_dbg(ice_pf_to_dev(pf),
>> + "Received internal health status code 0x%08x, source: 0x%08x, data1: 0x%08x, data2: 0x%08x",
>> + status_code, source, data1, data2);
>> + }
>> + health_info++;
>> + }
>> +}
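
One detail that may be worth double-checking, if I read the patch correctly: `event_source` is declared as `__le16`, and the debug branch converts it with le16_to_cpu(), while the switch statement compares the raw field. On little-endian hosts both give the same result, so this would only matter on big-endian systems, which I have not tested. A userspace sketch of decoding first and dispatching afterwards could look like this. The enum and function names are hypothetical, and the scope values are copied from enum ice_aqc_health_status_scope.

```c
#include <stdint.h>

/* Scope values copied from enum ice_aqc_health_status_scope. */
enum health_scope { SCOPE_PF = 0x1, SCOPE_PORT = 0x2, SCOPE_GLOBAL = 0x3 };

/* Hypothetical result type standing in for the two devlink reporters. */
enum reporter { REPORTER_NONE, REPORTER_FW, REPORTER_PORT };

/* Decodes a little-endian 16-bit field independent of host byte order. */
static uint16_t le16_decode(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Picks the reporter from the decoded event source. */
static enum reporter pick_reporter(const uint8_t event_source_le[2])
{
	switch (le16_decode(event_source_le)) {
	case SCOPE_GLOBAL:
		return REPORTER_FW;
	case SCOPE_PF:
	case SCOPE_PORT:
		return REPORTER_PORT;
	default:
		return REPORTER_NONE;
	}
}
```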
>> +
>> /**
>> * ice_devlink_health_report - boilerplate to call given @reporter
>> *
>> @@ -244,6 +482,8 @@ ice_init_devlink_rep(struct ice_pf *pf,
>> ICE_DEFINE_HEALTH_REPORTER_OPS(mdd);
>> ICE_DEFINE_HEALTH_REPORTER_OPS(tx_hang);
>> +ICE_DEFINE_HEALTH_REPORTER_OPS(fw);
>> +ICE_DEFINE_HEALTH_REPORTER_OPS(port);
>> /**
>> * ice_health_init - allocate and init all ice devlink health reporters and
>> @@ -257,6 +497,12 @@ void ice_health_init(struct ice_pf *pf)
>> reps->mdd = ice_init_devlink_rep(pf, &ice_mdd_reporter_ops);
>> reps->tx_hang = ice_init_devlink_rep(pf, &ice_tx_hang_reporter_ops);
>> +
>> + if (ice_is_fw_health_report_supported(&pf->hw)) {
>> + reps->fw = ice_init_devlink_rep(pf, &ice_fw_reporter_ops);
>> + reps->port = ice_init_devlink_rep(pf, &ice_port_reporter_ops);
>> + ice_config_health_events(pf, true);
>> + }
>> }
>> /**
>> @@ -279,6 +525,11 @@ void ice_health_deinit(struct ice_pf *pf)
>> {
>> ice_deinit_devl_reporter(pf->health_reporters.mdd);
>> ice_deinit_devl_reporter(pf->health_reporters.tx_hang);
>> + if (ice_is_fw_health_report_supported(&pf->hw)) {
>> + ice_deinit_devl_reporter(pf->health_reporters.fw);
>> + ice_deinit_devl_reporter(pf->health_reporters.port);
>> + ice_config_health_events(pf, false);
>> + }
>> }
>> static
>> diff --git a/drivers/net/ethernet/intel/ice/devlink/health.h b/drivers/net/ethernet/intel/ice/devlink/health.h
>> index a08c7bd174cf..280c429feec8 100644
>> --- a/drivers/net/ethernet/intel/ice/devlink/health.h
>> +++ b/drivers/net/ethernet/intel/ice/devlink/health.h
>> @@ -13,8 +13,10 @@
>> * devlink health mechanism for ice driver.
>> */
>> +struct ice_aqc_health_status_elem;
>> struct ice_pf;
>> struct ice_tx_ring;
>> +struct ice_rq_event_info;
>> enum ice_mdd_src {
>> ICE_MDD_SRC_TX_PQM,
>> @@ -25,17 +27,23 @@ enum ice_mdd_src {
>> /**
>> * struct ice_health - stores ice devlink health reporters and accompanied data
>> - * @tx_hang: devlink health reporter for tx_hang event
>> + * @fw: devlink health reporter for FW Health Status events
>> * @mdd: devlink health reporter for MDD detection event
>> + * @port: devlink health reporter for Port Health Status events
>> + * @tx_hang: devlink health reporter for tx_hang event
>> * @tx_hang_buf: pre-allocated place to put info for Tx hang reporter from
>> * non-sleeping context
>> * @tx_ring: ring that the hang occured on
>> * @head: descriptior head
>> * @intr: interrupt register value
>> * @vsi_num: VSI owning the queue that the hang occured on
>> + * @fw_status: buffer for last received FW Status event
>> + * @port_status: buffer for last received Port Status event
>> */
>> struct ice_health {
>> + struct devlink_health_reporter *fw;
>> struct devlink_health_reporter *mdd;
>> + struct devlink_health_reporter *port;
>> struct devlink_health_reporter *tx_hang;
>> struct_group_tagged(ice_health_tx_hang_buf, tx_hang_buf,
>> struct ice_tx_ring *tx_ring;
>> @@ -43,8 +51,12 @@ struct ice_health {
>> u32 intr;
>> u16 vsi_num;
>> );
>> + struct ice_aqc_health_status_elem fw_status;
>> + struct ice_aqc_health_status_elem port_status;
>> };
>> +void ice_process_health_status_event(struct ice_pf *pf,
>> + struct ice_rq_event_info *event);
>> void ice_health_init(struct ice_pf *pf);
>> void ice_health_deinit(struct ice_pf *pf);
>> diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
>> index ce590991de38..232a1facf397 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
>> +++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
>> @@ -2511,6 +2511,87 @@ enum ice_aqc_fw_logging_mod {
>> ICE_AQC_FW_LOG_ID_MAX,
>> };
>> +enum ice_aqc_health_status_mask {
>> + ICE_AQC_HEALTH_STATUS_SET_PF_SPECIFIC_MASK = BIT(0),
>> + ICE_AQC_HEALTH_STATUS_SET_ALL_PF_MASK = BIT(1),
>> + ICE_AQC_HEALTH_STATUS_SET_GLOBAL_MASK = BIT(2),
>> +};
>> +
>> +/* Set Health Status (direct 0xFF20) */
>> +struct ice_aqc_set_health_status_cfg {
>> + u8 event_source;
>> + u8 reserved[15];
>> +};
>> +
>> +enum ice_aqc_health_status {
>> + ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_STRICT = 0x101,
>> + ICE_AQC_HEALTH_STATUS_ERR_MOD_TYPE = 0x102,
>> + ICE_AQC_HEALTH_STATUS_ERR_MOD_QUAL = 0x103,
>> + ICE_AQC_HEALTH_STATUS_ERR_MOD_COMM = 0x104,
>> + ICE_AQC_HEALTH_STATUS_ERR_MOD_CONFLICT = 0x105,
>> + ICE_AQC_HEALTH_STATUS_ERR_MOD_NOT_PRESENT = 0x106,
>> + ICE_AQC_HEALTH_STATUS_INFO_MOD_UNDERUTILIZED = 0x107,
>> + ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_LENIENT = 0x108,
>> + ICE_AQC_HEALTH_STATUS_ERR_MOD_DIAGNOSTIC_FEATURE = 0x109,
>> + ICE_AQC_HEALTH_STATUS_ERR_INVALID_LINK_CFG = 0x10B,
>> + ICE_AQC_HEALTH_STATUS_ERR_PORT_ACCESS = 0x10C,
>> + ICE_AQC_HEALTH_STATUS_ERR_PORT_UNREACHABLE = 0x10D,
>> + ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_MOD_LIMITED = 0x10F,
>> + ICE_AQC_HEALTH_STATUS_ERR_PARALLEL_FAULT = 0x110,
>> + ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_PHY_LIMITED = 0x111,
>> + ICE_AQC_HEALTH_STATUS_ERR_NETLIST_TOPO = 0x112,
>> + ICE_AQC_HEALTH_STATUS_ERR_NETLIST = 0x113,
>> + ICE_AQC_HEALTH_STATUS_ERR_TOPO_CONFLICT = 0x114,
>> + ICE_AQC_HEALTH_STATUS_ERR_LINK_HW_ACCESS = 0x115,
>> + ICE_AQC_HEALTH_STATUS_ERR_LINK_RUNTIME = 0x116,
>> + ICE_AQC_HEALTH_STATUS_ERR_DNL_INIT = 0x117,
>> + ICE_AQC_HEALTH_STATUS_ERR_PHY_NVM_PROG = 0x120,
>> + ICE_AQC_HEALTH_STATUS_ERR_PHY_FW_LOAD = 0x121,
>> + ICE_AQC_HEALTH_STATUS_INFO_RECOVERY = 0x500,
>> + ICE_AQC_HEALTH_STATUS_ERR_FLASH_ACCESS = 0x501,
>> + ICE_AQC_HEALTH_STATUS_ERR_NVM_AUTH = 0x502,
>> + ICE_AQC_HEALTH_STATUS_ERR_OROM_AUTH = 0x503,
>> + ICE_AQC_HEALTH_STATUS_ERR_DDP_AUTH = 0x504,
>> + ICE_AQC_HEALTH_STATUS_ERR_NVM_COMPAT = 0x505,
>> + ICE_AQC_HEALTH_STATUS_ERR_OROM_COMPAT = 0x506,
>> + ICE_AQC_HEALTH_STATUS_ERR_NVM_SEC_VIOLATION = 0x507,
>> + ICE_AQC_HEALTH_STATUS_ERR_OROM_SEC_VIOLATION = 0x508,
>> + ICE_AQC_HEALTH_STATUS_ERR_DCB_MIB = 0x509,
>> + ICE_AQC_HEALTH_STATUS_ERR_MNG_TIMEOUT = 0x50A,
>> + ICE_AQC_HEALTH_STATUS_ERR_BMC_RESET = 0x50B,
>> + ICE_AQC_HEALTH_STATUS_ERR_LAST_MNG_FAIL = 0x50C,
>> + ICE_AQC_HEALTH_STATUS_ERR_RESOURCE_ALLOC_FAIL = 0x50D,
>> + ICE_AQC_HEALTH_STATUS_ERR_FW_LOOP = 0x1000,
>> + ICE_AQC_HEALTH_STATUS_ERR_FW_PFR_FAIL = 0x1001,
>> + ICE_AQC_HEALTH_STATUS_ERR_LAST_FAIL_AQ = 0x1002,
>> +};
>> +
>> +/* Get Health Status (indirect 0xFF22) */
>> +struct ice_aqc_get_health_status {
>> + __le16 health_status_count;
>> + u8 reserved[6];
>> + __le32 addr_high;
>> + __le32 addr_low;
>> +};
>> +
>> +enum ice_aqc_health_status_scope {
>> + ICE_AQC_HEALTH_STATUS_PF = 0x1,
>> + ICE_AQC_HEALTH_STATUS_PORT = 0x2,
>> + ICE_AQC_HEALTH_STATUS_GLOBAL = 0x3,
>> +};
>> +
>> +#define ICE_AQC_HEALTH_STATUS_UNDEFINED_DATA 0xDEADBEEF
>> +
>> +/* Get Health Status event buffer entry (0xFF22),
>> + * repeated per reported health status.
>> + */
>> +struct ice_aqc_health_status_elem {
>> + __le16 health_status_code;
>> + __le16 event_source;
>> + __le32 internal_data1;
>> + __le32 internal_data2;
>> +};
>> +
>> /* Set FW Logging configuration (indirect 0xFF30)
>> * Register for FW Logging (indirect 0xFF31)
>> * Query FW Logging (indirect 0xFF32)
>> @@ -2651,6 +2732,8 @@ struct ice_aq_desc {
>> struct ice_aqc_get_link_status get_link_status;
>> struct ice_aqc_event_lan_overflow lan_overflow;
>> struct ice_aqc_get_link_topo get_link_topo;
>> + struct ice_aqc_set_health_status_cfg set_health_status_cfg;
>> + struct ice_aqc_get_health_status get_health_status;
>> struct ice_aqc_dnl_call_command dnl_call;
>> struct ice_aqc_i2c read_write_i2c;
>> struct ice_aqc_read_i2c_resp read_i2c_resp;
>> @@ -2853,6 +2936,10 @@ enum ice_adminq_opc {
>> /* Standalone Commands/Events */
>> ice_aqc_opc_event_lan_overflow = 0x1001,
>> + /* SystemDiagnostic commands */
>
> Add a space before Diagnostic?
>
>> + ice_aqc_opc_set_health_status_cfg = 0xFF20,
>> + ice_aqc_opc_get_health_status = 0xFF22,
>> +
>> /* FW Logging Commands */
>> ice_aqc_opc_fw_logs_config = 0xFF30,
>> ice_aqc_opc_fw_logs_register = 0xFF31,
>> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
>> index faba09b9d880..9c61318d3027 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_common.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
>> @@ -6047,6 +6047,44 @@ bool ice_is_phy_caps_an_enabled(struct ice_aqc_get_phy_caps_data *caps)
>> return false;
>> }
>> +/**
>> + * ice_is_fw_health_report_supported
>> + * @hw: pointer to the hardware structure
>> + *
>> + * Return: true if firmware supports health status reports,
>> + * false otherwise
>> + */
>> +bool ice_is_fw_health_report_supported(struct ice_hw *hw)
>> +{
>> + return ice_is_fw_api_min_ver(hw, ICE_FW_API_HEALTH_REPORT_MAJ,
>> + ICE_FW_API_HEALTH_REPORT_MIN,
>> + ICE_FW_API_HEALTH_REPORT_PATCH);
>> +}
>> +
>> +/**
>> + * ice_aq_set_health_status_cfg - Configure FW health events
>> + * @hw: pointer to the HW struct
>> + * @event_source: type of diagnostic events to enable
>> + *
>> + * Configure the health status event types that the firmware will send to this
>> + * PF. The supported event types are: PF-specific, all PFs, and global.
>> + *
>> + * Return: 0 on success, negative error code otherwise.
>> + */
>> +int ice_aq_set_health_status_cfg(struct ice_hw *hw, u8 event_source)
>> +{
>> + struct ice_aqc_set_health_status_cfg *cmd;
>> + struct ice_aq_desc desc;
>> +
>> + cmd = &desc.params.set_health_status_cfg;
>> +
>> + ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_health_status_cfg);
>> +
>> + cmd->event_source = event_source;
>> +
>> + return ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
>> +}
>> +
>> /**
>> * ice_aq_set_lldp_mib - Set the LLDP MIB
>> * @hw: pointer to the HW struct
>> diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
>> index 52a1b72cce26..e132851dc0f0 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_common.h
>> +++ b/drivers/net/ethernet/intel/ice/ice_common.h
>> @@ -141,6 +141,8 @@ int ice_get_link_default_override(struct ice_link_default_override_tlv *ldo,
>> struct ice_port_info *pi);
>> bool ice_is_phy_caps_an_enabled(struct ice_aqc_get_phy_caps_data *caps);
>> +bool ice_is_fw_health_report_supported(struct ice_hw *hw);
>> +int ice_aq_set_health_status_cfg(struct ice_hw *hw, u8 event_source);
>> int ice_aq_get_phy_equalization(struct ice_hw *hw, u16 data_in, u16 op_code,
>> u8 serdes_num, int *output);
>> int
>> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
>> index 7b9be612cf33..36cfbe771d1b 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>> @@ -1567,6 +1567,9 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
>> case ice_aqc_opc_lldp_set_mib_change:
>> ice_dcb_process_lldp_set_mib_change(pf, &event);
>> break;
>> + case ice_aqc_opc_get_health_status:
>> + ice_process_health_status_event(pf, &event);
>> + break;
>> default:
>> dev_dbg(dev, "%s Receive Queue unknown event 0x%04x ignored\n",
>> qtype, opcode);
>> diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
>> index e2e6b2119889..42ac5a9f1cf4 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_type.h
>> +++ b/drivers/net/ethernet/intel/ice/ice_type.h
>> @@ -1207,4 +1207,9 @@ struct ice_aq_get_set_rss_lut_params {
>> #define ICE_FW_API_REPORT_DFLT_CFG_MIN 7
>> #define ICE_FW_API_REPORT_DFLT_CFG_PATCH 3
>> +/* AQ API version for Health Status support */
>> +#define ICE_FW_API_HEALTH_REPORT_MAJ 1
>> +#define ICE_FW_API_HEALTH_REPORT_MIN 7
>> +#define ICE_FW_API_HEALTH_REPORT_PATCH 6
>> +
>> #endif /* _ICE_TYPE_H_ */
>
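
For readers following the version gate: the commit message says the feature needs FW versions >= 1.7.6, and the three ICE_FW_API_HEALTH_REPORT_* macros encode that as major 1, minor 7, patch 6. Such a check amounts to a lexicographic comparison of (major, minor, patch). The sketch below shows the idea in userspace. `struct fw_api_ver` and `fw_api_at_least` are hypothetical names, and this is not claimed to be how ice_is_fw_api_min_ver() is implemented in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for a firmware API version. */
struct fw_api_ver {
	uint8_t maj;
	uint8_t min;
	uint8_t patch;
};

/* True when v >= maj.min.patch, comparing the most significant part first. */
static bool fw_api_at_least(struct fw_api_ver v, uint8_t maj, uint8_t min,
			    uint8_t patch)
{
	if (v.maj != maj)
		return v.maj > maj;
	if (v.min != min)
		return v.min > min;

	return v.patch >= patch;
}
```

With the values from this patch, 1.7.6 and 1.8.0 would pass the gate, while 1.7.5 and 1.6.9 would not.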
>
> Kind regards,
>
> Paul