[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DS0PR11MB7785E94A608ED30F331FF770F0F22@DS0PR11MB7785.namprd11.prod.outlook.com>
Date: Mon, 10 Feb 2025 12:19:48 +0000
From: "Jagielski, Jedrzej" <jedrzej.jagielski@...el.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>
CC: "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, Simon Horman <horms@...nel.org>, "Kitszel,
Przemyslaw" <przemyslaw.kitszel@...el.com>, "Polchlopek, Mateusz"
<mateusz.polchlopek@...el.com>
Subject: RE: [Intel-wired-lan] [PATCH iwl-next v3] ixgbe: add support for
thermal sensor event reception
From: Paul Menzel <pmenzel@...gen.mpg.de>
Sent: Monday, February 10, 2025 1:07 PM
>Dear Jedrzej,
>
>
>Thank you for the quick reply.
>
>
>Am 10.02.25 um 12:59 schrieb Jagielski, Jedrzej:
>> From: Paul Menzel <pmenzel@...gen.mpg.de>
>> Sent: Monday, February 10, 2025 12:40 PM
>
>>> Am 10.02.25 um 11:40 schrieb Jedrzej Jagielski:
>>>> E610 NICs unlike the previous devices utilising ixgbe driver
>>>> are notified in the case of overheatning by the FW ACI event.
>>>
>>> overheating (without n)
>>>
>>>> In event of overheat when threshold is exceeded, FW suspends all
>>>> traffic and sends overtemp event to the driver.
>>>
>>> I guess this was already the behavior before your patch, and now it’s
>>> only logged, and the device stopped properly?
>
>> Without this patch driver cannot receive the overtemp info. FW handles
>> the event - device stops but user has no clue what has happened.
>> FW does the major work but this patch adds this minimal piece of
>> overtemp handling done by SW - informing user at the very end.
>
>Thank you for the confirmation. Maybe add a small note, that it’s not
>logged to make it crystal clear for people like myself.
OK, i will extend the commit msg.
>
>>>> Then driver
>>>> logs appropriate message and closes the adapter instance.
>>>> The card remains in that state until the platform is rebooted.
>>>
>>> As a user I’d be interested what the threshold is, and what the measured
>>> temperature is. Currently, the log seems to be just generic?
>>
>> These details are FW internals.
>> Driver just gets info that such event has happened.
>> There's no additional information.
>>
>> In that case driver's job is just to inform user that such scenario
>> has happened and tell what should be the next steps.
>
> From a user perspective that is a suboptimal behavior, and shows
>another time that the Linux kernel should have all the control, and
>stuff like this should be moved *out* of the firmware and not into the
>firmware.
>
>It’d be great if you could talk to the card designers/engineers to take
>that into account, and maybe revert this decision for future versions or
>future adapters.
I will have it on my mind.
Thanks,
Jedrek
>
>
>Kind regards,
>
>Paul
>
>
>>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:static const char ixgbe_overheat_msg[] = "Network adapter has been stopped because it has over heated. Restart the computer. If the problem persists, power off the system and replace the adapter";
>>>
>>>> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@...el.com>
>>>> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@...el.com>
>>>> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@...el.com>
>>>> ---
>>>> v2,3 : commit msg tweaks
>>>> ---
>>>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +++++
>>>> drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h | 3 +++
>>>> 2 files changed, 8 insertions(+)
>>>
>>>
>>> Kind regards,
>>>
>>> Paul
>>>
>>>
>>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>>> index 7236f20c9a30..5c804948dd1f 100644
>>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>>> @@ -3165,6 +3165,7 @@ static void ixgbe_aci_event_cleanup(struct ixgbe_aci_event *event)
>>>> static void ixgbe_handle_fw_event(struct ixgbe_adapter *adapter)
>>>> {
>>>> struct ixgbe_aci_event event __cleanup(ixgbe_aci_event_cleanup);
>>>> + struct net_device *netdev = adapter->netdev;
>>>> struct ixgbe_hw *hw = &adapter->hw;
>>>> bool pending = false;
>>>> int err;
>>>> @@ -3185,6 +3186,10 @@ static void ixgbe_handle_fw_event(struct ixgbe_adapter *adapter)
>>>> case ixgbe_aci_opc_get_link_status:
>>>> ixgbe_handle_link_status_event(adapter, &event);
>>>> break;
>>>> + case ixgbe_aci_opc_temp_tca_event:
>>>> + e_crit(drv, "%s\n", ixgbe_overheat_msg);
>>>> + ixgbe_close(netdev);
>>>> + break;
>>>> default:
>>>> e_warn(hw, "unknown FW async event captured\n");
>>>> break;
>>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h
>>>> index 8d06ade3c7cd..617e07878e4f 100644
>>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h
>>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h
>>>> @@ -171,6 +171,9 @@ enum ixgbe_aci_opc {
>>>> ixgbe_aci_opc_done_alt_write = 0x0904,
>>>> ixgbe_aci_opc_clear_port_alt_write = 0x0906,
>>>>
>>>> + /* TCA Events */
>>>> + ixgbe_aci_opc_temp_tca_event = 0x0C94,
>>>> +
>>>> /* debug commands */
>>>> ixgbe_aci_opc_debug_dump_internals = 0xFF08,
Powered by blists - more mailing lists