lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DS0PR11MB7785E94A608ED30F331FF770F0F22@DS0PR11MB7785.namprd11.prod.outlook.com>
Date: Mon, 10 Feb 2025 12:19:48 +0000
From: "Jagielski, Jedrzej" <jedrzej.jagielski@...el.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>
CC: "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
	"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, Simon Horman <horms@...nel.org>, "Kitszel,
 Przemyslaw" <przemyslaw.kitszel@...el.com>, "Polchlopek, Mateusz"
	<mateusz.polchlopek@...el.com>
Subject: RE: [Intel-wired-lan] [PATCH iwl-next v3] ixgbe: add support for
 thermal sensor event reception

From: Paul Menzel <pmenzel@...gen.mpg.de> 
Sent: Monday, February 10, 2025 1:07 PM
>Dear Jedrzej,
>
>
>Thank you for the quick reply.
>
>
>Am 10.02.25 um 12:59 schrieb Jagielski, Jedrzej:
>> From: Paul Menzel <pmenzel@...gen.mpg.de>
>> Sent: Monday, February 10, 2025 12:40 PM
>
>>> Am 10.02.25 um 11:40 schrieb Jedrzej Jagielski:
>>>> E610 NICs unlike the previous devices utilising ixgbe driver
>>>> are notified in the case of overheatning by the FW ACI event.
>>>
>>> overheating (without n)
>>>
>>>> In event of overheat when threshold is exceeded, FW suspends all
>>>> traffic and sends overtemp event to the driver.
>>>
>>> I guess this was already the behavior before your patch, and now it’s
>>> only logged, and the device stopped properly?
>
>> Without this patch driver cannot receive the overtemp info. FW handles
>> the event - device stops but user has no clue what has happened.
>> FW does the major work but this patch adds this minimal piece of
>> overtemp handling done by SW - informing user at the very end.
>
>Thank you for the confirmation. Maybe add a small note, that it’s not 
>logged to make it crystal clear for people like myself.

OK, i will extend the commit msg.

>
>>>> Then driver
>>>> logs appropriate message and closes the adapter instance.
>>>> The card remains in that state until the platform is rebooted.
>>>
>>> As a user I’d be interested what the threshold is, and what the measured
>>> temperature is. Currently, the log seems to be just generic?
>> 
>> These details are FW internals.
>> Driver just gets info that such event has happened.
>> There's no additional information.
>> 
>> In that case driver's job is just to inform user that such scenario
>> has happened and tell what should be the next steps.
>
> From a user perspective that is a suboptimal behavior, and shows 
>another time that the Linux kernel should have all the control, and 
>stuff like this should be moved *out* of the firmware and not into the 
>firmware.
>
>It’d be great if you could talk to the card designers/engineers to take 
>that into account, and maybe revert this decision for future versions or 
>future adapters.

I will have it on my mind.


Thanks,
Jedrek

>
>
>Kind regards,
>
>Paul
>
>
>>>      drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:static const char ixgbe_overheat_msg[] = "Network adapter has been stopped because it has over heated. Restart the computer. If the problem persists, power off the system and replace the adapter";
>>>
>>>> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@...el.com>
>>>> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@...el.com>
>>>> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@...el.com>
>>>> ---
>>>> v2,3 : commit msg tweaks
>>>> ---
>>>>    drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      | 5 +++++
>>>>    drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h | 3 +++
>>>>    2 files changed, 8 insertions(+)
>>>
>>>
>>> Kind regards,
>>>
>>> Paul
>>>
>>>
>>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>>> index 7236f20c9a30..5c804948dd1f 100644
>>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>>> @@ -3165,6 +3165,7 @@ static void ixgbe_aci_event_cleanup(struct ixgbe_aci_event *event)
>>>>    static void ixgbe_handle_fw_event(struct ixgbe_adapter *adapter)
>>>>    {
>>>>    	struct ixgbe_aci_event event __cleanup(ixgbe_aci_event_cleanup);
>>>> +	struct net_device *netdev = adapter->netdev;
>>>>    	struct ixgbe_hw *hw = &adapter->hw;
>>>>    	bool pending = false;
>>>>    	int err;
>>>> @@ -3185,6 +3186,10 @@ static void ixgbe_handle_fw_event(struct ixgbe_adapter *adapter)
>>>>    		case ixgbe_aci_opc_get_link_status:
>>>>    			ixgbe_handle_link_status_event(adapter, &event);
>>>>    			break;
>>>> +		case ixgbe_aci_opc_temp_tca_event:
>>>> +			e_crit(drv, "%s\n", ixgbe_overheat_msg);
>>>> +			ixgbe_close(netdev);
>>>> +			break;
>>>>    		default:
>>>>    			e_warn(hw, "unknown FW async event captured\n");
>>>>    			break;
>>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h
>>>> index 8d06ade3c7cd..617e07878e4f 100644
>>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h
>>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h
>>>> @@ -171,6 +171,9 @@ enum ixgbe_aci_opc {
>>>>    	ixgbe_aci_opc_done_alt_write			= 0x0904,
>>>>    	ixgbe_aci_opc_clear_port_alt_write		= 0x0906,
>>>>    
>>>> +	/* TCA Events */
>>>> +	ixgbe_aci_opc_temp_tca_event                    = 0x0C94,
>>>> +
>>>>    	/* debug commands */
>>>>    	ixgbe_aci_opc_debug_dump_internals		= 0xFF08,

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ