Message-ID: <4b5b0f52-7ed8-7eef-2467-fa59ca5de937@intel.com>
Date: Sun, 2 Mar 2025 15:09:35 +0200
From: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>
To: Mark Pearson <mpearson-lenovo@...ebb.ca>, Andrew Lunn <andrew@...n.ch>
CC: <anthony.l.nguyen@...el.com>, <przemyslaw.kitszel@...el.com>,
	<andrew+netdev@...n.ch>, <davem@...emloft.net>, <edumazet@...gle.com>,
	<kuba@...nel.org>, <pabeni@...hat.com>, <intel-wired-lan@...ts.osuosl.org>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [Intel-wired-lan] [PATCH] e1000e: Link flap workaround option for
 false IRP events



Hi Mark,

> Hi Andrew
> 
> On Thu, Feb 27, 2025, at 11:07 AM, Andrew Lunn wrote:
>>>>> +			e1e_rphy(hw, PHY_REG(772, 26), &phy_data);
>>>>
>>>> Please add some #define for these magic numbers, so we have some idea
>>>> what PHY register you are actually reading. That in itself might help
>>>> explain how the workaround actually works.
>>>>
>>>
>>> I don't know what this register does I'm afraid - that's Intel knowledge and has not been shared.
>>
>> What PHY is it? Often it is just a COTS PHY, and the datasheet might
>> be available.
>>
>> Given your setup description, pause seems like the obvious thing to
>> check. When trying to debug this, did you look at pause settings?
>> Knowing what this register is might also point towards pause, or
>> something totally different.
>>
>> 	Andrew
> 
> For the PHY - do you know a way of determining this easily? I can reach out to the platform team but that will take some time. I'm not seeing anything in the kernel logs, but if there's a recommended way of confirming that would be appreciated.

The PHY is the I219 PHY.
The datasheet is publicly available:
https://cdrdv2-public.intel.com/612523/ethernet-connection-i219-datasheet.pdf

> 
> We did look at the pause pieces - which I agree seems like an obvious candidate given the speed mismatch on the network.
> Experts on the Intel networking team did reproduce the issue in their lab and looked at this for many weeks without determining root cause. I wish it was as obvious as pause control configuration :)
> 
> Thanks
> Mark
> 

Reading this register was suggested for debug purposes, to check whether
there is some misconfiguration. We did not find any misconfiguration.
What we did discover is that a link status change (LSC) interrupt caused
the CSME to reset the adapter, which in turn caused the link flap.

We were unable to determine what causes the link status change interrupt
in the first place. As stated in the comment, it was only ever observed
on Lenovo P5/P7 systems, and we were never able to reproduce it on other
systems. The reproduction in our lab was on a P5 system as well.


Regarding the suggested workaround, there isn’t a clear understanding of
why it works. We suspect that reading a PHY register prevents the CSME
from resetting the PHY when it handles the LSC interrupt it receives.
However, it could also be a matter of slight timing variations.
We communicated that this solution is not likely to be accepted into the
kernel as is, and the initial responses on the mailing list demonstrate
the pushback. We do understand the frustration of end users who may
experience the problem. A couple of suggestions that could make it look
less “out-of-the-blue”: try a short delay instead of the register
read, or read a more common register such as PHY STATUS instead.
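
To make the two alternatives concrete, here is a rough userspace sketch, not
driver code. struct mock_hw, mock_rphy() and mock_udelay() are hypothetical
stand-ins for the driver's hw structure, e1e_rphy() and udelay(). The
PHY_REG() definition is assumed to mirror the driver's page/register
packing, LSC_WA_DEBUG_REG is only a placeholder name for the register the
patch reads, and the 10 us delay is an arbitrary value that would need
testing on an affected system.

```c
/* Userspace sketch only: the mock_* helpers just record what was requested. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the driver's packing of page and register number. */
#define MAX_PHY_REG_ADDRESS	0x1F
#define PHY_PAGE_SHIFT		5
#define PHY_REG(page, reg) \
	(((page) << PHY_PAGE_SHIFT) | ((reg) & MAX_PHY_REG_ADDRESS))

/* Placeholder name: the purpose of this register is not documented here. */
#define LSC_WA_DEBUG_REG	PHY_REG(772, 26)
/* Standard MII status register (IEEE 802.3 clause 22, register 1). */
#define LSC_WA_STATUS_REG	0x01
/* Arbitrary illustrative delay, in microseconds. */
#define LSC_WA_DELAY_US		10

enum lsc_wa_mode {
	LSC_WA_READ_DEBUG_REG,	/* what the patch does today */
	LSC_WA_READ_STATUS,	/* read a more common register instead */
	LSC_WA_SHORT_DELAY,	/* no PHY access, only a short delay */
};

/* Hypothetical stand-in for the adapter's hw structure. */
struct mock_hw {
	unsigned int phy_reads;
	uint32_t last_offset;
	unsigned int delayed_us;
};

/* Stand-in for e1e_rphy(): records the offset and returns dummy data. */
static int mock_rphy(struct mock_hw *hw, uint32_t offset, uint16_t *data)
{
	assert(hw != NULL && data != NULL);
	hw->phy_reads++;
	hw->last_offset = offset;
	*data = 0;
	return 0;
}

/* Stand-in for udelay(): accumulates the requested delay. */
static void mock_udelay(struct mock_hw *hw, unsigned int us)
{
	assert(hw != NULL);
	hw->delayed_us += us;
}

/* Apply one workaround variant on a link status change interrupt. */
static int lsc_workaround(struct mock_hw *hw, enum lsc_wa_mode mode)
{
	uint16_t phy_data;

	switch (mode) {
	case LSC_WA_READ_DEBUG_REG:
		return mock_rphy(hw, LSC_WA_DEBUG_REG, &phy_data);
	case LSC_WA_READ_STATUS:
		return mock_rphy(hw, LSC_WA_STATUS_REG, &phy_data);
	case LSC_WA_SHORT_DELAY:
		mock_udelay(hw, LSC_WA_DELAY_US);
		return 0;
	}
	return -1;
}
```

If the delay-only variant also suppresses the flap on an affected system,
that would point towards timing rather than the PHY access itself.
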
On a different topic, I suggest removing the following line from the comment:
* Intel unable to determine root cause.
The issue went through joint debug by Intel and Lenovo, and no obvious 
spec violations by either party were found. There doesn’t seem to be 
value in including this information in the comments of upstream code.
