Message-ID: <fbee86b8-fbdd-42ac-a7f9-efc934d59672@lunn.ch>
Date: Tue, 4 Mar 2025 14:41:05 +0100
From: Andrew Lunn <andrew@...n.ch>
To: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>
Cc: Mark Pearson <mpearson-lenovo@...ebb.ca>, anthony.l.nguyen@...el.com,
	przemyslaw.kitszel@...el.com, andrew+netdev@...n.ch,
	davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
	pabeni@...hat.com, intel-wired-lan@...ts.osuosl.org,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH] e1000e: Link flap workaround option
 for false IRP events

> > > However, that does not really help explain how this helps prevent an
> > > interrupt. I assume the EEE settings were also played with. Not that
> > > this register appears to have anything to do with EEE!
> > > 
> > I don't think we tried those; it was never suggested, as far as I can recall (the original debug started 6+ months ago). I don't know exactly what testing Intel did in their lab once the issue was reproduced there.
> > 
> > If you have any particular recommendations we can try them, with the caveat that we have to run a soak test for ~1 week to be confident a change made a difference (the issue takes between 1 and 2 days to reproduce).
> 
> Personally I doubt that it is related to EEE since there was no real link
> flap.

I tend to agree. Despite the group of registers being called LPI, it
appears this one has nothing to do with LPI? It would probably have been
better to have it in page 776, General Registers, but that region is
full.

> > > I don't follow what you are saying here. As far as I can see, the
> > > interrupt handler triggers a read of the BMCR to determine the
> > > link status. It should not matter if there is a spurious interrupt;
> > > the BMCR should report the truth. So does BMCR actually indicate the
> > > link did go down? I also see the usual misunderstanding about how
> > > BMCR latches. It should not be read twice and processed once; each
> > > read should be processed, otherwise you miss quick link down/up
> > > events.
> > > 
> > > > We communicated that this solution is not likely to be accepted to the
> > > > kernel as is, and the initial responses on the mailing list demonstrate the
> > > > pushback.
> > > 
> > > What it has done is start a discussion towards an acceptable
> > > solution. Which is a good thing. But at the moment, the discussion
> > > does not have sufficient details.
> > > 
> > > Please could somebody describe the chain of events which results in
> > > the link down, and subsequent link up. Is the interrupt spurious, or
> > > does BMCR really indicate the link went down and up again?
> > > 
> > 
> > I'm fairly certain there is no actual link bounce but I don't know the reason for the interrupt or why it was triggered.
> > 
> > Vitaly, do you have a way of getting these answers from the Intel team that worked on this? I don't think I'll be able to get any answers, unfortunately.
> 
> You are correct; from what we saw, there was no real link flap, only a
> false link status change interrupt.
 
So if BMCR shows no state change, why is the driver doing anything?

I really would like to understand the chain of events. Once we
understand the chain of events, we can probably come up with a change
somewhere in the chain to break it, so the spurious interrupt is
ignored.

	Andrew
