Message-ID: <20220325153353.GA1108006@francesco-nb.int.toradex.com>
Date:   Fri, 25 Mar 2022 16:33:53 +0100
From:   Francesco Dolcini <francesco.dolcini@...adex.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Francesco Dolcini <francesco.dolcini@...adex.com>,
        Russell King <linux@...linux.org.uk>,
        netdev <netdev@...r.kernel.org>, fugang.duan@....com,
        Chris Healy <cphealy@...il.com>
Subject: Re: FEC MDIO timeout and polled IO

Hello Andrew

On Fri, Mar 25, 2022 at 04:17:13PM +0100, Andrew Lunn wrote:
> On Fri, Mar 25, 2022 at 03:08:08PM +0100, Francesco Dolcini wrote:
> > Hello Andrew and all,
> > I was recently debugging an issue in the FEC driver, about 2% of the
> > time the driver is failing with "MDIO read timeout" at boot on a 5.4
> > kernel.
> > 
> > This issue is not new and reappears from time to time; it seems that
> > the previous interrupt-based mechanism is somehow easy to break.
> > 
> > I backported your patch
> > f166f890c8f0 (net: ethernet: fec: Replace interrupt driven MDIO with polled IO, 2020-05-02)
> > to kernel 5.4 and it seems that it fixes the issue (I was able to do 470
> > power cycles, while before it was failing after a couple of hundred
> > cycles in the best case).
> > 
> > Shouldn't this patch be backported to kernel 5.4? 
> 
> Hi Francesco
> 
> This patch was purely a performance boost, it was not a bug fix in any
> way. That change also caused a lot of pain. There are at least two
> different implementations of the MDIO bus in the FEC, and they
> behave slightly differently. So what worked for me with the Vybrid
> broke some other platforms. It took an NXP software engineer talking
> to their hardware guys to figure out how to do this correctly. Which
> is why you will see a complicated patch history.
> 
> I personally would not recommend a backport, unless you can test the
> backport on a wide range of SoCs with the FEC.
I can test quite a few i.MX SoCs, but there are more SoCs than that using
this driver. I do not see a reason to push for such a change if you do
not feel it is a good idea.

> If you are getting timeouts, I would suggest you look at whatever else
> is happening in the system during boot. Are interrupts getting
> disabled for too long? Is something blocking the running of the
> completion?
I tried to do some debugging, but it was incredibly painful given that
the issue manifests itself only after a couple of hundred boots. I also
tried the very simple workaround of doubling the timeout, but it didn't
work out.

To make things worse, the issue started to appear after updating to a more
recent 5.4 kernel patch version.

> Or just update to v5.15.
I will probably just keep your patch in our tree until we are able to
migrate to a newer kernel; it seems to work pretty well (and yes, I also
took this [0]).

Thanks a lot,
Francesco!


[0] 0f0011824921 (net: fec: fix MDIO probing for some FEC hardware blocks, 2020-10-28)

