Message-ID: <Yj3c+cDzdvsUbYtp@lunn.ch>
Date:   Fri, 25 Mar 2022 16:17:13 +0100
From:   Andrew Lunn <andrew@...n.ch>
To:     Francesco Dolcini <francesco.dolcini@...adex.com>
Cc:     Russell King <linux@...linux.org.uk>,
        netdev <netdev@...r.kernel.org>, fugang.duan@....com,
        Chris Healy <cphealy@...il.com>
Subject: Re: FEC MDIO timeout and polled IO

On Fri, Mar 25, 2022 at 03:08:08PM +0100, Francesco Dolcini wrote:
> Hello Andrew and all,
> I was recently debugging an issue in the FEC driver: about 2% of the
> time the driver fails with "MDIO read timeout" at boot on a 5.4
> kernel.
> 
> This issue is not new and from time to time appears again; it seems that
> the previous interrupt-based mechanism is somehow easy to break.
> 
> I backported your patch
> f166f890c8f0 (net: ethernet: fec: Replace interrupt driven MDIO with polled IO, 2020-05-02)
> to kernel 5.4 and it seems that it fixes the issue (I was able to do 470
> power cycles, while before it was failing after a couple of hundred
> cycles in the best case).
> 
> Shouldn't this patch be backported to kernel 5.4? 

Hi Francesco

This patch was purely a performance boost; it was not a bug fix in any
way. That change also caused a lot of pain. There are at least two
different implementations of the MDIO bus in the FEC, and they
behave slightly differently. So what worked for me with the Vybrid
broke some other platforms. It took an NXP software engineer talking
to their hardware guys to figure out how to do this correctly. Which
is why you will see a complicated patch history.

I personally would not recommend a back port, unless you can test the
back port on a wide range of SoCs with the FEC.

If you are getting timeouts, I would suggest you look at whatever else
is happening in the system during boot. Are interrupts getting
disabled for too long? Is something blocking the running of the
completion?
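
For illustration only, here is a minimal userspace sketch of the polled
style of wait, which reads an event register in a bounded loop instead of
sleeping on a completion signalled from an interrupt handler. It is not
the driver code: struct fake_mdio, fake_read_ievent(), mdio_wait_polled()
and the MII_DONE_BIT value are hypothetical names made up for this
example, and the "register" is simulated in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "MII transfer complete" event bit (an assumption). */
#define MII_DONE_BIT (1u << 23)

/* Simulated hardware: the done bit appears on the Nth register read.
 * reads_until_done == 0 means the transfer never completes. */
struct fake_mdio {
    uint32_t ievent;
    unsigned reads_until_done;
};

/* Stand-in for a memory-mapped register read. */
static uint32_t fake_read_ievent(struct fake_mdio *m)
{
    if (m->reads_until_done > 0 && --m->reads_until_done == 0)
        m->ievent |= MII_DONE_BIT;
    return m->ievent;
}

/* Poll the event register at most max_polls times.
 * Returns 0 once the done bit is seen (and acknowledged), -1 on timeout.
 * No interrupt or completion is involved, so the wait does not depend
 * on interrupts being delivered in time. */
static int mdio_wait_polled(struct fake_mdio *m, unsigned max_polls)
{
    assert(m != NULL);

    for (unsigned i = 0; i < max_polls; i++) {
        if (fake_read_ievent(m) & MII_DONE_BIT) {
            m->ievent &= ~MII_DONE_BIT; /* acknowledge the event */
            return 0;
        }
    }
    return -1;
}
```

With an interrupt-driven wait, the equivalent of the loop above is a
timed wait on a completion, so anything that delays the interrupt handler
past the timeout shows up as an MDIO timeout even though the hardware
finished the transfer.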

Or just update to v5.15.

   Andrew
