netdev - Re: FEC MDIO read timeout on linkup

Message-ID: <20220503161356.GA35226@francesco-nb.int.toradex.com>
Date:   Tue, 3 May 2022 18:13:56 +0200
From:   Francesco Dolcini <francesco.dolcini@...adex.com>
To:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Cc:     Francesco Dolcini <francesco.dolcini@...adex.com>,
        Andy Duan <fugang.duan@....com>,
        Joakim Zhang <qiangqing.zhang@....com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Russell King <linux@...linux.org.uk>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        Fabio Estevam <festevam@...il.com>,
        Tim Harvey <tharvey@...eworks.com>,
        Chris Healy <cphealy@...il.com>
Subject: Re: FEC MDIO read timeout on linkup

Hello all,

On Mon, May 02, 2022 at 08:34:43PM +0200, Francesco Dolcini wrote:
> On Mon, May 02, 2022 at 08:24:53PM +0200, Andrew Lunn wrote:
> > > writing to this register could trigger a FEC_ENET_MII interrupt actually
> > > creating a race condition with fec_enet_mdio_read() that is called on
> > > link change also.
> > 
> > An unexpected interrupt will make this exit too early, and the read
> > will get invalid data. An unexpected interrupt would not cause a
> > timeout here, which is what you are reporting.
> 
> I guess I need to sleep on this, in the meantime I have a test running
> with the change I described running since a couple of hours.

After a long sleep, it seems that my change did not solve the issue. I
also verified that writing to FEC_MII_SPEED does not trigger any
FEC_ENET_MII interrupt in my specific case.

I guess this could still be a real issue, but it is not my specific
problem.

At the moment I'm a little bit lost. What I have verified so far is the
following:

 - fec_enet_mdio_read()/_write() locking. This is correct, it is
   protected by the mdio mutex.
 - potential race condition with the FEC_ENET_MII interrupt while writing
   FEC_MII_SPEED in fec_restart(). Proved wrong both by a test and by the
   fact that no interrupt is generated in my case.
 - increasing the fec_enet_mdio_wait() timeout to 100ms does not help.
 - clk_ipg is always active: once the device is open the clock is always
   on (verified with runtime power management debugging).


I'm wondering whether this could be related to
fec_enet_adjust_link()->fec_restart() running during a
fec_enet_mdio_read(), with one of the many register writes in
fec_restart() creating the issue, maybe while resetting the FEC. Does
this make any sense?
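
To make the hypothesis concrete, here is a minimal user-space sketch of it.
This is a toy model, not the FEC driver: struct mdio_model, the model_*()
helpers and sim_mdio_read() are all hypothetical names, and the key
assumption (that a controller reset drops an in-flight MDIO frame without
ever raising the completion event) is exactly the part that is not verified
on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of an MDIO controller (not the FEC driver). */
struct mdio_model {
	bool     busy;        /* a management frame is in flight */
	unsigned ticks_left;  /* ticks until the frame completes */
	bool     mii_event;   /* stands in for the FEC_ENET_MII event */
	uint16_t data;        /* value the PHY would return */
};

static void model_start_read(struct mdio_model *m, uint16_t phy_value)
{
	assert(m);
	m->busy = true;
	m->ticks_left = 4;    /* arbitrary frame duration for the model */
	m->mii_event = false;
	m->data = phy_value;
}

static void model_tick(struct mdio_model *m)
{
	if (m->busy && --m->ticks_left == 0) {
		m->busy = false;
		m->mii_event = true;  /* completion is signalled */
	}
}

/* ASSUMPTION: a reset drops the in-flight frame and raises no event. */
static void model_reset(struct mdio_model *m)
{
	m->busy = false;
	m->ticks_left = 0;
	m->mii_event = false;
}

/*
 * Start a read and wait up to 'budget' ticks for the completion event.
 * A reset is injected at tick 'reset_at_tick' (negative: no reset).
 * Returns 0 and stores the value on completion, -1 on timeout.
 */
static int sim_mdio_read(int reset_at_tick, unsigned budget, uint16_t *val)
{
	struct mdio_model m = {0};

	model_start_read(&m, 0x796d);
	for (unsigned t = 0; t < budget; t++) {
		if (reset_at_tick >= 0 && (unsigned)reset_at_tick == t)
			model_reset(&m);
		model_tick(&m);
		if (m.mii_event) {
			*val = m.data;
			return 0;
		}
	}
	return -1;
}
```

In this model, sim_mdio_read(-1, 10, &v) completes, while
sim_mdio_read(2, 10, &v) and sim_mdio_read(2, 1000, &v) both time out: if
the frame is dropped, a longer wait cannot help, which would at least be
consistent with the 100ms test above. A reset injected after the frame has
completed, e.g. sim_mdio_read(5, 10, &v), has no effect on the read.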

Francesco