netdev - Re: i.MX28 based system losing eth0 on boot

Message-ID: <20140506181151.GU28564@pengutronix.de>
Date:	Tue, 6 May 2014 20:11:51 +0200
From:	Uwe Kleine-König 
	<u.kleine-koenig@...gutronix.de>
To:	Brian Lilly <brian@...stalfontz.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Fabio Estevam <fabio.estevam@...escale.com>,
	Jim Baxter <jim_baxter@...tor.com>,
	Frank Li <Frank.Li@...escale.com>,
	Fugang Duan <B38611@...escale.com>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, kernel@...gutronix.de
Subject: Re: i.MX28 based system losing eth0 on boot

Hello Brian,

On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
> come up, then brought right back down with an MDIO rx timeout moments
> after.  Adding back in the removed code keeps the interface alive and
> it's working afterward without trouble.  I've tested the re-inserted
> code in 3.12, 3.14 without issue on our boards.
So you can reliably trigger that problem? You're just doing

	ifconfig eth0 1.2.3.4 up

(or equivalent) and, with the above-mentioned commit applied, the
interface goes down without any further intervention? The exact error you're
seeing is

	MDIO read timeout

(with some prefix saying something about fec and eth0 I think)?

Is this error also present with a264b981f2 reverted, just without
affecting eth0's functionality? Does the timeout happen on every access,
or only for specific addresses?

This is not a proper fix, but does it help to increase FEC_MII_TIMEOUT?

> Is there something else that can be done to prevent the MDIO timeouts?
> We are using basically the same schematic for networking as the
> imx28evk.
Hard to say, but assuming it works just fine on the imx28evk for you,
too, there seems to be some hardware difference that makes your machine
fail. (That doesn't mean it's not fixable in software.)

I don't know if an MDIO read error is intended to make the device go
down; maybe one of the netdev guys can answer that.
Assuming that it's not intended, instrument the code, find out how that
timeout makes your device go down and find the wrong branch. I'd start
by adding stack dumps when the MDIO timeout happens and when
fec_enet_start_xmit is called with fep->link == 0.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |