Message-ID: <20190811163107.GE14290@lunn.ch>
Date: Sun, 11 Aug 2019 18:31:07 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc: linux-arm-kernel@...ts.infradead.org,
Fabio Estevam <festevam@...il.com>, netdev@...r.kernel.org,
Florian Fainelli <f.fainelli@...il.com>,
Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: [BUG] fec mdio times out under system stress
On Sun, Aug 11, 2019 at 02:37:07PM +0100, Russell King - ARM Linux admin wrote:
> Hi Fabio,
>
> When I woke up this morning, I found that one of the Hummingboards
> had gone offline (as in, lost network link) during the night.
> Investigating, I find that the system had gone into OOM, and at
> that time, triggered an unrelated:
>
> [4111697.698776] fec 2188000.ethernet eth0: MDIO read timeout
> [4111697.712996] MII_DATA: 0x6006796d
> [4111697.729415] MII_SPEED: 0x0000001a
> [4111697.745232] IEVENT: 0x00000000
> [4111697.745242] IMASK: 0x0a8000aa
> [4111698.002233] Atheros 8035 ethernet 2188000.ethernet-1:00: PHY state change RUNNING -> HALTED
> [4111698.009882] fec 2188000.ethernet eth0: Link is Down
>
> This is on a dual-core iMX6.
>
> It looks like the read actually completed (since MII_DATA contains
> the register data) but we somehow lost the interrupt (or maybe
> received the interrupt after wait_for_completion_timeout() timed
> out.)

Hi Russell

The timeout is quite short:

#define FEC_MII_TIMEOUT 30000 /* us */

Looking at the Vybrid datasheet, there does not appear to be any way
to determine if the hardware is busy other than waiting for the
interrupt. There is no 'busy' bit which gets cleared on completion.

So about the only option is to make the timeout bigger.

	Andrew
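
To put the 30000 us figure in perspective, here is a minimal sketch (not
driver code) that converts FEC_MII_TIMEOUT into scheduler ticks for a given
tick rate. The helper usecs_to_ticks() and the tick rates mentioned below are
assumptions made for illustration; the helper is a simplified stand-in for
the kernel's usecs_to_jiffies(), not a copy of it.

```c
#define FEC_MII_TIMEOUT 30000 /* us, as quoted above */

/*
 * Hypothetical helper: convert microseconds to scheduler ticks,
 * rounding up, for a tick rate of hz ticks per second.
 * Assumes hz is non-zero and divides 1000000 evenly.
 */
static unsigned long usecs_to_ticks(unsigned long usecs, unsigned long hz)
{
	unsigned long usecs_per_tick = 1000000UL / hz;

	return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}
```

Under these assumptions, usecs_to_ticks(FEC_MII_TIMEOUT, 100) gives 3 and
usecs_to_ticks(FEC_MII_TIMEOUT, 1000) gives 30. If the wait is measured in
ticks, a kernel with a 100 Hz tick leaves only a few ticks of margin, which
may not be much on a system stalled by OOM handling.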