[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1d186774cfa4173955c89e7262b1d1b7@walle.cc>
Date: Fri, 02 Dec 2022 00:24:01 +0100
From: Michael Walle <michael@...le.cc>
To: Andrew Lunn <andrew@...n.ch>
Cc: Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
Horatiu Vultur <horatiu.vultur@...rochip.com>,
netdev@...r.kernel.org, Xu Liang <lxu@...linear.com>
Subject: Re: GPY215 PHY interrupt issue
Am 2022-12-01 23:06, schrieb Andrew Lunn:
> On Thu, Dec 01, 2022 at 10:31:19PM +0100, Michael Walle wrote:
>> Am 2022-12-01 16:54, schrieb Andrew Lunn:
>> > > So, switching the line to GPIO input doesn't help here, which also
>> > > means the interrupt line will be stuck the whole time.
>> >
>> > Sounds like they totally messed up the design somehow.
>> >
>> > Since we are into horrible hack territory.....
>> >
>> > I assume you are using the Link state change interrupt? LSTC?
>>
>> Yes, but recently I've found it that it also happens with
>> the speed change interrupt (during link-up). By pure luck (or
>> bad luck really?) I discovered that when I reduce the MDIO
>> frequency I get a similar behavior for the interrupt line
>> at link-up with the LSPC interrupt. I don't think it has
>> something to do with the frequency but with changed timing.
>
> So at a wild guess, there is some sort of race condition between the
> 2.5MHz ish MDIO clock and the rest of the system which is probably
> clocked at 25Mhz.
>
> We have to hope this is limited to just interrupts! Not any MDIO bus
> transaction.
>
>> + /* The PHY might leave the interrupt line asserted even after
>> PHY_ISTAT
>> + * is read. To avoid interrupt storms, delay the interrupt handling
>> as
>> + * long as the PHY drives the interrupt line. An internal bus read
>> will
>> + * stall as long as the interrupt line is asserted, thus just read a
>> + * random register here.
>> + * Because we cannot access the internal bus at all while the
>> interrupt
>> + * is driven by the PHY, there is no way to make the interrupt line
>> + * unstuck (e.g. by changing the pinmux to GPIO input) during that
>> time
>> + * frame. Therefore, polling is the best we can do and won't do any
>> more
>> + * harm.
>> + * It was observed that this bug happens on link state and link speed
>> + * changes on a GPY215B and GYP215C independent of the firmware
>> version
>> + * (which doesn't mean that this list is exhaustive).
>> + */
>> + if (needs_mdint_wa && (reg & (PHY_IMASK_LSTC | PHY_IMASK_LSPC)))
>> + gpy_mbox_read(phydev, REG_GPIO0_OUT);
>> +
>> phy_trigger_machine(phydev);
>>
>> return IRQ_HANDLED;
>
> So this delayed exiting the interrupt handler until the line actually
> return to normal. And during that time, the interrupt is disabled. And
> hence the storm is avoided.
Exactly. Almost like your idea with polling the interrupt line
in GPIO mode.
> I'm assuming gpy_mbox_read() has a timeout, so if the PHY completely
> jams, it does exit? If that happens, maybe call phy_error() to
> indicate the PHY is dead. And don't return IRQ_HANDLED, but
> IRQ_NONE. I think the IRQ core should notice the storm for an
> interrupt which nobody cares about and disable the interrupt.
> Probably not much you can do after that, but at least the machine
> won't be totally dead.
Ah, yes. In the earlier code (that changed the pin to GPIO input),
I had that error handling. Missed that here, I'll add it.
And yes, there is an timeout of 10ms.
>> It seems like at least these two are :/ So with the code above
>> we could avoid the interrupt storm but we can't do anything about
>> the blocked interrupt line. I'm unsure if that is acceptable or
>> if we'd have to disable interrupts on this PHY altogether and
>> fallback to polling.
>
> It probably depends on your system design. If this is the only PHY and
> the storm can be avoided, it is probably O.K. If you have other PHYs
> sharing the line, and those PHYs are doing time sensitive stuff like
> PTP, maybe polling would be better.
>
> Maybe add a kconfig symbol CONFIG_MAXLINEAR_GPHY_BROKEN_INTERRUPTS and
> make all the interrupt code conditional on this? Hopefully distros
> won't enable it, but the option is there to buy into the behaviour if
> you want?
I don't even dare to ask, but wouldn't a device tree property
maxlinear,use-broken-interrupts make more sense than a compile time
option? I'm fine with both.
-michael
Powered by blists - more mailing lists