lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 4 Nov 2020 12:08:13 +0000
From:   "Badel, Laurent" <LaurentBadel@...on.com>
To:     Marco Felsch <m.felsch@...gutronix.de>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        "fugang.duan@....com" <fugang.duan@....com>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "andrew@...n.ch" <andrew@...n.ch>,
        Heiner Kallweit <hkallweit1@...il.com>,
        "linux@...linux.org.uk" <linux@...linux.org.uk>,
        "p.zabel@...gutronix.de" <p.zabel@...gutronix.de>,
        "lgirdwood@...il.com" <lgirdwood@...il.com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "robh+dt@...nel.org" <robh+dt@...nel.org>,
        "richard.leitner@...data.com" <richard.leitner@...data.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "Quette, Arnaud" <ArnaudQuette@...on.com>
Subject: RE: [EXTERNAL]  Re: [PATCH net 0/4] Restore and fix PHY reset for
 SMSC LAN8720

Hi Marco, 
Thank you very much for your time reviewing and your helpful comments. 
Also sorry for the late reply. Please see my responses below. 
These are only my thoughts but I would be very interested to have your 
feedback if you don't mind and have time for this. 
I've been pulling my hair with this PHY for quite some time 
and am still pondering what would be the best solution to fix this once 
and for all.

> 

-----------------------------
Eaton Industries Manufacturing GmbH ~ Registered place of business: Route de la Longeraie 7, 1110, Morges, Switzerland 

-----------------------------

-----Original Message-----
> From: Marco Felsch <m.felsch@...gutronix.de>
> Sent: Thursday, October 29, 2020 9:16 AM
> To: Badel, Laurent <LaurentBadel@...on.com>
> Cc: davem@...emloft.net; fugang.duan@....com; kuba@...nel.org;
> andrew@...n.ch; Heiner Kallweit <hkallweit1@...il.com>;
> linux@...linux.org.uk; p.zabel@...gutronix.de; lgirdwood@...il.com;
> broonie@...nel.org; robh+dt@...nel.org; richard.leitner@...data.com;
> netdev@...r.kernel.org; devicetree@...r.kernel.org; f.fainelli@...il.com;
> Quette, Arnaud <ArnaudQuette@...on.com>
> Subject: [EXTERNAL] Re: [PATCH net 0/4] Restore and fix PHY reset for SMSC
> LAN8720
> 
> Hi,
> 
> thanks for your patches :)
> 
> On 20-10-27 23:25, Badel, Laurent wrote:
> > Subject: [PATCH net 0/4] Restore and fix PHY reset for SMSC LAN8720
> >
> > Description:
> > A recent patchset [1] added support in the SMSC PHY driver for
> > managing the ref clock and therefore removed the
> PHY_RST_AFTER_CLK_EN
> > flag for the
> > LAN8720 chip. The ref clock is passed to the SMSC driver through a new
> > property "clocks" in the device tree.
> >
> > There appears to be two potential caveats:
> > (i) Building kernel 5.9 without updating the DT with the "clocks"
> > property for SMSC PHY, would break systems previously relying on the
> > PHY reset workaround (SMSC driver cannot grab the ref clock, so it is
> > still managed by FEC, but the PHY is not reset because
> > PHY_RST_AFTER_CLK_EN is not set). This may lead to occasional loss of
> > ethernet connectivity in these systems, that is difficult to debug.
> 
> IMHO reyling on PHY_RST_AFTER_CLK_EN was broken since the day of
> adding this feature because:
> 
> 1st) Each host driver needs to call the phy-reset logic. So this isn't a
>      fix for all hosts using a LAN8720 phy.
> 2st) It interacts realy bad with the phy state machine. Only the state
>      machine should be able to do this.
> 

I absolutely agree with this, but on the other hand it seems to me that
issues were actually reported only with the FEC driver, precisely because
the FEC enables/disables the ref clock on opening/closing the interface. 
This means that there is a time between the initial probing of the driver, 
and the first time the interface is brought up, where the ref clk is turned 
off, but the PHY is not held in reset, and this is what causes most if not 
all of the issues in my opinion. If there is a way to keep the PHY in reset
during that time, then the problem would be equally solved. 
I will look in direction for a possible resubmission.

I do wonder if there may be issues with suspend-to-ram/hibernation. I am 
unsure as I can only test with pm_debug, but when doing so I find that 
the clock stays on during hibernation, so the system resumes without problem. 
Would that also be the case during a real hibernation? If not then that would
be a case where an additional PHY reset would be needed. Any thoughts on this?


> Why can't you add the clock?

Agree this would be straightforward, my concern was that if for any reason 
one fails to add the clock to the DT, this results in a seemingly working 
system, but there might be issues in perhaps 1/100 cases which would be 
difficult to identify and/or diagnose (I work myself with a board that has
these issues and it is really difficult to even reproduce the problem since
it occurs in 1% or so of boards in production).

> 
> > (ii) This defeats the purpose of a previous commit [2] that disabled
> > the ref clock for power saving reasons. If a ref clock for the PHY is
> > specified in DT, the SMSC driver will keep it always on (confirmed
> > with scope).
> 
> NACK, the clock provider can be any clock. This has nothing to do with the
> FEC clocks. The FEC _can_ be used as clock provider.

I'm sure you understand this much better than I do. What I can say is that I 
directly measured the ref clk and found that when I add the clock to the DT
the clock stays on forever. Basically it seems like the FEC calls to 
clk_disable_unprepare() don't work in this case, though I'm not sure about the
reason behind this. I will investigate further to make sure I understand 
what is really going on. 

> 
> > While this removes the need for additional PHY resets (only a single
> > reset is needed after power up), this prevents the FEC from saving
> > power by disabling the refclk. Since there may be use cases where one
> > is interested in saving power,
> 
> You can't just turn off the clock for the LAN8720 because of the phy internal
> state machine. The state machine gets confused if the clock is turned off/on
> randomly.

My understanding was that it is the PHY hardware that gets confused, rather than
the state machine. Indeed removing the clock doesn't seem to me like a good thing
to do anyway, but I would guess that whoever does it, should also be responsible
for dealing with the consequences (in this case, performing a reset when 
re-enabling the clock). 
Of course, keeping the clock on is also an option, but if we can save power by
disabling it, then perhaps there is some added value in doing so? Otherwise, 
shouldn't the FEC be fixed directly if it is determined that there is no 
added value in turning off the clocks?

> 
> > keep this option available when no ref clock is specified for the PHY,
> > by fixing issues with the PHY reset.
> 
> IMHO pulling the reset line everytime has a few disadvantages:
>  - You need to ensure that the strapping pins are correct and
>  - You need to ensure that the reset logic including the reset delays
>    are keeped.
> 

I agree, but in my experience the LAN8720 absolutely needs one reset after power
up (this is specified in a SMSC schematic checklist as "A hardware reset (nRST 
assertion) is required following power-up"). This works out well because during 
initial probing of the driver the reset is asserted, but if the reset logic is 
bad I think you might experience problems anyway.

> > Main changes proposed to address this:
> > (a) Restore PHY_RST_AFTER_CLK_EN for LAN8720, but explicitly clear it
> > if the SMSC driver succeeds in retrieving the ref clock.
> 
> IMHO NACK since this was the wrong approach.

Still I wonder if ensuring backwards compatibility wouldn't be a good thing?

> 
> > (b) Fix phy_reset_after_clk_enable() to work in interrupt mode, by
> > re-configuring the PHY registers after reset.
> >
> > Tests: against net tree 5.9, including allyes/no/modconfig. 10 pieces
> > of an iMX28-EVK-based board were tested, 3 of which were found to
> > exhibit issues when the "clocks" property was left unset. Issues were
> > fixed by the present patchset.
> 
> All iMX machines are now DT-based why can't you just add the correct clock
> provider?

I would probably, assuming I know about it. In my case I pulled the 5.9 
sources and found out "accidentally" that the PHY reset behavior had changed. 
I then did some research and found your commits. Of course, I'm at fault 
because I didn't do my homework of going through the changelog, but I thought
maybe I won't be the only one to make this mistake.

Besides, having worked with the (flawed indeed) reset workaround, I did build
a certain level of confidence that it works. Very honestly, if I had a choice 
I might hesitate to switch to the new clock management, as to be thorough I 
would have to re-test everything. 

> 
> Regards,
>   Marco

> -----Original Message-----
> From: Marco Felsch <m.felsch@...gutronix.de>
> Sent: Thursday, October 29, 2020 9:16 AM
> To: Badel, Laurent <LaurentBadel@...on.com>
> Cc: davem@...emloft.net; fugang.duan@....com; kuba@...nel.org;
> andrew@...n.ch; Heiner Kallweit <hkallweit1@...il.com>;
> linux@...linux.org.uk; p.zabel@...gutronix.de; lgirdwood@...il.com;
> broonie@...nel.org; robh+dt@...nel.org; richard.leitner@...data.com;
> netdev@...r.kernel.org; devicetree@...r.kernel.org; f.fainelli@...il.com;
> Quette, Arnaud <ArnaudQuette@...on.com>
> Subject: [EXTERNAL] Re: [PATCH net 0/4] Restore and fix PHY reset for SMSC
> LAN8720
> 
> Hi,
> 
> thanks for your patches :)
> 
> On 20-10-27 23:25, Badel, Laurent wrote:
> > Subject: [PATCH net 0/4] Restore and fix PHY reset for SMSC LAN8720
> >
> > Description:
> > A recent patchset [1] added support in the SMSC PHY driver for
> > managing the ref clock and therefore removed the
> PHY_RST_AFTER_CLK_EN
> > flag for the
> > LAN8720 chip. The ref clock is passed to the SMSC driver through a new
> > property "clocks" in the device tree.
> >
> > There appears to be two potential caveats:
> > (i) Building kernel 5.9 without updating the DT with the "clocks"
> > property for SMSC PHY, would break systems previously relying on the
> > PHY reset workaround (SMSC driver cannot grab the ref clock, so it is
> > still managed by FEC, but the PHY is not reset because
> > PHY_RST_AFTER_CLK_EN is not set). This may lead to occasional loss of
> > ethernet connectivity in these systems, that is difficult to debug.
> 
> IMHO reyling on PHY_RST_AFTER_CLK_EN was broken since the day of
> adding this feature because:
> 
> 1st) Each host driver needs to call the phy-reset logic. So this isn't a
>      fix for all hosts using a LAN8720 phy.
> 2st) It interacts realy bad with the phy state machine. Only the state
>      machine should be able to do this.
> 
> Why can't you add the clock?
> 
> > (ii) This defeats the purpose of a previous commit [2] that disabled
> > the ref clock for power saving reasons. If a ref clock for the PHY is
> > specified in DT, the SMSC driver will keep it always on (confirmed
> > with scope).
> 
> NACK, the clock provider can be any clock. This has nothing to do with the
> FEC clocks. The FEC _can_ be used as clock provider.
> 
> > While this removes the need for additional PHY resets (only a single
> > reset is needed after power up), this prevents the FEC from saving
> > power by disabling the refclk. Since there may be use cases where one
> > is interested in saving power,
> 
> You can't just turn off the clock for the LAN8720 because of the phy internal
> state machine. The state machine gets confused if the clock is turned off/on
> randomly.
> 
> > keep this option available when no ref clock is specified for the PHY,
> > by fixing issues with the PHY reset.
> 
> IMHO pulling the reset line everytime has a few disadvantages:
>  - You need to ensure that the strapping pins are correct and
>  - You need to ensure that the reset logic including the reset delays
>    are keeped.
> 
> > Main changes proposed to address this:
> > (a) Restore PHY_RST_AFTER_CLK_EN for LAN8720, but explicitly clear it
> > if the SMSC driver succeeds in retrieving the ref clock.
> 
> IMHO NACK since this was the wrong approach.
> 
> > (b) Fix phy_reset_after_clk_enable() to work in interrupt mode, by
> > re-configuring the PHY registers after reset.
> >
> > Tests: against net tree 5.9, including allyes/no/modconfig. 10 pieces
> > of an iMX28-EVK-based board were tested, 3 of which were found to
> > exhibit issues when the "clocks" property was left unset. Issues were
> > fixed by the present patchset.
> 
> All iMX machines are now DT-based why can't you just add the correct clock
> provider?
> 
> Regards,
>   Marco

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ