Message-ID: <1ded1438-b3cf-19d2-c3f4-1c1da3505295@t2data.com>
Date: Mon, 5 Apr 2021 15:53:10 +0200
From: Christian Melki <christian.melki@...ata.com>
To: Heiner Kallweit <hkallweit1@...il.com>,
Joakim Zhang <qiangqing.zhang@....com>, andrew@...n.ch,
linux@...linux.org.uk, davem@...emloft.net, kuba@...nel.org
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-imx@....com
Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume
back
On 4/5/21 2:09 PM, Heiner Kallweit wrote:
> On 05.04.2021 10:43, Christian Melki wrote:
>> On 4/5/21 12:48 AM, Heiner Kallweit wrote:
>>> On 04.04.2021 16:09, Heiner Kallweit wrote:
>>>> On 04.04.2021 12:07, Joakim Zhang wrote:
>>>>> Commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut
>>>>> off PHY power") invokes phy_init_hw() on MDIO bus resume, which
>>>>> soft-resets the PHY if the PHY driver implements the soft_reset callback.
>>>>> Commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to
>>>>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After these
>>>>> two patches, the i.MX6UL 14x14 EVK, which is connected to a KSZ8081RNB,
>>>>> no longer works after system resume. The MAC driver is fec_main.c.
>>>>>
>>>>> It's obvious that initializing the PHY hardware on MDIO bus resume
>>>>> introduces a regression when the PHY implements soft_reset. When I
>>>>
>>>> Why is this obvious? Please elaborate on why a soft reset should break
>>>> something.
>>>>
>>>>> was debugging, I found that the PHY works fine if the MAC doesn't support
>>>>> suspend/resume, or if phy_stop()/phy_start() aren't called during
>>>>> suspend/resume. This made me realize that the PHY state machine,
>>>>> phy_state_machine(), could be doing something that breaks the PHY.
>>>>>
>>>>> As we know, the MAC resumes first and the MDIO bus resumes afterwards when
>>>>> the system resumes from suspend. When the MAC resumes, it usually invokes
>>>>> phy_start(), which changes the PHY state to PHY_UP and then triggers the
>>>>> state machine to run immediately. phy_state_machine() starts/configures
>>>>> auto-negotiation, changes the PHY state to PHY_NOLINK, and from then on
>>>>> periodically checks the PHY link status. When the MDIO bus resumes, it
>>>>> initializes the PHY hardware, including soft_reset; what soft_reset
>>>>> affects seems to vary between PHYs. For the KSZ8081RNB, a soft reset
>>>>> performed while in the PHY_NOLINK state means auto-negotiation never
>>>>> completes.
>>>>
>>>> Why? That would need to be checked in detail. Maybe chip errata
>>>> documentation provides a hint.
>>>>
>>>
>>> The KSZ8081 spec says the following about bit BMCR_PDOWN:
>>>
>>> If software reset (Register 0.15) is
>>> used to exit power-down mode
>>> (Register 0.11 = 1), two software
>>> reset writes (Register 0.15 = 1) are
>>> required. The first write clears
>>> power-down mode; the second
>>> write resets the chip and re-latches
>>> the pin strapping pin values.
>>>
>>> Maybe this causes the issue you see and genphy_soft_reset() isn't
>>> appropriate for this PHY. Please re-test with the KSZ8081 soft reset
>>> following the spec comment.
>>>
>>
>> Interesting. Never expected that behavior.
>> Thanks for catching it. Skimmed through the datasheets/errata.
>> This is what I found (micrel.c):
>>
>> 10/100:
>> 8001 - Unaffected?
>> 8021/8031 - Double reset after PDOWN.
>> 8041 - Errata. PDOWN broken. Recommendation is not to use it. Unclear if
>> reset solves the issue: the errata says there is no error after reset, but
>> also claims that only toggling PDOWN (may help) or cycling power will help.
>> 8051 - Double reset after PDOWN.
>> 8061 - Double reset after PDOWN.
>> 8081 - Double reset after PDOWN.
>> 8091 - Double reset after PDOWN.
>>
>> 10/100/1000:
>> Nothing in gigabit afaics.
>>
>> Switches:
>> 8862 - Not affected?
>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists.
>> 8864 - Not affected?
>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists.
>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on
>> adjacent links. Do not use.
>>
>> This certainly explains a lot.
>>
>>>>>
>>>>> This patch changes the PHY state to PHY_UP on MDIO bus resume, which
>>>>> should be reasonable after the PHY hardware has been re-initialized. It
>>>>> also gives the state machine a chance to start/configure auto-negotiation
>>>>> again.
>>>>>
>>>>
>>>> If the MAC driver calls phy_stop() on suspend, then phydev->suspended
>>>> is true and mdio_bus_phy_may_suspend() returns false. As a consequence
>>>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume()
>>>> skips the PHY hw initialization.
>>>> Please also note that mdio_bus_phy_suspend() calls phy_stop_machine()
>>>> that sets the state to PHY_UP.
>>>>
>>>
>>> Forgot that MDIO bus suspend is done before MAC driver suspend.
>>> Therefore disregard this part for now.
>>>
>>>> Having said that the current argumentation isn't convincing. I'm not
>>>> aware of such issues on other systems, therefore it's likely that
>>>> something is system-dependent.
>>>>
>>>> Please check the exact call sequence on your system, maybe it
>>>> provides a hint.
>>>>
>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@....com>
>>>>> ---
>>>>> drivers/net/phy/phy_device.c | 7 +++++++
>>>>> 1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>> index cc38e326405a..312a6f662481 100644
>>>>> --- a/drivers/net/phy/phy_device.c
>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev)
>>>>> ret = phy_resume(phydev);
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> +
>>>>> + /* The MAC controller resume routine may already have moved the PHY to
>>>>> + * PHY_NOLINK via phy_start(). Go back to PHY_UP after re-initializing the
>>>>> + * PHY hardware so the state machine starts/configures auto-nego again.
>>>>> + */
>>>>> + phydev->state = PHY_UP;
>>>>> +
>>>>> no_resume:
>>>>> if (phydev->attached_dev && phydev->adjust_link)
>>>>> phy_start_machine(phydev);
>>>>>
>>>>
>>>
>>
>
> This is a quick draft of the modified soft reset for KSZ8081.
> Some tests would be appreciated.
>
>
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index a14a00328..4902235a8 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -1091,6 +1091,42 @@ static void kszphy_get_stats(struct phy_device *phydev,
> data[i] = kszphy_get_stat(phydev, i);
> }
>
> +int ksz8081_soft_reset(struct phy_device *phydev)
> +{
> + int bmcr, ret, val;
> +
> + phy_lock_mdio_bus(phydev);
> +
> + bmcr = __phy_read(phydev, MII_BMCR);
> + if (bmcr < 0)
> + return bmcr;
> +
> + bmcr |= BMCR_RESET;
> +
> + if (bmcr & BMCR_PDOWN)
> + __phy_write(phydev, MII_BMCR, bmcr);
> +
> + if (phydev->autoneg == AUTONEG_ENABLE)
> + bmcr |= BMCR_ANRESTART;
> +
> + __phy_write(phydev, MII_BMCR, bmcr & ~BMCR_ISOLATE);
> +
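Wouldn't this set BMCR_PDOWN again? bmcr still carries the PDOWN bit from
the read above, so the second write puts it back.
Since the double reset is probably required by a few other Micrel PHYs,
maybe add a kszphy_type flag and continue with genphy_soft_reset?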
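
To make the PDOWN concern concrete, here is a minimal stand-alone sketch
(plain user-space C, not kernel code) that replays the values the draft
would write to BMCR. The BMCR_* values are the ones from
include/uapi/linux/mii.h. struct draft_writes and replay_draft() are
hypothetical names of mine; they only mirror the arithmetic of the draft
and never touch hardware.

```c
#include <assert.h>
#include <stdint.h>

/* BMCR bits, same values as include/uapi/linux/mii.h */
#define BMCR_ANRESTART	0x0200
#define BMCR_ISOLATE	0x0400
#define BMCR_PDOWN	0x0800
#define BMCR_RESET	0x8000

/* Hypothetical record of what the draft would write, in order */
struct draft_writes {
	int count;		/* number of BMCR writes issued (1 or 2) */
	uint16_t value[2];	/* values written */
};

/*
 * Hypothetical helper: repeats only the bit arithmetic of the draft
 * ksz8081_soft_reset(), starting from the BMCR value that was read.
 */
static struct draft_writes replay_draft(uint16_t bmcr, int autoneg_enabled)
{
	struct draft_writes w = { 0, { 0, 0 } };

	bmcr |= BMCR_RESET;

	/* Extra first write, only issued when the PHY is powered down */
	if (bmcr & BMCR_PDOWN)
		w.value[w.count++] = bmcr;

	if (autoneg_enabled)
		bmcr |= BMCR_ANRESTART;

	/* Final write: ISOLATE is masked out, PDOWN is not */
	w.value[w.count++] = (uint16_t)(bmcr & ~BMCR_ISOLATE);

	assert(w.count <= 2);
	return w;
}
```

Starting from a powered-down PHY with autoneg enabled (bmcr = 0x1800),
both recorded writes still have BMCR_PDOWN set. I don't know how the chip
treats PDOWN when it is written together with RESET; masking it out of the
second write, e.g. bmcr & ~(BMCR_ISOLATE | BMCR_PDOWN), would avoid the
question.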
> + phy_unlock_mdio_bus(phydev);
> +
> + phydev->suspended = 0;
> +
> + ret = phy_read_poll_timeout(phydev, MII_BMCR, val, !(val & BMCR_RESET),
> + 50000, 600000, true);
> + if (ret)
> + return ret;
> +
> + /* BMCR may be reset to defaults */
> + if (phydev->autoneg == AUTONEG_DISABLE)
> + ret = genphy_setup_forced(phydev);
> +
> + return ret;
> +}
> +
> static int kszphy_suspend(struct phy_device *phydev)
> {
> /* Disable PHY Interrupts */
> @@ -1303,7 +1339,7 @@ static struct phy_driver ksphy_driver[] = {
> .driver_data = &ksz8081_type,
> .probe = kszphy_probe,
> .config_init = ksz8081_config_init,
> - .soft_reset = genphy_soft_reset,
> + .soft_reset = ksz8081_soft_reset,
> .config_intr = kszphy_config_intr,
> .handle_interrupt = kszphy_handle_interrupt,
> .get_sset_count = kszphy_get_sset_count,
>
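
On the kszphy_type flag idea from my comment on the soft reset draft, a
rough user-space sketch of what I have in mind. Nothing here is kernel
code: toy_phy, toy_phy_type, double_reset_after_pdown and toy_soft_reset()
are made-up names, and toy_write_bmcr() only models the behaviour described
in the spec excerpt quoted earlier (the first reset write leaves power-down,
the second one resets the chip). Clearing BMCR to 0 on reset is a toy
default, not the real register default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BMCR bits, same values as include/uapi/linux/mii.h */
#define BMCR_PDOWN	0x0800
#define BMCR_RESET	0x8000

/* Toy register model of a PHY, for illustration only */
struct toy_phy {
	uint16_t bmcr;
	int chip_resets;	/* number of real chip resets performed */
};

/* Hypothetical per-type data, in the spirit of struct kszphy_type */
struct toy_phy_type {
	bool double_reset_after_pdown;
};

/* Models the quoted spec text: reset while powered down only exits PDOWN */
static void toy_write_bmcr(struct toy_phy *phy, uint16_t val)
{
	assert(phy);

	if (!(val & BMCR_RESET)) {
		phy->bmcr = val;
		return;
	}

	if (phy->bmcr & BMCR_PDOWN) {
		/* First reset write: clears power-down, no chip reset */
		phy->bmcr &= (uint16_t)~BMCR_PDOWN;
		return;
	}

	/* Reset write while powered up: reset the chip (toy default of 0) */
	phy->bmcr = 0;
	phy->chip_resets++;
}

/* Issue the extra reset write only for flagged types that are powered down */
static void toy_soft_reset(struct toy_phy *phy, const struct toy_phy_type *type)
{
	assert(phy && type);

	if (type->double_reset_after_pdown && (phy->bmcr & BMCR_PDOWN))
		toy_write_bmcr(phy, BMCR_RESET);

	toy_write_bmcr(phy, BMCR_RESET);
}
```

In this model a flagged type that starts in power-down ends up with one
real chip reset, while an unflagged type only leaves power-down and is
never actually reset. In the driver the flag would just decide whether one
extra reset write is issued before falling through to genphy_soft_reset().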