Date:   Mon, 8 Jul 2019 23:16:02 -0400
From:   kwangdo yi <kwangdo.yi@...il.com>
To:     Florian Fainelli <f.fainelli@...il.com>
Cc:     netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
        Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine

I simply fixed this issue by increasing the polling timeout from 20 ms to
60 ms in the Xilinx EMAC driver. But the state machine would be in
better shape if it could handle a subsystem driver's spurious failure.
A PHY device driver could advertise min/max timeouts for its subsystem,
but some vendors' EMAC drivers will still miss the deadline if that value
is not set properly in the PHY driver.
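
As a rough illustration only (this is not existing phylib API; the struct
and field names below are made up), advertising such limits from a PHY
driver could look something like this:

#include <linux/phy.h>

/*
 * Hypothetical extension, not in mainline phylib: a PHY driver advertises
 * how long an MDIO poll may take before the state machine should treat it
 * as a real failure rather than a transient miss.
 */
struct phy_poll_limits {
	unsigned int min_timeout_ms;	/* fastest poll the PHY can satisfy */
	unsigned int max_timeout_ms;	/* upper bound the MAC/MDIO layer may use */
};

/* Values an 88E1512 driver might export, chosen purely for illustration */
static const struct phy_poll_limits m88e1512_poll_limits = {
	.min_timeout_ms	= 20,
	.max_timeout_ms	= 60,
};

The EMAC driver (or the MDIO bus layer) could then clamp its polling
deadline to max_timeout_ms instead of hard-coding 20 ms.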

On Sun, Jul 7, 2019 at 11:07 PM Florian Fainelli <f.fainelli@...il.com> wrote:
>
> +Andrew, Heiner (please CC PHY library maintainers).
>
> On 7/7/2019 3:32 PM, kwangdo.yi wrote:
> > When the MDIO driver polls the PHY state in phy_state_machine(),
> > it sometimes returns -ETIMEDOUT and the link is taken down. But the
> > PHY is still alive and simply missed the polling deadline.
> > Closing the PHY link in this case seems too drastic. Missing the
> > deadline happens very rarely; it shows up only after stress tests
> > running for tens of hours on multiple target boards (Xilinx Zynq7000
> > with Marvell 88E1512 PHY and a Xilinx custom EMAC IP). This patch
> > gives phy_state_machine() another chance when a polling timeout
> > happens: only two consecutive deadline misses are treated as a real
> > PHY halt that closes the connection.
>
> How about simply increasing the MDIO polling timeout in the Xilinx EMAC
> driver instead? Or if the PHY is where the timeout needs to be
> increased, allow the PHY device drivers to advertise min/max timeouts
> such that the MDIO bus layer can use that information?
>
> >
> >
> > Signed-off-by: kwangdo.yi <kwangdo.yi@...il.com>
> > ---
> >  drivers/net/phy/phy.c | 6 ++++++
> >  include/linux/phy.h   | 1 +
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> > index e888542..9e8138b 100644
> > --- a/drivers/net/phy/phy.c
> > +++ b/drivers/net/phy/phy.c
> > @@ -919,7 +919,13 @@ void phy_state_machine(struct work_struct *work)
> >               break;
> >       case PHY_NOLINK:
> >       case PHY_RUNNING:
> > +     case PHY_BUSY:
> >               err = phy_check_link_status(phydev);
> > +             if (err == -ETIMEDOUT && old_state == PHY_RUNNING) {
> > +                     /* First deadline miss: treat it as transient and retry */
> > +                     phydev->state = PHY_BUSY;
> > +                     err = 0;
> > +             }
> >               break;
> >       case PHY_FORCING:
> >               err = genphy_update_link(phydev);
> > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > index 6424586..4a49401 100644
> > --- a/include/linux/phy.h
> > +++ b/include/linux/phy.h
> > @@ -313,6 +313,7 @@ enum phy_state {
> >       PHY_RUNNING,
> >       PHY_NOLINK,
> >       PHY_FORCING,
> > +     PHY_BUSY,
> >  };
> >
> >  /**
> >
>
> --
> Florian
