linux-kernel - Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <fdf0074a-2572-5914-6f3e-77202cbf96de@gmail.com>
Date:   Sun, 31 May 2020 00:36:39 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Vladimir Oltean <olteanv@...il.com>, stable@...r.kernel.org,
        gregkh@...uxfoundation.org, netdev@...r.kernel.org, andrew@...n.ch,
        f.fainelli@...il.com, linux@...linux.org.uk, davem@...emloft.net,
        kuba@...nel.org
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN
 has not completed in PHY_AN state

On 30.05.2020 23:43, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@....com>
> 
> In kernel 4.19 (and probably earlier too) there are issues surrounding
> the PHY_AN state.
> 
> For example, if a PHY is in PHY_AN state and AN has not finished, then
> what is supposed to happen is that the state machine gets rescheduled
> until it is, or until the link_timeout reaches zero which triggers an
> autoneg restart process.
> 
> But actually the rescheduling never works if the PHY uses interrupts,
> because the condition under which rescheduling occurs is just if
> phy_polling_mode() is true. So basically, this whole rescheduling
> functionality works for AN-not-yet-complete just by mistake. Let me
> explain.
> 
> Most of the time the AN process manages to finish by the time the
> interrupt has triggered. One might say "that should always be the case,
> otherwise the PHY wouldn't raise the interrupt, right?".
> Well, some PHYs implement an .aneg_done method which allows them to tell
> the state machine when the AN is really complete.
> The AR8031/AR8033 driver (at803x.c) is one such example. Even when
> copper autoneg completes, the driver still keeps the "aneg_done"
> variable unset until in-band SGMII autoneg finishes too (there is no
> interrupt for that). So we have the premises of a race condition.
> 
That's not nice from the PHY:
It signals "link up", and if the system asks the PHY for link details,
then it sheepishly says "well, link is *almost* up".

Question would be whether the same happens with other SGMII-capable
PHY's so that we need to cater for this scenario in phylib.
Or whether we consider it a chip quirk. In the latter case a custom
read_status() handler might do the trick too: if link is reported
as up then wait until aneg is signaled as done too before reading
further link details.

And it's interesting that nobody else stumbled across this problem
before. I mean the PHY we talk about isn't really new. Or is your
use case so special?

> In practice, what really happens depends on the log level of the serial
> console. If the log level is verbose enough that kernel messages related
> to the Ethernet link state are printed to the console, then this gives
> in-band AN enough time to complete, which means the link will come up
> and everyone will be happy. But if the console is not that verbose, the
> link will sometimes come up, and sometimes will be forced down by the
> .aneg_done of the PHY driver (forever, since we are not rescheduling).
> 
> The conclusion is that an extra condition needs to be explicitly added,
> so that the state machine can be rescheduled properly. Otherwise PHY
> devices in interrupt mode will never work properly if they have an
> .aneg_done callback.
> 
> In more recent kernels, the whole PHY_AN state was removed by Heiner
> Kallweit in the "[net-next,0/5] net: phy: improve and simplify phylib
> state machine" series here:
> 
> https://patchwork.ozlabs.org/cover/994464/
> 
> and the problem was just masked away instead of being addressed with a
> punctual patch.
> 
> Fixes: 76a423a3f8f1 ("net: phy: allow driver to implement their own aneg_done")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
> I'm not sure the procedure I'm following is correct, sending this
> directly to Greg. The patch doesn't apply on net.
> 
>  drivers/net/phy/phy.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index cc454b8c032c..ca4fd74fd2c8 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -934,7 +934,7 @@ void phy_state_machine(struct work_struct *work)
>  	struct delayed_work *dwork = to_delayed_work(work);
>  	struct phy_device *phydev =
>  			container_of(dwork, struct phy_device, state_queue);
> -	bool needs_aneg = false, do_suspend = false;
> +	bool recheck = false, needs_aneg = false, do_suspend = false;
>  	enum phy_state old_state;
>  	int err = 0;
>  	int old_link;
> @@ -981,6 +981,8 @@ void phy_state_machine(struct work_struct *work)
>  			phy_link_up(phydev);
>  		} else if (0 == phydev->link_timeout--)
>  			needs_aneg = true;
> +		else
> +			recheck = true;
>  		break;
>  	case PHY_NOLINK:
>  		if (!phy_polling_mode(phydev))
> @@ -1123,7 +1125,7 @@ void phy_state_machine(struct work_struct *work)
>  	 * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
>  	 * between states from phy_mac_interrupt()
>  	 */
> -	if (phy_polling_mode(phydev))
> +	if (phy_polling_mode(phydev) || recheck)
>  		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
>  				   PHY_STATE_TIME * HZ);
>  }
>