Message-ID: <16a741c1-7005-b1df-f2e6-afdbe9d086c8@gmail.com>
Date: Wed, 4 Jan 2017 07:30:04 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Zefir Kurtisi <zefir.kurtisi@...atec.com>, netdev@...r.kernel.org
Cc: andrew@...n.ch
Subject: Re: [PATCH] phy state machine: failsafe leave invalid RUNNING state
On 01/04/2017 07:27 AM, Zefir Kurtisi wrote:
> On 01/04/2017 04:13 PM, Florian Fainelli wrote:
>>
>>
>> On 01/04/2017 07:04 AM, Zefir Kurtisi wrote:
>>> While in RUNNING state, phy_state_machine() checks for link changes by
>>> comparing phydev->link before and after calling phy_read_status().
>>> This works as long as it is guaranteed that phydev->link is never
>>> changed outside the phy_state_machine().
>>>
>>> When this does happen in some setups, the state machine misses
>>> a link loss and remains RUNNING despite phydev->link being 0.
>>>
>>> This has been observed running a dsa setup with a process continuously
>>> polling the link states over ethtool each second (SNMPD RFC-1213
>>> agent). Disconnecting the link on a phy followed by an ETHTOOL_GSET
>>> causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
>>> call phy_read_status(), which modifies the link status and thereby
>>> bricks the phy state machine.
>>
>> That's the interesting part of the analysis, how does this brick the PHY
>> state machine? Is the PHY driver changing the link status in the
>> read_status callback that it implements?
>>
> phydev->read_status points to genphy_read_status(), where the first call goes to
> genphy_update_link() which updates the link status.
>
> Thereafter phy_state_machine():RUNNING won't be able to detect the link loss
> anymore unless the link state changes again.
>
>
> I was trying to figure out if there is a rule that forbids changing phydev->link
> from outside the state machine, but found several places where it happens (either
> directly, or over genphy_read_status() or over genphy_update_link()).
>
> Curious how this did not show up before, since within the dsa setup it is very
> easy to trigger:
> a) physically disconnect link
> b) within one second run ethtool ethX
You need to be more specific here about what "the dsa setup" is, drivers
involved, which ports of the switch you are seeing this with (user
facing, CPU port, DSA port?) etc.
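
For reference, the failure mode described in the quoted text can be
sketched as a small user-space simulation. This is only an illustration
under stated assumptions, not the kernel code: every sim_* name below is
hypothetical, and the two states merely stand in for the RUNNING state
and a link-change transition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PHY states; not the kernel's real enum. */
enum sim_state { SIM_RUNNING, SIM_CHANGELINK };

struct sim_phy {
	int link;               /* cached link status, like phydev->link */
	enum sim_state state;
};

static int sim_hw_link;         /* simulated physical link (1 = up) */

/* Stands in for a read_status callback: refreshes the cached link. */
static void sim_read_status(struct sim_phy *phy)
{
	assert(phy != NULL);
	phy->link = sim_hw_link;
}

/* One poll in RUNNING: detect a change by comparing before and after. */
static void sim_poll_running(struct sim_phy *phy)
{
	int old_link = phy->link;

	sim_read_status(phy);
	if (old_link != phy->link)
		phy->state = SIM_CHANGELINK;
}

/*
 * Pull the cable, optionally let an outside caller (e.g. an ethtool
 * query) read the status first, then run one poll and report the state.
 */
static enum sim_state sim_unplug(int external_read)
{
	struct sim_phy phy = { .link = 1, .state = SIM_RUNNING };

	sim_hw_link = 0;                /* cable disconnected */
	if (external_read)
		sim_read_status(&phy);  /* cached link is now already 0 */
	sim_poll_running(&phy);
	return phy.state;
}
```

In this sketch sim_unplug(0) leaves RUNNING because the poll sees the
cached link go from 1 to 0, while sim_unplug(1) stays in RUNNING with
link == 0, since the outside read already consumed the transition.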
--
Florian