[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1b0a20fa-b2ee-e7fa-fdfb-dedabe81b03f@orolia.com>
Date: Tue, 12 May 2020 11:28:40 +0200
From: Julien Beraud <julien.beraud@...lia.com>
To: Florian Fainelli <f.fainelli@...il.com>,
Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>,
Antoine Tenart <antoine.tenart@...tlin.com>
Subject: Re: net: phylink: supported modes set to 0 with genphy sfp module
On 11/05/2020 21:06, Florian Fainelli wrote:
>
>
> On 5/11/2020 11:29 AM, Russell King - ARM Linux admin wrote:
>> On Mon, May 11, 2020 at 05:45:02PM +0200, Julien Beraud wrote:
>>> Following commit:
>>>
>>> commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87
>>> Author: Russell King <rmk+kernel@...linux.org.uk>
>>> Date: Wed Dec 11 10:56:45 2019 +0000
>>>
>>> net: phylink: delay MAC configuration for copper SFP modules
>>>
>>>
>>> In function phylink_sfp_connect_phy, phylink_sfp_config is called before
>>> phylink_attach_phy.
>>>
>>> In the case of a genphy, the "supported" field of the phy_device is filled
>>> by:
>>> phylink_attach_phy->phy_attach_direct->phy_probe->genphy_read_abilities.
>>>
>>> It means that:
>>>
>>> ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
>>> will have phy->supported with no bits set, and then the first call to
>>> phylink_validate in phylink_sfp_config will return an error:
>>>
>>> return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
>>>
>>> this results in putting the sfp driver in "failed" state.
>>
>> Which PHY is this?
The phy seems to be Marvell 88E1111, so the simple solution is to just add the driver for this PHY to my config.
That said, if for some reason someone plugs a module for which no phy driver is found the issue will happen again.
>
> Using the generic PHY with a copper SFP module does not sound like a
> great idea because without a specialized PHY driver (that is, not the
> Generic PHY driver) there is not usually much that can happen.
Thanks for the info. I don't have an advice on whether it is a good idea to use a copper sfp without a specialized driver,
but before commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87, it used to work and I could at least get a network connection.
Moreover, this commit didn't explicitely intend to forbid this behavior and the error is not explicit either.
If phylink+sfp still supports using genphy as a fallback, It may be good to fix the current behavior.
If not, maybe adding an explicit warning or error would be preferrable.
-------------------
A bit more details about the issue:
- The board I am using has a MAC connected to a PCS, connected to an sfp cage, so no PHY on-board.
- No driver is found for the PHY that's on the sfp module I am using.
- The MAC + PCS driver's conversion to phylink still needs to be upstreamed, and we'll send patches soon hopefully.
I can make the code available if needed, but the issue doesn't depend on the MAC+PCS driver.
It has been reproduced on mcbin by Antoine Tenart (thanks!) by just removing the driver for the phy he is using from his kernel.
It should be reproducible on any board with such a setup. The IP I am using is altera triple speed ethernet MAC + PCS.
- The issue happens when plugging an sfp module in the sfp cage which is copper and embeds a PHY for which no driver is found and falls back to genphy.
The error that happens is the following (call trace plus debug prints plus a few debug prints I have added :
[ 27.607215] sfp soc:sfp1: mod-def0 1 -> 0
[ 27.611235] sfp soc:sfp1: tx-fault 0 -> 1
[ 27.615247] sfp soc:sfp1: SM: enter present:up:fail event remove
[ 27.626668] sfp soc:sfp1: module removed
[ 27.632415] sfp soc:sfp1: tx disable 0 -> 1
[ 27.636618] sfp soc:sfp1: SM: exit empty:up:down
[ 27.644541] sfp soc:sfp1: SM: enter empty:up:down event tx_fault
[ 27.652282] sfp soc:sfp1: SM: exit empty:up:down
[ 29.077218] sfp soc:sfp1: mod-def0 0 -> 1
[ 29.081238] sfp soc:sfp1: tx-fault 1 -> 0
[ 29.085250] sfp soc:sfp1: SM: enter empty:up:down event insert
[ 29.096669] sfp soc:sfp1: SM: exit probe:up:down
[ 29.109535] sfp soc:sfp1: SM: enter probe:up:down event tx_clear
[ 29.116892] sfp soc:sfp1: SM: exit probe:up:down
[ 29.397193] sfp soc:sfp1: SM: enter probe:up:down event timeout
[ 29.415422] sfp soc:sfp1: module AVAGO ABCU-5731ARZ rev sn AGC14275317E dc 140704
[ 29.425204] sfp soc:sfp1: tx disable 1 -> 0
[ 29.429448] sfp soc:sfp1: SM: exit present:up:wait
[ 29.487209] sfp soc:sfp1: SM: enter present:up:wait event timeout
[ 29.502605] sfp soc:sfp1: sfp_sm_probe_phy, support 00,00000000,00000000
[ 29.509396] altera_tse c0200000.ethernet eth0: phylink_sfp_connect_phy-1, support 00,00000000,00000000
[ 29.518749] altera_tse c0200000.ethernet eth0: alt_tse_validate, support 00,00000000,00000000
[ 29.527311] altera_tse c0200000.ethernet eth0: supported : 00,00000000,00000000
[ 29.534614] altera_tse c0200000.ethernet eth0: validation with support 00,00000000,00000000 failed: -22
[ 29.544232] sfp soc:sfp1: sfp_add_phy failed: -22
[ 29.548999] sfp soc:sfp1: SM: exit present:up:fail
[ 29.557243] sfp soc:sfp1: los 1 -> 0
[ 29.560833] sfp soc:sfp1: SM: enter present:up:fail event los_low
[ 29.566918] sfp soc:sfp1: SM: exit present:up:fail
[ 29.677195] sfp soc:sfp1: los 0 -> 1
[ 29.680782] sfp soc:sfp1: SM: enter present:up:fail event los_high
[ 29.686950] sfp soc:sfp1: SM: exit present:up:fail
And the call trace leading to the phylink_validate call that fails :
=> phylink_validate
=> phylink_sfp_config
=> phylink_sfp_connect_phy
=> sfp_add_phy
=> sfp_sm_probe_phy
=> sfp_sm_event
=> sfp_timeout
=> process_one_work
=> worker_thread
=> kthread
=> ret_from_fork
=> 0
------------------------------------------------------------
This issue disappears when I revert commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87, and the network connection seems to work.
It seems that before this commit, the call to phylink_attach_phy was made before calling phylink_sfp_config.
It also seems that in case the sfp module embeds a PHY for which there no specific driver is found,
the phy_probe function which fills the phy->supported field is called following the call to phylink_attach_phy.
So the subsequent call to phylink_sfp_config is passed with phy->supported set to 0, resulting in :
"validation with support 00,00000000,00000000 failed: -22"
Antoine proposed me the following patch as a workaround and it gets back to working like before with it:
-----------------------------------------------------
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 0f23bec431c1..737da4d146ce 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2002,6 +2002,8 @@ static bool phylink_phy_no_inband(struct phy_device *phy)
static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
{
+ unsigned long *supported, *advertising;
+ bool genphy = !phy->mdio.dev.driver;
struct phylink *pl = upstream;
phy_interface_t interface;
u8 mode;
@@ -2021,8 +2023,15 @@ static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
else
mode = MLO_AN_INBAND;
+ /* If a PHY driver has been probed use the PHY's reported capabilities;
+ * otherwise we're using genphy and the PHY capabilities won't be known
+ * before the PHY has been attached.
+ */
+ supported = genphy ? pl->sfp_support : phy->supported;
+ advertising = genphy ? pl->sfp_support : phy->advertising;
+
/* Do the initial configuration */
- ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
+ ret = phylink_sfp_config(pl, mode, supported, advertising);
if (ret < 0)
return ret;
---------------------------------------------------------
I haven't spent time digging a bit more about the consequences of such a patch and neither did he afaik.
I hope this info is valuable, thanks.
Regards,
Julien
Powered by blists - more mailing lists