lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190724133721.GH1330@shell.armlinux.org.uk>
Date:   Wed, 24 Jul 2019 14:37:21 +0100
From:   Russell King - ARM Linux admin <linux@...linux.org.uk>
To:     Arseny Solokha <asolokha@...kras.ru>
Cc:     Claudiu Manoil <claudiu.manoil@....com>,
        Ioana Ciornei <ioana.ciornei@....com>,
        Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: Re: [PATCH v2] net: phylink: don't start and stop SGMII PHYs in SFP
 modules twice

On Wed, Jul 24, 2019 at 08:31:39PM +0700, Arseny Solokha wrote:
> SFP modules connected using the SGMII interface have their own PHYs which
> are handled by the struct phylink's phydev field. On the other hand, for
> the modules connected using 1000Base-X interface that field is not set.
> 
> Since commit ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network
> devices and sfp cages") phylink_start() ends up setting the phydev field
> using the sfp-bus infrastructure, which eventually calls phy_start() on it,
> and then calling phy_start() again on the same phydev from phylink_start()
> itself. Similar call sequence holds for phylink_stop(), only in the reverse
> order. This results in WARNs during network interface bringup and shutdown
> when a copper SFP module is connected, as phy_start() and phy_stop() are
> called twice in a row for the same phy_device:
> 
>   % ip link set up dev eth0
>   ------------[ cut here ]------------
>   called from state UP
>   WARNING: CPU: 1 PID: 155 at drivers/net/phy/phy.c:895 phy_start+0x74/0xc0
>   Modules linked in:
>   CPU: 1 PID: 155 Comm: backend Not tainted 5.2.0+ #1
>   NIP:  c0227bf0 LR: c0227bf0 CTR: c004d224
>   REGS: df547720 TRAP: 0700   Not tainted  (5.2.0+)
>   MSR:  00029000 <CE,EE,ME>  CR: 24002822  XER: 00000000
> 
>   GPR00: c0227bf0 df5477d8 df5d7080 00000014 df9d2370 df9d5ac4 1f4eb000 00000001
>   GPR08: c061fe58 00000000 00000000 df5477d8 0000003c 100c8768 00000000 00000000
>   GPR16: df486a00 c046f1c8 c046eea0 00000000 c046e904 c0239604 db68449c 00000000
>   GPR24: e9083204 00000000 00000001 db684460 e9083404 00000000 db6dce00 db6dcc00
>   NIP [c0227bf0] phy_start+0x74/0xc0
>   LR [c0227bf0] phy_start+0x74/0xc0
>   Call Trace:
>   [df5477d8] [c0227bf0] phy_start+0x74/0xc0 (unreliable)
>   [df5477e8] [c023cad0] startup_gfar+0x398/0x3f4
>   [df547828] [c023cf08] gfar_enet_open+0x364/0x374
>   [df547898] [c029d870] __dev_open+0xe4/0x140
>   [df5478c8] [c029db70] __dev_change_flags+0xf0/0x188
>   [df5478f8] [c029dc28] dev_change_flags+0x20/0x54
>   [df547918] [c02ae304] do_setlink+0x310/0x818
>   [df547a08] [c02b1eb8] __rtnl_newlink+0x384/0x6b0
>   [df547c28] [c02b222c] rtnl_newlink+0x48/0x68
>   [df547c48] [c02ad7c8] rtnetlink_rcv_msg+0x240/0x27c
>   [df547c98] [c02cc068] netlink_rcv_skb+0x8c/0xf0
>   [df547cd8] [c02cba3c] netlink_unicast+0x114/0x19c
>   [df547d08] [c02cbd74] netlink_sendmsg+0x2b0/0x2c0
>   [df547d58] [c027b668] sock_sendmsg_nosec+0x20/0x40
>   [df547d68] [c027d080] ___sys_sendmsg+0x17c/0x1dc
>   [df547e98] [c027df7c] __sys_sendmsg+0x68/0x84
>   [df547ef8] [c027e430] sys_socketcall+0x1a0/0x204
>   [df547f38] [c000d1d8] ret_from_syscall+0x0/0x38
>   --- interrupt: c01 at 0xfd4e030
>       LR = 0xfd4e010
>   Instruction dump:
>   813f0188 38800000 2b890005 419d0014 3d40c046 5529103a 394aa208 7c8a482e
>   3c60c046 3863a1b8 4cc63182 4be009a1 <0fe00000> 48000030 3c60c046 3863a1d0
>   ---[ end trace d4c095aeaf6ea998 ]---
> 
> and
> 
>   % ip link set down dev eth0
>   ------------[ cut here ]------------
>   called from state HALTED
>   WARNING: CPU: 1 PID: 184 at drivers/net/phy/phy.c:858 phy_stop+0x3c/0x88
> 
>   <...>
> 
>   Call Trace:
>   [df581788] [c0228450] phy_stop+0x3c/0x88 (unreliable)
>   [df581798] [c022d548] sfp_sm_phy_detach+0x1c/0x44
>   [df5817a8] [c022e8cc] sfp_sm_event+0x4b0/0x87c
>   [df581848] [c022f04c] sfp_upstream_stop+0x34/0x44
>   [df581858] [c0225608] phylink_stop+0x7c/0xe4
>   [df581868] [c023c57c] stop_gfar+0x7c/0x94
>   [df581888] [c023c5b8] gfar_close+0x24/0x94
>   [df5818a8] [c0298688] __dev_close_many+0xdc/0xf8
>   [df5818c8] [c029db58] __dev_change_flags+0xd8/0x188
>   [df5818f8] [c029dc28] dev_change_flags+0x20/0x54
>   [df581918] [c02ae304] do_setlink+0x310/0x818
>   [df581a08] [c02b1eb8] __rtnl_newlink+0x384/0x6b0
>   [df581c28] [c02b222c] rtnl_newlink+0x48/0x68
>   [df581c48] [c02ad7c8] rtnetlink_rcv_msg+0x240/0x27c
>   [df581c98] [c02cc068] netlink_rcv_skb+0x8c/0xf0
>   [df581cd8] [c02cba3c] netlink_unicast+0x114/0x19c
>   [df581d08] [c02cbd74] netlink_sendmsg+0x2b0/0x2c0
>   [df581d58] [c027b668] sock_sendmsg_nosec+0x20/0x40
>   [df581d68] [c027d080] ___sys_sendmsg+0x17c/0x1dc
>   [df581e98] [c027df7c] __sys_sendmsg+0x68/0x84
>   [df581ef8] [c027e430] sys_socketcall+0x1a0/0x204
>   [df581f38] [c000d1d8] ret_from_syscall+0x0/0x38
> 
>   <...>
> 
>   ---[ end trace d4c095aeaf6ea999 ]---
> 
> SFP modules with the 1000Base-X interface are not affected.
> 
> Place explicit calls to phy_start() and phy_stop() before enabling or after
> disabling an attached SFP module, where phydev is not yet set (or is
> already unset), so they will be made only from the inside of sfp-bus, if
> needed.
> 
> Signed-off-by: Arseny Solokha <asolokha@...kras.ru>

Suggested-by: Russell King <rmk+kernel@...linux.org.uk>
Acked-by: Russell King <rmk+kernel@...linux.org.uk>

Dave, please merge this as a fix - it looks like it should be applied to
any kernel which also has:

217962615662 ("net: phy: warn if phy_start is called from invalid state")

i.o.w. v5.1 or later.

Thanks.

> ---
> Changes in v2:
>  - Moved phy_start() before sfp_upstream_start(), and phy_stop() after
>  sfp_upstream_stop(), and reworded the commit message accordingly.
> 
> This is a general fix and may be taken out from the driver conversion series
> and applied separately.
> ---
>  drivers/net/phy/phylink.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 5d0af041b8f9..b45862465c4d 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -990,10 +990,10 @@ void phylink_start(struct phylink *pl)
>  	}
>  	if (pl->link_an_mode == MLO_AN_FIXED && pl->get_fixed_state)
>  		mod_timer(&pl->link_poll, jiffies + HZ);
> -	if (pl->sfp_bus)
> -		sfp_upstream_start(pl->sfp_bus);
>  	if (pl->phydev)
>  		phy_start(pl->phydev);
> +	if (pl->sfp_bus)
> +		sfp_upstream_start(pl->sfp_bus);
>  }
>  EXPORT_SYMBOL_GPL(phylink_start);
>  
> @@ -1010,10 +1010,10 @@ void phylink_stop(struct phylink *pl)
>  {
>  	ASSERT_RTNL();
>  
> -	if (pl->phydev)
> -		phy_stop(pl->phydev);
>  	if (pl->sfp_bus)
>  		sfp_upstream_stop(pl->sfp_bus);
> +	if (pl->phydev)
> +		phy_stop(pl->phydev);
>  	del_timer_sync(&pl->link_poll);
>  	if (pl->link_irq) {
>  		free_irq(pl->link_irq, pl);
> -- 
> 2.22.0
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ