netdev - Re: [PATCH net-next] net: phy: avoid kernel warning dump when stopping an errored PHY


Message-ID: <29917acb-bd80-10e5-b1ae-c844ea0e9cbb@huawei.com>
Date: Tue, 5 Sep 2023 16:49:31 +0800
From: Jijie Shao <shaojijie@...wei.com>
To: Andrew Lunn <andrew@...n.ch>
CC: <shaojijie@...wei.com>, <f.fainelli@...il.com>, <davem@...emloft.net>,
	<edumazet@...gle.com>, <hkallweit1@...il.com>, <kuba@...nel.org>,
	<netdev@...r.kernel.org>, <pabeni@...hat.com>, <rmk+kernel@...linux.org.uk>,
	"shenjian15@...wei.com" <shenjian15@...wei.com>, "liuyonglong@...wei.com"
	<liuyonglong@...wei.com>, <wangjie125@...wei.com>, <chenhao418@...wei.com>,
	Hao Lan <lanhao@...wei.com>, "wangpeiyang1@...wei.com"
	<wangpeiyang1@...wei.com>
Subject: Re: [PATCH net-next] net: phy: avoid kernel warning dump when
 stopping an errored PHY


On 2023/9/4 21:43, Andrew Lunn wrote:
> On Mon, Sep 04, 2023 at 05:50:32PM +0800, Jijie Shao wrote:
>> Hi all,
>> We encountered an issue when resetting our netdevice recently, it seems
>> related to this patch.
>>
>> During our process, we stop phy first and call phy_start() later.
>> phy_check_link_status returns error because it read mdio failed. The
>> reason why it happened is that the cmdq is unusable when we reset and we
>> can't access to mdio.
> At what point in the flow below do you apply the reset which stops
> access to the MDIO bus? Ideally you want to do phy_stop(), then apply
> the reset, get the hardware working again, and then do a phy_start().
>

By the time we call phy_stop(), the hardware may already be in an error
state, so we cannot access MDIO. In our flow, an MDIO read/write fails
first; then we call phy_stop(), reset the hardware, and finally call
phy_start().

We noticed that phy_state_machine() takes the phydev lock several times.
The first time is to handle the phydev state. It is worth noting that the
phydev lock is contended for again if phy_check_link_status() returns an
error. Why don't we hold the lock until the state has been changed to
PHY_ERROR when phy_check_link_status() returns an error?
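
To illustrate the question, here is a small user-space model of the two
locking patterns. It is only a sketch under stated assumptions: the
model_* names, the state enum, and the lock counter are hypothetical and
are not phylib code, and a pthread mutex stands in for the phydev lock.

```c
#include <pthread.h>

/* Hypothetical user-space model; not the kernel's struct phy_device. */
enum model_state { MODEL_RUNNING, MODEL_ERROR };

struct model_phy {
	pthread_mutex_t lock;	/* stands in for the phydev lock */
	enum model_state state;
	int lock_count;		/* number of times the lock was taken */
};

static void model_init(struct model_phy *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->state = MODEL_RUNNING;
	p->lock_count = 0;
}

static void model_lock(struct model_phy *p)
{
	pthread_mutex_lock(&p->lock);
	p->lock_count++;
}

static void model_unlock(struct model_phy *p)
{
	pthread_mutex_unlock(&p->lock);
}

/* Stand-in for a link check: fails when the bus is unreachable. */
static int model_check_link(int bus_ok)
{
	return bus_ok ? 0 : -1;
}

/* Pattern A: drop the lock after the check, retake it to record the error. */
static int model_step_two_locks(struct model_phy *p, int bus_ok)
{
	int err = 0;

	model_lock(p);
	if (p->state == MODEL_RUNNING)
		err = model_check_link(bus_ok);
	model_unlock(p);

	/* Window: another thread may take the lock and change state here. */
	if (err < 0) {
		model_lock(p);
		p->state = MODEL_ERROR;
		model_unlock(p);
	}
	return err;
}

/* Pattern B: keep the lock held until the error state is recorded. */
static int model_step_one_lock(struct model_phy *p, int bus_ok)
{
	int err = 0;

	model_lock(p);
	if (p->state == MODEL_RUNNING)
		err = model_check_link(bus_ok);
	if (err < 0)
		p->state = MODEL_ERROR;
	model_unlock(p);
	return err;
}
```

In this model, a failed check costs two lock acquisitions in pattern A and
one in pattern B, and pattern B leaves no window between the failed check
and the state change.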

Jijie Shao