netdev - Re: [PATCH net-next 1/7] net: phy: always call phy_process_state

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <42ef8c8f-2fc0-a210-969b-7b0d648d8226@samsung.com>
Date: Mon, 18 Sep 2023 14:33:04 +0200
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: "Russell King (Oracle)" <rmk+kernel@...linux.org.uk>, Andrew Lunn
	<andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>
Cc: chenhao418@...wei.com, "David S. Miller" <davem@...emloft.net>, Eric
	Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Jijie Shao
	<shaojijie@...wei.com>, lanhao@...wei.com, liuyonglong@...wei.com,
	netdev@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>,
	shenjian15@...wei.com, wangjie125@...wei.com, wangpeiyang1@...wei.com
Subject: Re: [PATCH net-next 1/7] net: phy: always call
 phy_process_state_change() under lock

Hi Russell,

On 14.09.2023 17:35, Russell King (Oracle) wrote:
> phy_stop() calls phy_process_state_change() while holding the phydev
> lock, so also arrange for phy_state_machine() to do the same, so that
> this function is called with consistent locking.
>
> Tested-by: Jijie Shao <shaojijie@...wei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@...linux.org.uk>

This change, merged to linux-next as commit 8da77df649c4 ("net: phy: 
always call phy_process_state_change() under lock") introduces the 
following deadlock with ASIX AX8817X USB driver:

--->8---

asix 1-1.4:1.0 (unnamed net_device) (uninitialized): PHY 
[usb-001:003:10] driver [Asix Electronics AX88772A] (irq=POLL)
Asix Electronics AX88772A usb-001:003:10: attached PHY driver 
(mii_bus:phy_addr=usb-001:003:10, irq=POLL)
asix 1-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX 
AX88772 USB 2.0 Ethernet, a2:99:b6:cd:11:eb

asix 1-1.4:1.0 eth0: configuring for phy/internal link mode

============================================
WARNING: possible recursive locking detected
6.6.0-rc1-00239-g8da77df649c4-dirty #13949 Not tainted
--------------------------------------------
kworker/3:3/71 is trying to acquire lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_start_aneg+0x1c/0x38

but task is already holding lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&dev->lock);
   lock(&dev->lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

3 locks held by kworker/3:3/71:
  #0: c1c090a8 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: 
process_one_work+0x148/0x608
  #1: f0bddf20 
((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: 
process_one_work+0x148/0x608
  #2: c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8

stack backtrace:
CPU: 3 PID: 71 Comm: kworker/3:3 Not tainted 
6.6.0-rc1-00239-g8da77df649c4-dirty #13949
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_power_efficient phy_state_machine
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __lock_acquire+0x1300/0x2984
  __lock_acquire from lock_acquire+0x130/0x37c
  lock_acquire from __mutex_lock+0x94/0x94c
  __mutex_lock from mutex_lock_nested+0x1c/0x24
  mutex_lock_nested from phy_start_aneg+0x1c/0x38
  phy_start_aneg from phy_state_machine+0x10c/0x2b8
  phy_state_machine from process_one_work+0x204/0x608
  process_one_work from worker_thread+0x1e0/0x498
  worker_thread from kthread+0x104/0x138
  kthread from ret_from_fork+0x14/0x28
Exception stack(0xf0bddfb0 to 0xf0bddff8)
...

-------

This probably need to be fixed somewhere in drivers/net/usb/asix* but at 
the first glance I don't see any obvious place that need a fix.

> ---
>   drivers/net/phy/phy.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index df54c137c5f5..1e5218935eb3 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -1506,6 +1506,7 @@ void phy_state_machine(struct work_struct *work)
>   	if (err < 0)
>   		phy_error_precise(phydev, func, err);
>   
> +	mutex_lock(&phydev->lock);
>   	phy_process_state_change(phydev, old_state);
>   
>   	/* Only re-schedule a PHY state machine change if we are polling the
> @@ -1516,7 +1517,6 @@ void phy_state_machine(struct work_struct *work)
>   	 * state machine would be pointless and possibly error prone when
>   	 * called from phy_disconnect() synchronously.
>   	 */
> -	mutex_lock(&phydev->lock);
>   	if (phy_polling_mode(phydev) && phy_is_started(phydev))
>   		phy_queue_state_machine(phydev, PHY_STATE_TIME);
>   	mutex_unlock(&phydev->lock);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland