Date:   Fri, 6 May 2022 00:20:25 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Francesco Dolcini <francesco.dolcini@...adex.com>
Cc:     Heiner Kallweit <hkallweit1@...il.com>,
        Russell King <linux@...linux.org.uk>,
        Joakim Zhang <qiangqing.zhang@....com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH net v1] net: phy: Fix race condition on link status change

On Thu, May 05, 2022 at 10:32:29PM +0200, Francesco Dolcini wrote:
> This fixes the following error caused by a race condition between
> phydev->adjust_link() and an MDIO transaction in the PHY interrupt
> handler. The issue was reproduced with the ethernet FEC driver and a
> Micrel KSZ9031 PHY.
> 
> [  146.195696] fec 2188000.ethernet eth0: MDIO read timeout
> [  146.201779] ------------[ cut here ]------------
> [  146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c
> [  146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug
> [  146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9
> [  146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [  146.236257]  unwind_backtrace from show_stack+0x10/0x14
> [  146.241640]  show_stack from dump_stack_lvl+0x58/0x70
> [  146.246841]  dump_stack_lvl from __warn+0xb4/0x24c
> [  146.251772]  __warn from warn_slowpath_fmt+0x5c/0xd4
> [  146.256873]  warn_slowpath_fmt from phy_error+0x24/0x6c
> [  146.262249]  phy_error from kszphy_handle_interrupt+0x40/0x48
> [  146.268159]  kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78
> [  146.274417]  irq_thread_fn from irq_thread+0xf0/0x1dc
> [  146.279605]  irq_thread from kthread+0xe4/0x104
> [  146.284267]  kthread from ret_from_fork+0x14/0x28
> [  146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8)
> [  146.294448] 1fa0:                                     00000000 00000000 00000000 00000000
> [  146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [  146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  146.318262] irq event stamp: 12325
> [  146.321780] hardirqs last  enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60
> [  146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60
> [  146.338259] softirqs last  enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624
> [  146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178
> [  146.354447] ---[ end trace 0000000000000000 ]---
> 
> With the FEC driver, phydev->adjust_link() calls fec_enet_adjust_link(),
> which calls fec_stop()/fec_restart(). Both of these functions reset and
> temporarily disable the FEC, disrupting any MII transaction that
> could be happening at the same time.
> 
> fec_enet_adjust_link() and phy_read() can be running at the same time
> when we have one additional interrupt before the phy_state_machine() is
> able to terminate.
> 
> Thread 1 (phylib WQ)       | Thread 2 (phy interrupt)
>                            |
>                            | phy_interrupt()            <-- PHY IRQ
> 	                   |  handle_interrupt()
> 	                   |   phy_read()
> 	                   |   phy_trigger_machine()
> 	                   |    --> schedule phylib WQ
>                            |
> 	                   |
> phy_state_machine()        |
>  phy_check_link_status()   |
>   phy_link_change()        |
>    phydev->adjust_link()   |
>     fec_enet_adjust_link() |
>      --> FEC reset         | phy_interrupt()            <-- PHY IRQ
> 	                   |  phy_read()
> 	 	           |

You have a mix of tabs and spaces here, which is why it is getting
messed up.

> 
> Fix this by acquiring the phydev lock in phy_interrupt().
> 
> Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/
> cc: <stable@...r.kernel.org>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@...adex.com>

The fixes tag is not so easy. I decided on:

Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices")

This is where threaded interrupts were added. Before this it was not
possible to read MDIO registers inside the interrupt handler, since
that often involves blocking operations. 

     Andrew
