[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220506101031.GA417678@francesco-nb.int.toradex.com>
Date: Fri, 6 May 2022 12:10:31 +0200
From: Francesco Dolcini <francesco.dolcini@...adex.com>
To: stable@...r.kernel.org, Sasha Levin <sashal@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Francesco Dolcini <francesco.dolcini@...adex.com>,
Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
Joakim Zhang <qiangqing.zhang@....com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net v2] net: phy: Fix race condition on link status change
Hello,
I have one question about the process of patch/fixes backporting
to old stable kernel.
Assuming the following patch will be deemed correct and applied it will
have to be eventually backported.
This patch will apply cleanly on v5.15+, but some work is needed to
backport to v5.10 or older.
Should I send a patch for the older kernels once the fix is merged?
Reading Documentation/process/stable-kernel-rules.rst it was not clear
to me what to do in this "mixed" situations.
Thanks a lot,
Francesco
On Fri, May 06, 2022 at 08:08:15AM +0200, Francesco Dolcini wrote:
> This fixes the following error caused by a race condition between
> phydev->adjust_link() and a MDIO transaction in the phy interrupt
> handler. The issue was reproduced with the ethernet FEC driver and a
> micrel KSZ9031 phy.
>
> [ 146.195696] fec 2188000.ethernet eth0: MDIO read timeout
> [ 146.201779] ------------[ cut here ]------------
> [ 146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c
> [ 146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug
> [ 146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9
> [ 146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 146.236257] unwind_backtrace from show_stack+0x10/0x14
> [ 146.241640] show_stack from dump_stack_lvl+0x58/0x70
> [ 146.246841] dump_stack_lvl from __warn+0xb4/0x24c
> [ 146.251772] __warn from warn_slowpath_fmt+0x5c/0xd4
> [ 146.256873] warn_slowpath_fmt from phy_error+0x24/0x6c
> [ 146.262249] phy_error from kszphy_handle_interrupt+0x40/0x48
> [ 146.268159] kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78
> [ 146.274417] irq_thread_fn from irq_thread+0xf0/0x1dc
> [ 146.279605] irq_thread from kthread+0xe4/0x104
> [ 146.284267] kthread from ret_from_fork+0x14/0x28
> [ 146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8)
> [ 146.294448] 1fa0: 00000000 00000000 00000000 00000000
> [ 146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 146.318262] irq event stamp: 12325
> [ 146.321780] hardirqs last enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60
> [ 146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60
> [ 146.338259] softirqs last enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624
> [ 146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178
> [ 146.354447] ---[ end trace 0000000000000000 ]---
>
> With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link()
> calls fec_stop()/fec_restart() and both these function reset and
> temporary disable the FEC disrupting any MII transaction that
> could be happening at the same time.
>
> fec_enet_adjust_link() and phy_read() can be running at the same time
> when we have one additional interrupt before the phy_state_machine() is
> able to terminate.
>
> Thread 1 (phylib WQ) | Thread 2 (phy interrupt)
> |
> | phy_interrupt() <-- PHY IRQ
> | handle_interrupt()
> | phy_read()
> | phy_trigger_machine()
> | --> schedule phylib WQ
> |
> |
> phy_state_machine() |
> phy_check_link_status() |
> phy_link_change() |
> phydev->adjust_link() |
> fec_enet_adjust_link() |
> --> FEC reset | phy_interrupt() <-- PHY IRQ
> | phy_read()
> |
>
> Fix this by acquiring the phydev lock in phy_interrupt().
>
> Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/
> Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices")
> cc: <stable@...r.kernel.org>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@...adex.com>
> ---
> drivers/net/phy/phy.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index beb2b66da132..f122026c4682 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -970,8 +970,13 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
> {
> struct phy_device *phydev = phy_dat;
> struct phy_driver *drv = phydev->drv;
> + irqreturn_t ret;
>
> - return drv->handle_interrupt(phydev);
> + mutex_lock(&phydev->lock);
> + ret = drv->handle_interrupt(phydev);
> + mutex_unlock(&phydev->lock);
> +
> + return ret;
> }
>
> /**
> --
> 2.25.1
>
Powered by blists - more mailing lists