Message-ID: <134f69de-64f9-4d36-94ff-22b93cb32f2e@bp.renesas.com>
Date: Tue, 21 Jan 2025 11:34:48 +0000
From: Paul Barker <paul.barker.ct@...renesas.com>
To: Kory Maincent <kory.maincent@...tlin.com>,
Jakub Kicinski <kuba@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
Claudiu Beznea <claudiu.beznea.uj@...renesas.com>,
thomas.petazzoni@...tlin.com, Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>, Russell King
<linux@...linux.org.uk>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Niklas Söderlund <niklas.soderlund@...natech.se>,
Sergey Shtylyov <s.shtylyov@....ru>
Subject: Re: [PATCH net-next v3] net: phy: Fix suspicious rcu_dereference
usage
On 21/01/2025 09:38, Kory Maincent wrote:
> On Mon, 20 Jan 2025 11:12:28 -0800
> Jakub Kicinski <kuba@...nel.org> wrote:
>
>> On Mon, 20 Jan 2025 15:19:25 +0100 Kory Maincent wrote:
>>> The path reported to not having RTNL lock acquired is the suspend path of
>>> the ravb MAC driver. Without this fix we got this warning:
>>
>> I maintain that ravb is buggy, plenty of drivers take rtnl_lock
>> from the .suspend callback. We need _some_ write protection here,
>> the patch as is only silences a legitimate warning.
>
> Indeed if the suspend path is buggy we should fix it. Still there is lots of
> ethernet drivers calling phy_disconnect without rtnl (IIUC) if probe return an
> error or in the remove path. What should we do about it?
>
> About ravb suspend, I don't have the board, Claudiu could you try this instead
> of the current fix:
>
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index bc395294a32d..c9a0d2d6f371 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -3215,15 +3215,22 @@ static int ravb_suspend(struct device *dev)
> if (!netif_running(ndev))
> goto reset_assert;
>
> + rtnl_lock();
> netif_device_detach(ndev);
>
> - if (priv->wol_enabled)
> - return ravb_wol_setup(ndev);
> + if (priv->wol_enabled) {
> + ret = ravb_wol_setup(ndev);
> + rtnl_unlock();
> + return ret;
> + }
>
> ret = ravb_close(ndev);
> - if (ret)
> + if (ret) {
> + rtnl_unlock();
> return ret;
> + }
>
> + rtnl_unlock();
> ret = pm_runtime_force_suspend(&priv->pdev->dev);
> if (ret)
> return ret;
>
> Regards,

(Cc'ing Niklas and Sergey as this relates to the ravb driver)

Why do we need to hold the rtnl mutex across the calls to
netif_device_detach() and ravb_wol_setup()?

My reading of Documentation/networking/netdevices.rst is that the rtnl
mutex is held when the net subsystem calls the driver's ndo_stop method,
which in our case is ravb_close(). So, we should take the rtnl mutex
when we call ravb_close() directly, in both ravb_suspend() and
ravb_wol_restore(). That would ensure that we do not call
phy_disconnect() without holding the rtnl mutex and should fix this
issue.
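
To illustrate the locking shape I have in mind, here is a rough userspace
model, not driver code. Everything in it is a hypothetical stand-in: the
rtnl mutex is reduced to a flag, and the *_model() functions only mimic
the call chain ravb_suspend() -> ravb_close() -> phy_disconnect(). It just
shows the lock being taken around the direct ravb_close() call and nothing
else.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtnl mutex: a flag, not a real lock. */
static bool rtnl_held;

static void rtnl_lock_model(void)
{
	assert(!rtnl_held);	/* the rtnl mutex is not recursive */
	rtnl_held = true;
}

static void rtnl_unlock_model(void)
{
	assert(rtnl_held);
	rtnl_held = false;
}

/* Number of modelled phy_disconnect() calls made without the lock. */
static int unlocked_phy_disconnects;

/* Models phy_disconnect(): only records whether the lock was held. */
static void phy_disconnect_model(void)
{
	if (!rtnl_held)
		unlocked_phy_disconnects++;
}

/* Models ravb_close() (the ndo_stop method), which reaches phy_disconnect(). */
static int ravb_close_model(void)
{
	phy_disconnect_model();
	return 0;
}

/*
 * Models the suggested ravb_suspend() flow: the lock covers only the
 * direct ravb_close() call. The detach and WoL setup steps are left
 * outside the lock and are represented by comments only.
 */
static int ravb_suspend_model(bool running, bool wol_enabled)
{
	int ret;

	if (!running)
		return 0;

	/* netif_device_detach() would be called here, unlocked. */

	if (wol_enabled)
		return 0;	/* ravb_wol_setup() path, unlocked. */

	rtnl_lock_model();
	ret = ravb_close_model();
	rtnl_unlock_model();

	return ret;
}
```

In the real driver the same pattern would apply to the direct
ravb_close() call in ravb_wol_restore() as well; that path is not
modelled here.
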

Commit 35f7cad1743e ("net: Add the possibility to support a selected
hwtstamp in netdevice") may have unearthed the issue, but the fixes tag
should point to the commits adding those unlocked calls to ravb_close().

I am not super familiar with the rtnl lock so let me know if I've missed
something.

Thanks,

--
Paul Barker