Message-ID:
<OSYPR01MB533414034BFE166BA7344025D8D5A@OSYPR01MB5334.jpnprd01.prod.outlook.com>
Date: Wed, 18 Oct 2023 09:39:18 +0000
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>
To: Sergey Shtylyov <s.shtylyov@....ru>, "davem@...emloft.net"
<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-renesas-soc@...r.kernel.org" <linux-renesas-soc@...r.kernel.org>
Subject: RE: [PATCH net] ravb: Fix races between ravb_tx_timeout_work() and
net related ops
Hello Sergey,
> From: Sergey Shtylyov, Sent: Wednesday, October 18, 2023 3:59 AM
>
> Hello!
>
> On 10/17/23 11:53 AM, Yoshihiro Shimoda wrote:
>
> > Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> > and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> > since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> > ravb_tx_timeout_work() calls rtnl_trylock() to avoid a deadlock.
>
> I don't quite follow... how calling cancel_work_sync() is a problem?
> I thought the problem was that unregister_netdev() can be called with
> the TX timeout work still pending? And, more generally, shouldn't we
> protect against the TX timeout work being executed on a different CPU
> than the {net_device|ethtool}_ops methods are being executed (is that
> possible?)?

__dev_close_many() in net/core/dev.c calls ASSERT_RTNL() and then
ops->ndo_stop(), so ravb_close() runs under the rtnl lock. While that lock
is held, ravb_tx_timeout_work() can run on another CPU. If the work then
calls rtnl_lock(), it blocks on the lock held by the close path, while
ravb_close() waits in cancel_work_sync() for the work to finish. So the two
can deadlock:

CPU0                            CPU1
                                ravb_tx_timeout()
                                schedule_work()
...
__dev_close_many()
// this is under rtnl lock
ravb_close()
cancel_work_sync()
                                ravb_tx_timeout_work()
                                rtnl_lock()
                                // this is possible to cause a deadlock

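To make the sequence above concrete, here is a rough userspace analogy, not driver code. It assumes a pthread mutex standing in for the rtnl lock, pthread_join() standing in for cancel_work_sync(), and a worker that simply gives up when the lock is busy (what the real work function should do in that case is a separate question). All names (big_lock, timeout_work, close_while_work_pending) are made up for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the rtnl lock (assumption: a plain pthread mutex). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool work_ran;
static bool work_skipped;

/* Stand-in for the timeout work: try the lock instead of blocking on it. */
static void *timeout_work(void *arg)
{
	(void)arg;

	if (pthread_mutex_trylock(&big_lock) != 0) {
		/* Lock is held by the closing path: back off, do not wait. */
		work_skipped = true;
		return NULL;
	}

	work_ran = true;	/* the recovery steps would go here */
	pthread_mutex_unlock(&big_lock);
	return NULL;
}

/*
 * Stand-in for the close path: hold the lock, then wait for the worker.
 * With a blocking lock in timeout_work() this join would never return.
 */
bool close_while_work_pending(void)
{
	pthread_t worker;

	work_ran = false;
	work_skipped = false;

	pthread_mutex_lock(&big_lock);
	if (pthread_create(&worker, NULL, timeout_work, NULL) != 0) {
		pthread_mutex_unlock(&big_lock);
		return false;
	}
	pthread_join(worker, NULL);	/* plays the role of cancel_work_sync() */
	pthread_mutex_unlock(&big_lock);

	/* True when the worker backed off instead of deadlocking. */
	return work_skipped && !work_ran;
}
```

Because the worker only ever tries the lock, the join in the close path always completes. Swapping the trylock for pthread_mutex_lock() would reproduce the hang shown in the diagram.
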
> I also had a suspicion that we still miss the spinlock calls in
> ravb_tx_timeout_work() but nothing in particular jumped at me...
> mind looking into that? :-)
Yes, perhaps we have to check it somehow...
> > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>
> > ---
> > drivers/net/ethernet/renesas/ravb_main.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index 0ef0b88b7145..b53533ab4599 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> [...]
> > @@ -1907,6 +1910,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> > */
> > netdev_err(ndev, "%s: ravb_dmac_init() failed, error %d\n",
> > __func__, error);
> > + rtnl_unlock();
> > return;
>
> Perhaps *goto* instead here?
...
> > }
> > ravb_emac_init(ndev);
> > @@ -1917,6 +1921,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> > ravb_ptp_init(ndev, priv->pdev);
> >
> > netif_tx_start_all_queues(ndev);
>
> ... and add label here?
I got it. Using goto is better, I think.
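
For reference, the goto-based shape being discussed could look roughly like the userspace sketch below. It is only an illustration under assumptions: cfg_lock, reinit_hw(), recover() and restart_count are hypothetical names, and a pthread mutex stands in for the rtnl lock. The point is just that the error path and the success path share one unlock at the label.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the lock taken at the top of the work function. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static int restart_count;

/* Hypothetical re-init step: returns 0 on success, negative on error. */
static int reinit_hw(bool fail)
{
	return fail ? -1 : 0;
}

int recover(bool fail_reinit)
{
	int error;

	pthread_mutex_lock(&cfg_lock);

	error = reinit_hw(fail_reinit);
	if (error)
		goto out_unlock;	/* no separate unlock on the error path */

	restart_count++;		/* the remaining restart steps go here */

out_unlock:
	pthread_mutex_unlock(&cfg_lock);
	return error;
}

/* Helper for checking that recover() never leaves the lock held. */
bool lock_is_free(void)
{
	if (pthread_mutex_trylock(&cfg_lock) != 0)
		return false;
	pthread_mutex_unlock(&cfg_lock);
	return true;
}
```

With a single exit label, adding another early-exit condition later does not risk forgetting the unlock.
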
Best regards,
Yoshihiro Shimoda
> > + rtnl_unlock();
> > }
>
> MBR, Sergey