[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7da31dea-091a-5858-d3bb-928c8546a059@omp.ru>
Date: Wed, 18 Oct 2023 20:27:13 +0300
From: Sergey Shtylyov <s.shtylyov@....ru>
To: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>,
"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-renesas-soc@...r.kernel.org" <linux-renesas-soc@...r.kernel.org>
Subject: Re: [PATCH net] ravb: Fix races between ravb_tx_timeout_work() and
net related ops
On 10/18/23 12:39 PM, Yoshihiro Shimoda wrote:
[...]
>>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
>>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
>>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
>>> ravb_tx_timeout_work() calls rtnl_trylock() to avoid a deadlock.
>>
>> I don't quite follow... how calling cancel_work_sync() is a problem?
>> I thought the problem was that unregister_netdev() can be called with
>> the TX timeout work still pending? And, more generally, shouldn't we
>> protect against the TX timeout work being executed on a different CPU
>> than the {net_device|ethtool}_ops methods are being executed (is that
>> possible?)?
>
> __dev_close_many() in net/core/dev.c calls ASSERT_RTNL() and ops->ndo_stop().
> So, the ravb_close() is under rtnl lock. While locking the rtnl, it's
> possible to call ravb_tx_timeout_work() on other CPU. In such a case,
> it's possible to cause a deadlock between ravb_close() and ravb_tx_timeout_work()
>
> CPU0 CPU1
> ravb_tx_timeout()
> schedule_work()
> ...
> __dev_close_many()
> // this is under rtnl lock
> ravb_close()
> cancel_work_sync()
> ravb_tx_timeout_work()
> rtnl_lock()
> // this is possible to cause a deadlock
Ah, cancel_work_sync() means we have to wait for the work to
finish -- indeed a deadlock is possiblet then.
>> I also had a suspicion that we still miss the spinlock calls in
>> ravb_tx_timeout_work() but nothing in particular jumped at me...
We mainly need to protect against the interrupts in this case...
>> mind looking into that? :-)
>
> Yes, perhaps we have to check it somehow...
Unfortunately, I don't seem to have no bandwidth to do that myself...
[...]
MBR, Sergey
Powered by blists - more mailing lists