netdev - Re: [PATCH net] ravb: Fix races between ravb_tx_timeout

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7da31dea-091a-5858-d3bb-928c8546a059@omp.ru>
Date: Wed, 18 Oct 2023 20:27:13 +0300
From: Sergey Shtylyov <s.shtylyov@....ru>
To: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>,
	"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
	"pabeni@...hat.com" <pabeni@...hat.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-renesas-soc@...r.kernel.org" <linux-renesas-soc@...r.kernel.org>
Subject: Re: [PATCH net] ravb: Fix races between ravb_tx_timeout_work() and
 net related ops

On 10/18/23 12:39 PM, Yoshihiro Shimoda wrote:
[...]
>>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
>>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
>>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
>>> ravb_tx_timeout_work() calls rtnl_trylock() to avoid a deadlock.
>>
>>    I don't quite follow... how calling cancel_work_sync() is a problem?
>> I thought the problem was that unregister_netdev() can be called with
>> the TX timeout work still pending? And, more generally, shouldn't we
>> protect against the TX timeout work being executed on a different CPU
>> than the {net_device|ethtool}_ops methods are being executed (is that
>> possible?)?
> 
> __dev_close_many() in net/core/dev.c calls ASSERT_RTNL() and ops->ndo_stop().
> So, the ravb_close() is under rtnl lock. While locking the rtnl, it's
> possible to call ravb_tx_timeout_work() on other CPU. In such a case,
> it's possible to cause a deadlock between ravb_close() and ravb_tx_timeout_work()
> 
> CPU0				CPU1
> 				ravb_tx_timeout()
> 				schedule_work()
> ...
> __dev_close_many()
> // this is under rtnl lock
> ravb_close()
> cancel_work_sync()
> 				ravb_tx_timeout_work()
> 				rtnl_lock()
> 				// this is possible to cause a deadlock

   Ah, cancel_work_sync() means we have to wait for the work to
finish -- indeed a deadlock is possiblet then. 
>>    I also had a suspicion that we still miss the spinlock calls in
>> ravb_tx_timeout_work() but nothing in particular jumped at me...

   We mainly need to protect against the interrupts in this case...

>> mind looking into that? :-)
> 
> Yes, perhaps we have to check it somehow...

   Unfortunately, I don't seem to have no bandwidth to do that myself...

[...]

MBR, Sergey