netdev - RE: [PATCH net] ravb: Fix races between ravb_tx_timeout

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID:
 <TYBPR01MB534180A2BCEA7BCC5E6D3A98D8D4A@TYBPR01MB5341.jpnprd01.prod.outlook.com>
Date: Thu, 19 Oct 2023 00:40:19 +0000
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>
To: Sergey Shtylyov <s.shtylyov@....ru>, "davem@...emloft.net"
	<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-renesas-soc@...r.kernel.org" <linux-renesas-soc@...r.kernel.org>
Subject: RE: [PATCH net] ravb: Fix races between ravb_tx_timeout_work() and
 net related ops

Hello Sergey,

> From: Sergey Shtylyov, Sent: Thursday, October 19, 2023 2:27 AM
> 
> On 10/18/23 12:39 PM, Yoshihiro Shimoda wrote:
> [...]
> >>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> >>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> >>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> >>> ravb_tx_timeout_work() calls rtnl_trylock() to avoid a deadlock.
> >>
> >>    I don't quite follow... how calling cancel_work_sync() is a problem?
> >> I thought the problem was that unregister_netdev() can be called with
> >> the TX timeout work still pending? And, more generally, shouldn't we
> >> protect against the TX timeout work being executed on a different CPU
> >> than the {net_device|ethtool}_ops methods are being executed (is that
> >> possible?)?
> >
> > __dev_close_many() in net/core/dev.c calls ASSERT_RTNL() and ops->ndo_stop().
> > So, the ravb_close() is under rtnl lock. While locking the rtnl, it's
> > possible to call ravb_tx_timeout_work() on other CPU. In such a case,
> > it's possible to cause a deadlock between ravb_close() and ravb_tx_timeout_work()
> >
> > CPU0				CPU1
> > 				ravb_tx_timeout()
> > 				schedule_work()
> > ...
> > __dev_close_many()
> > // this is under rtnl lock
> > ravb_close()
> > cancel_work_sync()
> > 				ravb_tx_timeout_work()
> > 				rtnl_lock()
> > 				// this is possible to cause a deadlock
> 
>    Ah, cancel_work_sync() means we have to wait for the work to
> finish -- indeed a deadlock is possiblet then.

Thank you for your reply. I'll update commit description in v2.

> >>    I also had a suspicion that we still miss the spinlock calls in
> >> ravb_tx_timeout_work() but nothing in particular jumped at me...
> 
>    We mainly need to protect against the interrupts in this case...

I think so. However, we can not use spin_lock_irqsave() for whole this
ravb_tx_timeout_work() because ravb_ring_init() calls kcalloc() with GFP_KERNEL.

> >> mind looking into that? :-)
> >
> > Yes, perhaps we have to check it somehow...
> 
>    Unfortunately, I don't seem to have no bandwidth to do that myself...

I got it. I'll investigate this later.

Best regards,
Yoshihiro Shimoda

> [...]
> 
> MBR, Sergey