linux-kernel - RE: [PATCH iwl-net] e1000: Move cancel_work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CO1PR11MB508906CCB79542A1531B506AD662A@CO1PR11MB5089.namprd11.prod.outlook.com>
Date: Mon, 2 Jun 2025 21:32:48 +0000
From: "Keller, Jacob E" <jacob.e.keller@...el.com>
To: "Damato, Joe" <jdamato@...tly.com>, Jakub Kicinski <kuba@...nel.org>
CC: Stanislav Fomichev <stfomichev@...il.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "john.cs.hey@...il.com" <john.cs.hey@...il.com>,
	"syzbot+846bb38dc67fe62cc733@...kaller.appspotmail.com"
	<syzbot+846bb38dc67fe62cc733@...kaller.appspotmail.com>, "Nguyen, Anthony L"
	<anthony.l.nguyen@...el.com>, "Kitszel, Przemyslaw"
	<przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, "Paolo
 Abeni" <pabeni@...hat.com>, "moderated list:INTEL ETHERNET DRIVERS"
	<intel-wired-lan@...ts.osuosl.org>, open list <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH iwl-net] e1000: Move cancel_work_sync to avoid deadlock



> -----Original Message-----
> From: Joe Damato <jdamato@...tly.com>
> Sent: Monday, June 2, 2025 1:32 PM
> To: Jakub Kicinski <kuba@...nel.org>
> Cc: Stanislav Fomichev <stfomichev@...il.com>; netdev@...r.kernel.org;
> john.cs.hey@...il.com; Keller, Jacob E <jacob.e.keller@...el.com>;
> syzbot+846bb38dc67fe62cc733@...kaller.appspotmail.com; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@...el.com>; Andrew Lunn <andrew+netdev@...n.ch>; David
> S. Miller <davem@...emloft.net>; Eric Dumazet <edumazet@...gle.com>;
> Paolo Abeni <pabeni@...hat.com>; moderated list:INTEL ETHERNET DRIVERS
> <intel-wired-lan@...ts.osuosl.org>; open list <linux-kernel@...r.kernel.org>
> Subject: Re: [PATCH iwl-net] e1000: Move cancel_work_sync to avoid deadlock
> 
> On Fri, May 30, 2025 at 06:31:40PM -0700, Jakub Kicinski wrote:
> > On Fri, 30 May 2025 12:45:13 -0700 Joe Damato wrote:
> > > > nit: as Jakub mentioned in another thread, it seems more about the
> > > > flush_work waiting for the reset_task to complete rather than
> > > > wq mutexes (which are fake)?
> > >
> > > Hm, I probably misunderstood something. Also, not sure what you
> > > meant by the wq mutexes being fake?
> > >
> > > My understanding (which is prob wrong) from the syzbot and user
> > > report was that the order of wq mutex and rtnl are inverted in the
> > > two paths, which can cause a deadlock if both paths run.
> >
> > Take a look at touch_work_lockdep_map(), theres nosaj thing as wq mutex.
> > It's just a lockdep "annotation" that helps lockdep connect the dots
> > between waiting thread and the work item, not a real mutex. So the
> > commit msg may be better phrased like this (modulo the lines in front):
> >
> >    CPU 0:
> >   , - RTNL is held
> >  /  - e1000_close
> >  |  - e1000_down
> >  +- - cancel_work_sync (cancel / wait for e1000_reset_task())
> >  |
> >  | CPU 1:
> >  |  - process_one_work
> >   \ - e1000_reset_task
> >    `- take RTNL
> 
> OK, I'll resubmit shortly with the following commit message:
> 
>     e1000: Move cancel_work_sync to avoid deadlock
> 
>     Previously, e1000_down called cancel_work_sync for the e1000 reset task
>     (via e1000_down_and_stop), which takes RTNL.
> 
>     As reported by users and syzbot, a deadlock is possible in the following
>     scenario:
> 
>     CPU 0:
>       - RTNL is held
>       - e1000_close
>       - e1000_down
>       - cancel_work_sync (cancel / wait for e1000_reset_task())
> 
>     CPU 1:
>       - process_one_work
>       - e1000_reset_task
>       - take RTNL
> 
>     To remedy this, avoid calling cancel_work_sync from e1000_down
>     (e1000_reset_task does nothing if the device is down anyway). Instead,
>     call cancel_work_sync for e1000_reset_task when the device is being
>     removed.

Acked-by: Jacob Keller <jacob.e.keller@...el.com>