Date:	Wed, 1 Sep 2010 21:20:26 +0200
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Jiri Bohac <jbohac@...e.cz>
Cc:	Jay Vosburgh <fubar@...ibm.com>,
	bonding-devel@...ts.sourceforge.net, markine@...gle.com,
	chavey@...gle.com, netdev@...r.kernel.org
Subject: Re: [RFC] bonding: fix workqueue re-arming races

On Wed, Sep 01, 2010 at 09:11:06PM +0200, Jiri Bohac wrote:
> On Wed, Sep 01, 2010 at 09:00:37PM +0200, Jarek Poplawski wrote:
> > On Wed, Sep 01, 2010 at 05:37:30PM +0200, Jarek Poplawski wrote:
> > > On Wed, Sep 01, 2010 at 05:18:56PM +0200, Jarek Poplawski wrote:
> > > > On Wed, Sep 01, 2010 at 03:30:56PM +0200, Jiri Bohac wrote:
> > > > > On Wed, Sep 01, 2010 at 12:23:56PM +0000, Jarek Poplawski wrote:
> > > > > > On 2010-08-31 22:54, Jay Vosburgh wrote:
> > > > > > > 	What prevents this from deadlocking such that cpu A is in
> > > > > > > bond_close, holding RTNL and in cancel_delayed_work_sync, while cpu B is
> > > > > > > in the above function, trying to acquire RTNL?
> > > > > > 
> > > > > > I guess this one isn't cancelled in bond_close, so it should be safe.
> > > > > 
> > > > > Nah, Jay was correct. Although this work item is not explicitly
> > > > > cancelled with cancel_delayed_work_sync(), it is on the same
> > > > > workqueue as work items that are being cancelled with
> > > > > cancel_delayed_work_sync(), so this can still cause a deadlock.
> > > > > Fixed in the new version of the patch by putting these on a
> > > > > separate workqueue.
> > > > > 
> > > > 
> > > > Maybe I'm missing something, but the same workqueue shouldn't matter here.
> > > 
> > > Hmm... I missed your point completely and Jay was correct!
> > 
> > Hmm#2... Alas, after regaining my sobriety, I have to say that Jay
> > was wrong: the same workqueue shouldn't matter here. Similar things
> > are done by other network code with the kernel-global workqueue, e.g.
> > in tg3_close(), rhine_close() etc.
> 
> But these don't do rtnl_lock() inside the work item, do they?

Exactly. Just like work items cancelled from bond_work_cancel_all()
after your patch.

Jarek P.

> That is the main issue here: dev_close() is called with rtnl held
> and so it cannot wait for completion of work items that grab rtnl
> themselves.
> 
> -- 
> Jiri Bohac <jbohac@...e.cz>
> SUSE Labs, SUSE CZ
> 
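
The constraint discussed in this thread (a close path that holds rtnl
must not wait synchronously for a work item that itself takes rtnl) can
be sketched in userspace. This is only an analogy under stated
assumptions, not the bonding code: fake_rtnl, work_needs_lock(),
work_lock_free() and close_would_deadlock() are hypothetical names, a
pthread mutex stands in for rtnl, and pthread_join() stands in for
cancel_delayed_work_sync(). The work item uses trylock so that the
would-be deadlock is reported instead of actually hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct work_result {
	bool would_block;	/* set if the work item could not get the lock */
};

/* Work item that needs the lock, like one calling rtnl_lock(). */
static void *work_needs_lock(void *arg)
{
	struct work_result *res = arg;
	int rc = pthread_mutex_trylock(&fake_rtnl);

	if (rc == EBUSY) {
		/* A blocking lock here would never return: the holder
		 * is waiting for this very work item to finish. */
		res->would_block = true;
		return NULL;
	}
	assert(rc == 0);
	res->would_block = false;
	pthread_mutex_unlock(&fake_rtnl);
	return NULL;
}

/* Work item that never touches the lock. */
static void *work_lock_free(void *arg)
{
	struct work_result *res = arg;

	res->would_block = false;
	return NULL;
}

/*
 * Close path analogy: take the lock, then wait synchronously for the
 * work item (pthread_join plays the role of a synchronous cancel).
 * Returns true if the work item would have blocked on the lock.
 */
static bool close_would_deadlock(void *(*work)(void *))
{
	struct work_result res = { .would_block = false };
	pthread_t tid;
	int rc;

	pthread_mutex_lock(&fake_rtnl);
	rc = pthread_create(&tid, NULL, work, &res);
	assert(rc == 0);
	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	pthread_mutex_unlock(&fake_rtnl);

	return res.would_block;
}
```

Under these assumptions, close_would_deadlock(work_needs_lock) reports
the hazard, while close_would_deadlock(work_lock_free) does not. That
matches the point made above: waiting under the lock is safe only for
work items that do not take that lock themselves.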