lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 4 Oct 2008 09:42:48 +0200
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Benjamin Thery <Benjamin.Thery@...l.net>
Cc:	Daniel Lezcano <dlezcano@...ibm.com>,
	"David S. Miller" <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] net: deadlock during net device unregistration - V2

On Thu, Oct 02, 2008 at 10:34:35PM +0200, Jarek Poplawski wrote:
> On Thu, Oct 02, 2008 at 09:55:02PM +0200, Benjamin Thery  wrote:
> > Quoting Jarek Poplawski <jarkao2@...il.com>:
> ...
> >> As a matter of fact I've thought about something even simpler, which
> >> probably should help for all above concerns too:
> >>
> >> 	if (event == NETDEV_UNREGISTER)
> >>   	        dst_gc_task(&dst_gc_work.work);
> >>
> >> dst_gc_task() locking allows for this, and running this two times in
> >> a row could be even faster than trying to cancel the unnecessary run.
> >
> > I've thought a bit more about my last proposal and come to the same
> > conclusion as you, hmm, almost. I thought we could call
> > cancel_delayed_work() unconditionally and then dst_gc_task().
> >
> >  	if (event == NETDEV_UNREGISTER) {
> >                 cancel_delayed_work(&dst_gc_work);
> >    	        dst_gc_task(&dst_gc_work.work);
> >         }
> >
> > But, you're right, calling only dst_gc_task() seems fine to me.
> 
> I'm OK with any of these versions.
> 
> >
> > I'll run some tests tomorrow and send a new patch.
> >
> > Do we agree that this fix (calling dst_gc_task() in dst_dev_event())
> > is better/more appropriate than the first one (replacing rtnl_unlock()
> > by the non-blocking __rtnl_unlock() in linkwatch_event())?

Hmm... You can kill me, but after more looking at this I've changed my
mind.

> I agree it's more appropriate, because:
> a) the job is done according to the rules (in the comment) during
>    the notification and not in some future,

But, since the rules are wrong or protocols buggy, the solution isn't
effective: there should be no deadlock after this, but it's also hard
to predict how many dst_gc_task() runs are needed to free all refs.
During this all other works would be unnecessarily blocked by the
linkwatch, what looks a bit too hoggish.

My current idea is we should move linkwatch or dst_gc_task() (or both)
to it's own workqueue.

> b) it removes some hidden dependencies between processes/works, which,
>    even if they currently don't exist except this one in the linkwatch,
>    could be extremly hard to diagnose, if added accidentally in the
>    future.
> 
> On the other hand, until we are not sure of the reasons, why something

BTW, sorry, of course:
"On the other hand, until we are sure of the reasons, why something"

> like this (the full destroying) was avoided in the past, and until it's
> heavily tested (with lockdep), your first patch looks to me much safer
> if applying to any -stable is considered.

And this first patch looks better and better each time...

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists