lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20080610.224023.116817726.davem@davemloft.net>
Date:	Tue, 10 Jun 2008 22:40:23 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	johannes@...solutions.net
Cc:	davej@...emonkey.org.uk, netdev@...r.kernel.org
Subject: Re: 2.6.25rc7 lockdep trace

From: Johannes Berg <johannes@...solutions.net>
Date: Fri, 04 Apr 2008 16:48:12 +0200

I'm attempting to bring solving this problem back from the dead.

Please help!

> On Thu, 2008-04-03 at 13:48 -0700, David Miller wrote:
> > From: Johannes Berg <johannes@...solutions.net>
> > Date: Sat, 29 Mar 2008 11:02:28 +0100
> > 
> > > However, as I just tried to explain, cancel_work_sync() _is_ safe to run
> > > while holding the RTNL because it doesn't need any runqueue lock.
> > 
> > So in theory we should be able to safely transform
> > flush_scheduled_work() calls in network driver close
> > methods into cancel_work_sync()?
> 
> Yes, more precisely cancel_work_sync() for each work struct the driver
> uses, unless the driver actually requires that the work runs before it
> goes down.

Ok, I did an audit of all the drivers under drivers/net that invoke
flush_scheduled_work().  Here is my audit analysis and the patch I
came up with.  The only deadlock'y case I couldn't fix right now is
the cassini driver, see below for details and the patch:

8139too: Calls flush_scheduled_work() in device remove method, OK.
libertas: Calls flush_scheduled_work() in device probe and remove methods, OK.
bnx2: Uses special polling mdelay() sequence instead of calling
      flush_scheduled_work() in close method, converted to
      cancel_work_sync().
cassini: Does flush_scheduled_work() from ->change_mtu() method, which
	 also holds RTNL semaphore.  Seems tricky to fix as it's trying
	 to schedule work and then immediately wait for it to
	 complete synchronously before returning from the ->change_mtu
	 method.  Maybe calling cas_reset_task() directly would work,
	 but this could conflict with reset tasks scheduled in other
	 contexts.  NOT FIXED
e1000e: Does flush_scheduled_work() from device remove method, OK.
ehea_main: Does flush_scheduled_work() in ->stop method, converted to
	   cancel_work_sync().
baycom_epp: Does flush_scheduled_work() from ->stop method, converted
	    to cancel_delayed_work()/cancel_work_sync() sequence.
ibm_newemac: Calls flush_scheduled_work() from device remove method, OK.
igb: Calls flush_scheduled_work() from device remove method, OK.
irda/mcs7780: Calls flush_scheduled_work() from USB disconnect, OK.
iseries_veth: Calls flush_scheduled_work() from device disconnect and module
	      exit, OK.
ixgbe: Calls flush_scheduled_work() from device remove method, OK.
mv643xx_eth: Calls flush_scheduled_work() from device remove method, OK.
myri10ge: Calls flush_scheduled_work() from device remove method, OK.
netxen: Has it's own hand-grown scheme to work around the flush_scheduled_work()
	deadlock.  Using a workqueue and flush_workqueue().
niu: Calls flush_scheduled_work() only from suspend handler, OK.
phy/phy.c: Avoids flush_scheduled_work() calls to avoid deadlock, uses
	   cancel_work_sync() instead.
r8169: Calls flush_scheduled_work() from device remove method, OK.
s2io: Calls flush_scheduled_work() from device remove method, OK.
sis190: Calls flush_scheduled_work() from device remove method, OK.
skge.c: Calls flush_scheduled_work() from device remove method, OK.
smc911x: Uses a similar polling loop like bnx2 did, converted to
	   cancel_work_sync().
smc91x: Same as smc911x
sungem: Calls flush_scheduled_work() only from suspend and device
	remove method, OK.
tg3: Calls flush_scheduled_work() only from suspend and device remove
     method, OK.
tulip_core: Calls flush_scheduled_work() from tulip_down(), converted to
	    cancel_work_sync().
kaweth: Calls flush_scheduled_work() from kaweth_kill_urbs() which can
	run from the ->stop() handler, converted to
	cancel_delayed_work/cancel_work_sync sequence.
usbnet:	Only calls flush_scheduled_work() from disconnect handler, OK.
hostap: Calls flush_scheduled_work() from ->stop, converted to
	cancel_work_sync()

Signed-off-by: David S. Miller <davem@...emloft.net>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4b46e68..48ed410 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5907,12 +5907,7 @@ bnx2_close(struct net_device *dev)
 	struct bnx2 *bp = netdev_priv(dev);
 	u32 reset_code;
 
-	/* Calling flush_scheduled_work() may deadlock because
-	 * linkwatch_event() may be on the workqueue and it will try to get
-	 * the rtnl_lock which we are holding.
-	 */
-	while (bp->in_reset_task)
-		msleep(1);
+	cancel_work_sync(&bp->reset_task);
 
 	bnx2_disable_int_sync(bp);
 	bnx2_napi_disable(bp);
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index faae01d..075fd54 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2605,7 +2605,8 @@ static int ehea_stop(struct net_device *dev)
 	if (netif_msg_ifdown(port))
 		ehea_info("disabling port %s", dev->name);
 
-	flush_scheduled_work();
+	cancel_work_sync(&port->reset_task);
+
 	mutex_lock(&port->port_lock);
 	netif_stop_queue(dev);
 	port_napi_disable(port);
diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index dde9c7e..b36ae41 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -959,7 +959,8 @@ static int epp_close(struct net_device *dev)
 	unsigned char tmp[1];
 
 	bc->work_running = 0;
-	flush_scheduled_work();
+	cancel_delayed_work(&bc->run_work);
+	cancel_work_sync(&bc->run_work.work);
 	bc->stat = EPP_DCDBIT;
 	tmp[0] = 0;
 	pp->ops->epp_write_addr(pp, tmp, 1, 0);
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index 4e28002..5c71098 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -1531,16 +1531,8 @@ static int smc911x_close(struct net_device *dev)
 	if (lp->phy_type != 0) {
 		/* We need to ensure that no calls to
 		 * smc911x_phy_configure are pending.
-
-		 * flush_scheduled_work() cannot be called because we
-		 * are running with the netlink semaphore held (from
-		 * devinet_ioctl()) and the pending work queue
-		 * contains linkwatch_event() (scheduled by
-		 * netif_carrier_off() above). linkwatch_event() also
-		 * wants the netlink semaphore.
 		 */
-		while (lp->work_pending)
-			schedule();
+		cancel_work_sync(&lp->phy_configure);
 		smc911x_phy_powerdown(dev, lp->mii.phy_id);
 	}
 
diff --git a/drivers/net/smc91x.c b/drivers/net/smc91x.c
index a188e33..5dbe4d3 100644
--- a/drivers/net/smc91x.c
+++ b/drivers/net/smc91x.c
@@ -1016,15 +1016,8 @@ static void smc_phy_powerdown(struct net_device *dev)
 
 	/* We need to ensure that no calls to smc_phy_configure are
 	   pending.
-
-	   flush_scheduled_work() cannot be called because we are
-	   running with the netlink semaphore held (from
-	   devinet_ioctl()) and the pending work queue contains
-	   linkwatch_event() (scheduled by netif_carrier_off()
-	   above). linkwatch_event() also wants the netlink semaphore.
 	*/
-	while(lp->work_pending)
-		yield();
+	cancel_work_sync(&lp->phy_configure);
 
 	bmcr = smc_phy_read(dev, phy, MII_BMCR);
 	smc_phy_write(dev, phy, MII_BMCR, bmcr | BMCR_PDOWN);
diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 55670b5..af8d2c4 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -731,7 +731,7 @@ static void tulip_down (struct net_device *dev)
 	void __iomem *ioaddr = tp->base_addr;
 	unsigned long flags;
 
-	flush_scheduled_work();
+	cancel_work_sync(&tp->media_work);
 
 #ifdef CONFIG_TULIP_NAPI
 	napi_disable(&tp->napi);
diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index 0dcfc03..d776bcf 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -706,7 +706,8 @@ static void kaweth_kill_urbs(struct kaweth_device *kaweth)
 	usb_kill_urb(kaweth->rx_urb);
 	usb_kill_urb(kaweth->tx_urb);
 
-	flush_scheduled_work();
+	cancel_delayed_work(&kaweth->lowmem_work);
+	cancel_work_sync(&kaweth->lowmem_work.work);
 
 	/* a scheduled work may have resubmitted,
 	   we hit them again */
diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
index 20d387f..57b800f 100644
--- a/drivers/net/wireless/hostap/hostap_main.c
+++ b/drivers/net/wireless/hostap/hostap_main.c
@@ -682,7 +682,11 @@ static int prism2_close(struct net_device *dev)
 		netif_device_detach(dev);
 	}
 
-	flush_scheduled_work();
+	cancel_work_sync(&local->reset_queue);
+	cancel_work_sync(&local->set_multicast_list_queue);
+	cancel_work_sync(&local->set_tim_queue);
+	cancel_work_sync(&local->info_queue);
+	cancel_work_sync(&local->comms_qual_update);
 
 	module_put(local->hw_module);
 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ