lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 01 Oct 2008 16:14:35 +0200
From:	Benjamin Thery <benjamin.thery@...l.net>
To:	"David S. Miller" <davem@...emloft.net>,
	Jarek Poplawski <jarkao2@...il.com>
Cc:	netdev <netdev@...r.kernel.org>,
	Daniel Lezcano <dlezcano@...ibm.com>,
	Benjamin Thery <benjamin.thery@...l.net>
Subject: [PATCH] net: deadlock during net device unregistration - V2

This is the second version of a patch aimed at fixing a deadlock that
can occur when unregistering net devices. 

This new version of the patch ensures the garbage collector 
dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling
it in the netdevice notifier dst_dev_event() when receiving a 
NETDEV_UNREGISTER event.

Thanks to Jarek Poplawski for proposing this fix.

(The previous version proposed to replace in linkwatch_event()
the call to rntl_unlock() by __rtnl_lock())

Short description of the deadlock
- - - - - - - - - - - - - - - - -

When reaching netdev_wait_allrefs() some routes may still hold the 
device refcount. dst_gc_task() should 'garbage collect' them soon 
(and decrease the refcount). dst_gc_task() is scheduled with a delay in
the "events" workqueue.
But, as some linkwatch events are also pending, netdev_wait_allrefs()
schedules (without delay) a linkwatch_event() in the same workqueue.
linkwatch_event runs before dst_gc_task() and blocks in  netdev_run_todo
(called by rtnl_unlock()), blocking the whole "events" workqueue.

netdev_wait_allrefs() waiting for           dt_gc_task() waiting
dst_gc_task() to decrease the device -----> linkwatch_event() to return
refcount. It holds the mutex                and unblock the "events" 
netdev_run_todo_mutex                       workqueue
       ^                                        |
       |                                        |
       |                                        |
       |                                        |
       |        linkwatch_event() waiting for <-+
       +------- net_dev_run_todo (who called 
                netdev_wait_allrefs()) to release 
                netdev_run_todo_mutex

Long description:
- - - - - - - - -

In some circumstances, linkwatch_event() blocks the whole "events" 
workqueue while blocking in rtnl_unlock(), and prevent dst_gc_task() 
(enqueued in the same workqueue) to be run:

1. Unregister a device, the following routines are called:

-> unregister_netdev
  -> rtnl_lock
  -> unregister_netdevice
  -> rtnl_unlock
    -> netdev_run_todo
      -> netdev_wait_allrefs

2. In netdev_wait_allrefs(), the device's refcount is greater than 0
  because there are still some routes to be garbage collected later.

3. Also, some link watch events are pending. netdev_wait_allrefs()
  will run the linkwatch event queue, calls linkwatch_run_queue().


Both the route garbage collector dst_gc_task() and the linkwatch task
linkwatch_event() are queued in the same generic workqueue: "events".


4. linkwatch_event() is enqueued earlier in the queue. It will grab
  rtnl_lock(), deliver the link watch events pending, and then call
  rtnl_unlock(). 
  rtnl_unlock() will then call netdev_run_todo() and block on
  mutex_lock(&net_todo_run_mutex).

  At this point, the workqueue "events" is _blocked_ until the
  netdev_wait_allrefs() call above returns when the device refcount 
  reaches 0.

  Problem: it will never happens if dst_gc_task() was enqueued behind
  linkwatch_event() in the "events" workqueue as the queue is now 
  blocked.


Thread foo                            Thread "events"
----------                            ---------------

free IPv6 routes                    |
-> schedule_delayed_work(dst_gc_task)
...                                 |
unregister_netdev                   |
 -> rtln_unlock                     | 
   -> netdev_run_todo               | ...
     -> grab net_todo_run_mutex     |
     -> netdev_wait_allrefs (loop)  | -> run__workqueue()
        ->  linkwatch_run_queue()   | ....
            enqueue linkwatch_event |
            in "events"             |
                                    | -> run_workqueue()
        "BLOCKED"                   |   -> linkwatch_event()
                                    |     -> rtnl_lock()
                                    |     -> __linkwatch_run_queue 
                                    |     ->rtnl_unlock()
                                    |       -> netdev_run_todo
                                    |          -> try to grab net_todo_run_mutex
                                               BLOCKED

Deadlock: 
  
# BLOCKED waiting for dst_gc_task()  # BLOCKED waiting for netdev_wait_allrefs
to decrement the device refcnt         to release net_todo_run_mutex

                  # dst_gst_task() waiting to be run by "events"


This deadlock can be observed sometimes when the loopback is 
unregistered when a network namespaces exits.

Benjamin

Signed-off-by: Benjamin Thery <benjamin.thery@...l.net>
Acked-by: Daniel Lezcano <dlezcano@...ibm.com>
---
 net/core/dst.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: net-next-2.6/net/core/dst.c
===================================================================
--- net-next-2.6.orig/net/core/dst.c
+++ net-next-2.6/net/core/dst.c
@@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
 			dst_ifdown(dst, dev, event != NETDEV_DOWN);
 		}
 		mutex_unlock(&dst_gc_mutex);
+
+		if (event == NETDEV_UNREGISTER &&
+		    cancel_delayed_work(&dst_gc_work))
+			dst_gc_task(&dst_gc_work.work);
 		break;
 	}
 	return NOTIFY_DONE;

-- 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ