lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 12 Sep 2013 14:48:19 -0700
From:	Francesco Ruggeri <fruggeri@...stanetworks.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jiri Pirko <jiri@...nulli.us>,
	Alexander Duyck <alexander.h.duyck@...el.com>,
	Cong Wang <amwang@...hat.com>, netdev@...r.kernel.org
Subject: Re: [PATCH 1/1] net: race condition when removing virtual net_device

On Thu, Sep 12, 2013 at 1:06 PM, Eric W. Biederman
<ebiederm@...ssion.com> wrote:
> Francesco Ruggeri <fruggeri@...stanetworks.com> writes:
>
>> Resending.
>
> To summarize:
>
> In netdev_run_todo, netdev_wait_allrefs, call_netdevice_notifiers you
> have observed a situation where dev_net(dev) was an invalid pointer.
>

Correct.


> My first impression is that we probably just want to remove the repeated
> call to call_netdevice_notifiers, that happens after 1 second of
> waiting.
>
> Your suggested patch below doesn't have a prayer of working, as it only
> decreases the device reference count after the loop to wait for the
> reference count to go to zero.
>

Actually it did work. In 3.4 it avoided the crash in
inetpeer_invalidate_tree and it did not cause any visible adverse
effects after running it several days on multiple servers that create
and destroy namespaces on a regular basis.
The extra refcount that is taken and released is on the loopback_dev
of the namespace that the net_device is in. The assumption is that
loopback_dev is always the last net_device to be destroyed in a
namespace. So each net_device in a namespace (except for the
loopback_dev itself) takes such reference when it is unlisted, and it
releases it after it is destroyed (after netdev_wait_allrefs).
The patch also makes sure that any loopback_devs are at the tail of
net_todo_list within every namespace in every instance of
netdev_run_todo. This guarantees that no deadlocks are introduced,
since the relative order of net_devices within each namespace is
preserved, and only non-loopback_devs take and release the extra
reference.

> Your patch modified to grab a count on the network namespace the device
> references and not the device itself might make sense, but that runs the
> risk of incrementing the network namespace counts after the network
> namespace is down.

Right.

>
> Simply not rerunning call_netdevice_notifiers seems like the proper
> approach to fix this.
>

That would be great. There would still be one scenario to take care of though:

- veth interfaces v0 and v1 are in namespaces ns0 and ns1.
- process p0 unregisters v0, which also causes v1 to be unregistered.
When p0 enters netdev_run_todo both v0 and v1 are in net_todo_list and
have been unlisted from their namespaces.
- then in p0's netdev_run_todo:

void netdev_run_todo(void)
{
        struct list_head list;

        /* Snapshot list, allow later requests */
        list_replace_init(&net_todo_list, &list);

        __rtnl_unlock();


        /* Wait for rcu callbacks to finish before next phase */
        if (!list_empty(&list))
                rcu_barrier();

********* Assume ns1 is destroyed by another process here ************

        while (!list_empty(&list)) {
                struct net_device *dev
                        = list_first_entry(&list, struct net_device, todo_list);
                list_del(&dev->todo_list);

                rtnl_lock();
                call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);

********* dst_dev_event/dst_ifdown is invoked here, and it tries to access
            dev_net(dev)->loopback_dev.
            In case of v1 this would be an invalid pointer. *****************

                __rtnl_unlock();

Similar scenarios apply if v1 is a vlan or macvlan interface, and v0
is its real device.

Francesco
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ