[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CA+HUmGh1AH7afYu+V1nyXgAvtSRR48UJRz0D0NEEppdw3MN1KA@mail.gmail.com>
Date: Mon, 9 Sep 2013 19:03:42 -0700
From: Francesco Ruggeri <fruggeri@...stanetworks.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jiri Pirko <jiri@...nulli.us>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Cong Wang <amwang@...hat.com>, netdev@...r.kernel.org
Subject: Re: [PATCH 1/1] net: race condition when removing virtual net_device
On Mon, Sep 9, 2013 at 5:57 PM, Stephen Hemminger
<stephen@...workplumber.org> wrote:
>
> On Mon, 9 Sep 2013 16:15:11 -0700
> Francesco Ruggeri <fruggeri@...stanetworks.com> wrote:
>
> > There is a race condition when removing a net_device while its net namespace
> > is also being removed.
> > This can result in a crash or other unpredictable behavior.
> > This is a sample scenario with veth, but the same applies to other virtual
> > devices such as vlan and macvlan.
> > veth pair v0-v1 is created, with v0 in namespace ns0 and v1 in ns1.
> > Process p0 deletes v0. v1 is also unregistered (in veth_dellink), so when p0
> > gets to netdev_run_todo both v0 and v1 are in net_todo_list, and they have both
> > been unlisted from their respective namespaces. If all references to v1 have not
> > already been dropped then netdev_run_todo/netdev_wait_allrefs will call netdev
> > notifiers for v1, releasing the rtnl lock between calls.
> > Now process p1 removes namespace ns1. v1 will not prevent this from happening
> > (in default_device_exit_batch) since it was already unlisted by p0.
> > Next time p0 invokes the notifiers for v1 any notifiers that use dev_net(v1)
> > will get a pointer to a namespace that has been or is being removed.
>
> Namespace's should be ref counted.
Namespaces are refcounted, but rollback_registered_many is also
invoked when a namespace is torn down. That is triggered in put_net
exactly by the namespace refcount dropping to 0, which then triggers
__put_net, cleanup_net, ops_exit_list, default_device_exit_batch and
unregister_netdevice_many. In that case using the namespace refcount
would not prevent the namespace from being deleted, since we already
passed the point (in put_net) where the refcount would have had that
effect. It would probably also not be straightforward to start using
again the namespace refcount in code that started under the premise
that the refcount had dropped to 0.
net_devices in a namespace do not seem to take references to the
namespace that they are in, and as far as I can tell the only thing
that prevents a namespace form being removed from under its
net_devices' feet is default_device_exit or default_device_exit_batch.
But unlist_netdevice creates a window where that logic fails.
In addition to this, in general we can have several instances of
netdev_run_todo running concurrently with any number of net_devices
from any number of net namespaces, possibly cross-referencing each
other. Some of these instances may be in the context of cleanup_net
and some may not.
Taking a refcount on the loopback_dev and making sure the looback_devs
are at the tail of net_todo_list in each of these instances was the
only way I could come up with to address all scenarios, but there may
be better ways to do it.
Thanks,
Francesco
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists