lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 07 May 2015 13:26:44 -0500 From: ebiederm@...ssion.com (Eric W. Biederman) To: Cong Wang <cwang@...pensource.com> Cc: Ying Xue <ying.xue@...driver.com>, netdev <netdev@...r.kernel.org>, Herbert Xu <herbert@...dor.apana.org.au>, xemul@...nvz.org, David Miller <davem@...emloft.net>, Eric Dumazet <eric.dumazet@...il.com>, maxk@....qualcomm.com, Stephen Hemminger <stephen@...workplumber.org>, Thomas Graf <tgraf@...g.ch>, Nicolas Dichtel <nicolas.dichtel@...nd.com>, Tom Herbert <tom@...bertland.com>, James Chapman <jchapman@...alix.com>, Erik Hugne <erik.hugne@...csson.com>, jon.maloy@...csson.com, horms@...ge.net.au Subject: Re: [RFC PATCH net-next 00/11] netns: don't switch namespace while creating kernel sockets Cong Wang <cwang@...pensource.com> writes: > On Thu, May 7, 2015 at 9:14 AM, Eric W. Biederman <ebiederm@...ssion.com> wrote: >> Ying Xue <ying.xue@...driver.com> writes: >> >>> When commit 23fe18669e7f ("[NETNS]: Fix race between put_net() and >>> netlink_kernel_create().") attempted to fix the race between put_net() >>> and kernel socket's creation, it adopted a complex solution: create >>> netlink socket inside init_net namespace and then re-attach it to the >>> desired one right after the socket is created; similarly, when close >>> the socket, move back its namespace to init_net so that the socket can >>> be destroyed in the context which is same as the socket creation. >>> >>> But the solution artificially makes the whole thing complex as its >>> design is not only weird, but also it causes a bad consequence that >>> when all kernel modules create kernel sockets, they have to follow >>> the model of namespace switch. More importantly, with the way kernel >>> sockets are created in init_net namespace, but they are released in >>> another new ones. This inconsistent namespace brings some modules many >>> inconvenience. For example, what tipc socket is inserted to rhashtable >>> happens in socket's creation, and different namespace has different >>> rhashtable for tipc socket. With the approach, a tipc kernel socket >>> will be inserted into the rhashtable of init_net. But as releasing >>> the socket happens in another one, it causes what the socket cannot >>> be found from the rhashtable of the new namespace. >>> >>> Therefore, we propose a simpler solution to avoid the race: if we >>> find there is still pending a cleanup work in __put_net(), we don't >>> queue a new cleanup work to stop the cleanup process. The new proposal >>> not only successfully solves the race, but also it can help us to >>> avoid unnecessary namespace switches when creating kernel sockets. >>> Moreover, it can guarantee that both creation and release of kernel >>> sockets happen in the same namespace at all time. >>> >>> In the series, we first resolve the race with patch #1, and then >>> prevent namespace switches from happening in all relevant kernel >>> modules one by one from patch #2 to patch #9. Until now, as all >>> dependencies on sk_change_net() are killed, we can delete the >>> interface completely in patch #10. Lastly, we simplify the code of >>> creating kernel sockets through changing the original behaviours >>> of sock_create_kern() and sk_release_kernel(). If a kernel socket >>> is created within a namespace which is different with init_net, >>> we must put the reference counter of the namespace once the socket >>> is successfully allocated in sk_alloc(), otherwise, the namespace >>> is probably unable to be shut down finally. Therefore, we decrease >>> namespace's reference counter once a kernel socket is created >>> successfully by sock_create_kern() within a namespace which is >>> different with init_net. Similarly, namespace's reference counter >>> must be increased back before the socket is destroyed in >>> sk_release_kernel(). >>> >>> Welcome to any comments. >> >> I agree that commit 23fe18669e7f ("[NETNS]: Fix race between put_net() >> and netlink_kernel_create()." was a hack. >> >> However it is not appropriate to call get_net on a network namespace >> whose count might be zero. I believe all of your patches rely on that >> currently. Instead we need to build something like sk_release_kernel >> that does not increase the network namespace reference count if you are >> going to avoid changing the network namespace on a socket (a worthy >> goal). >> >> The following change shows how it is possible to always know that your >> network namespace has a non-zero reference count in the network >> namespace initialization methods. My implementation of >> lock_network_namespaces is problematic in that it does not sleep >> while network namespaces are unregistering. But it is enough to show >> how the locking and reference counting can be fixed. > > Why does this have to be so complicated? We can simply avoid > calling ops_init() by skipping those in cleanup_list, no? The problem is that there is a single list of methods to call and if you simply skip calling the initialization methods for a struct net and add yourself to the list cleanup_net will then call the cleanup methods without calling the cleanup methods. Simply limiting new network namespace registrations to a point when network namespaces are not being registered or unregisted seems like the simplest way to achieve this effect. > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c > index 78fc04a..c7cbd5a 100644 > --- a/net/core/net_namespace.c > +++ b/net/core/net_namespace.c > @@ -242,6 +242,7 @@ static __net_init int setup_net(struct net *net, > struct user_namespace *user_ns) > net->dev_base_seq = 1; > net->user_ns = user_ns; > idr_init(&net->netns_ids); > + INIT_LIST_HEAD(&net->cleanup_list); > > list_for_each_entry(ops, &pernet_list, list) { > error = ops_init(ops, net); > @@ -737,6 +738,8 @@ static int __register_pernet_operations(struct > list_head *list, > list_add_tail(&ops->list, list); > if (ops->init || (ops->id && ops->size)) { > for_each_net(net) { > + if (!list_empty(net->cleanup_list)) // <--- > Need a big comment here; > + continue; > error = ops_init(ops, net); > if (error) > goto out_undo; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists