Message-ID: <20161013204405.GA19836@outlook.office365.com>
Date: Thu, 13 Oct 2016 13:44:06 -0700
From: Andrei Vagin <avagin@...tuozzo.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Andrei Vagin <avagin@...nvz.org>, <netdev@...r.kernel.org>,
<containers@...ts.linux-foundation.org>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH] net: limit a number of namespaces which can be cleaned
up concurrently
On Thu, Oct 13, 2016 at 10:49:38AM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin@...nvz.org> writes:
>
> > From: Andrey Vagin <avagin@...nvz.org>
> >
> > Destroying a netns is a heavy operation and it is executed under
> > net_mutex. If many namespaces are destroyed concurrently, net_mutex can
> > be held for a long time, making it impossible to create a new netns
> > during that period.
>
> This may be the right approach or at least the right approach to bound
> net_mutex hold times but I have to take exception to calling network
> namespace cleanup heavy.
>
> The only particularly time-consuming operations I have ever found are calls to
> synchronize_rcu/synchronize_sched/synchronize_net.
I booted the kernel with maxcpus=1; in that case these functions run
very fast, and the problem is still there anyway.
According to perf, we spend a lot of time in kobject_uevent:
-   99.96%  0.00%  kworker/u4:1  [kernel.kallsyms]  [k] unregister_netdevice_many
   - unregister_netdevice_many
      - 99.95% rollback_registered_many
         - 99.64% netdev_unregister_kobject
            - 33.43% netdev_queue_update_kobjects
               - 33.40% kobject_put
                  - kobject_release
                     + 33.37% kobject_uevent
                     + 0.03% kobject_del
               + 0.03% sysfs_remove_group
            - 33.13% net_rx_queue_update_kobjects
               - kobject_put
                  - kobject_release
                     + 33.11% kobject_uevent
                     + 0.01% kobject_del
                       0.00% rx_queue_release
            - 33.08% device_del
               + 32.75% kobject_uevent
               + 0.17% device_remove_attrs
               + 0.07% dpm_sysfs_remove
               + 0.04% device_remove_class_symlinks
               + 0.01% kobject_del
               + 0.01% device_pm_remove
               + 0.01% sysfs_remove_file_ns
               + 0.00% klist_del
               + 0.00% driver_deferred_probe_del
                 0.00% cleanup_glue_dir.isra.14.part.15
                 0.00% to_acpi_device_node
                 0.00% sysfs_remove_group
              0.00% klist_del
              0.00% device_remove_attrs
         + 0.26% call_netdevice_notifiers_info
         + 0.04% rtmsg_ifinfo_build_skb
         + 0.01% rtmsg_ifinfo_send
           0.00% dev_uc_flush
           0.00% netif_reset_xps_queues_gt
Someone may be listening for these uevents, so we can't stop sending them
without breaking backward compatibility. We could try to optimize
kobject_uevent...
>
> Ideally we can search out those calls in the network namespace cleanup
> operations and figure out how to eliminate those operations or how to
> stack them.
>
> > Nowadays, when userns allows unprivileged users to create network
> > namespaces, this may be a real problem.
>
> Sorting out the synchronize_rcu calls will be a much larger
> and much more effective improvement than your patch here.
>
> > On my laptop (fedora 24, i5-5200U, 12GB), 1000 namespaces require about
> > 300MB of RAM and take about 8 seconds to destroy.
> >
> > In this patch, the number of namespaces which can be cleaned up
> > concurrently is limited to 32. net_mutex is released after handling each
> > batch of net namespaces and then taken again to handle the next
> > one. This allows other users to take it without waiting for a long
> > time.
> >
> > I am not sure whether we need to add a sysctl to customize this limit.
> > Let me know if you think it's required.
>
> We definitely don't need an extra sysctl.
Thanks,
Andrei
>
> Eric
>
>
> > Cc: "David S. Miller" <davem@...emloft.net>
> > Cc: "Eric W. Biederman" <ebiederm@...ssion.com>
> > Signed-off-by: Andrei Vagin <avagin@...nvz.org>
> > ---
> > net/core/net_namespace.c | 12 +++++++++++-
> > 1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > index 989434f..33dd3b7 100644
> > --- a/net/core/net_namespace.c
> > +++ b/net/core/net_namespace.c
> > @@ -406,10 +406,20 @@ static void cleanup_net(struct work_struct *work)
> > struct net *net, *tmp;
> > struct list_head net_kill_list;
> > LIST_HEAD(net_exit_list);
> > + int i = 0;
> >
> > /* Atomically snapshot the list of namespaces to cleanup */
> > spin_lock_irq(&cleanup_list_lock);
> > - list_replace_init(&cleanup_list, &net_kill_list);
> > + list_for_each_entry_safe(net, tmp, &cleanup_list, cleanup_list)
> > + if (++i == 32)
> > + break;
> > + if (i == 32) {
> > + list_cut_position(&net_kill_list,
> > + &cleanup_list, &net->cleanup_list);
> > + queue_work(netns_wq, work);
> > + } else {
> > + list_replace_init(&cleanup_list, &net_kill_list);
> > + }
> > spin_unlock_irq(&cleanup_list_lock);
> >
> > mutex_lock(&net_mutex);