Open Source and information security mailing list archives
 
Message-ID: <20171114174454.GA11452@outlook.office365.com>
Date:   Tue, 14 Nov 2017 09:44:55 -0800
From:   Andrei Vagin <avagin@...tuozzo.com>
To:     Kirill Tkhai <ktkhai@...tuozzo.com>
Cc:     davem@...emloft.net, vyasevic@...hat.com,
        kstewart@...uxfoundation.org, pombredanne@...b.com,
        vyasevich@...il.com, mark.rutland@....com,
        gregkh@...uxfoundation.org, adobriyan@...il.com, fw@...len.de,
        nicolas.dichtel@...nd.com, xiyou.wangcong@...il.com,
        roman.kapl@...go.com, paul@...l-moore.com, dsahern@...il.com,
        daniel@...earbox.net, lucien.xin@...il.com,
        mschiffer@...verse-factory.net, rshearma@...cade.com,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        ebiederm@...ssion.com, gorcunov@...tuozzo.com
Subject: Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read
 it on net->init/->exit

On Tue, Nov 14, 2017 at 04:53:33PM +0300, Kirill Tkhai wrote:
> Currently a mutex is used to protect the pernet operations list. It makes
> cleanup_net() to execute ->exit methods of the same operations set,
> which was used on the time of ->init, even after net namespace is
> unlinked from net_namespace_list.
> 
> But the problem is that synchronize_rcu() is needed after net is removed
> from net_namespace_list:
> 
> Destroy net_ns:
> cleanup_net()
>   mutex_lock(&net_mutex)
>   list_del_rcu(&net->list)
>   synchronize_rcu()                                  <--- Sleep there for ages
>   list_for_each_entry_reverse(ops, &pernet_list, list)
>     ops_exit_list(ops, &net_exit_list)
>   list_for_each_entry_reverse(ops, &pernet_list, list)
>     ops_free_list(ops, &net_exit_list)
>   mutex_unlock(&net_mutex)
> 
> This primitive is not fast, especially on the systems with many processors
> and/or when preemptible RCU is enabled in config. So, all the time, while
> cleanup_net() is waiting for RCU grace period, creation of new net namespaces
> is not possible, the tasks, who makes it, are sleeping on the same mutex:
> 
> Create net_ns:
> copy_net_ns()
>   mutex_lock_killable(&net_mutex)                    <--- Sleep there for ages
> 
> The solution is to convert net_mutex to the rw_semaphore. Then,
> pernet_operations::init/::exit methods, modifying the net-related data,
> will require down_read() locking only, while down_write() will be used
> for changing pernet_list.
> 
> This gives a significant performance increase, as you can see below. There
> is measured sequential net namespace creation in a cycle, in single
> thread, without other tasks (single user mode):
> 
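> 1)
> #define _GNU_SOURCE   /* for unshare() and CLONE_NEWNET */
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char *argv[])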
> {
>         unsigned nr;
>         if (argc < 2) {
>                 fprintf(stderr, "Provide nr iterations arg\n");
>                 return 1;
>         }
>         nr = atoi(argv[1]);
>         while (nr-- > 0) {
>                 if (unshare(CLONE_NEWNET)) {
>                         perror("Can't unshare");
>                         return 1;
>                 }
>         }
>         return 0;
> }
> 
> Origin, 100000 unshare():
> 0.03user 23.14system 1:39.85elapsed 23%CPU
> 
> Patched, 100000 unshare():
> 0.03user 67.49system 1:08.34elapsed 98%CPU
> 
> 2)for i in {1..10000}; do unshare -n bash -c exit; done

Hi Kirill,

This mutex has another role. You know that net namespaces are destroyed
asynchronously, and the net mutex guarantees that the backlog will not
grow large. If we have something in the backlog, we know that it will be
handled before a new net ns is created.

As far as I remember, net namespaces are created much faster than
they are destroyed, so with this change we can create a really big
backlog, can't we?

There was a discussion a few months ago:
https://lists.onap.org/pipermail/containers/2016-October/037509.html


> 
> Origin:
> real 1m24,190s
> user 0m6,225s
> sys 0m15,132s

Here you measure the time of both creating and destroying net namespaces.

> 
> Patched:
> real 0m18,235s   (4.6 times faster)
> user 0m4,544s
> sys 0m13,796s

But here you measure only the time of creating namespaces, and you know
nothing about when they will be destroyed.

Thanks,
Andrei
