netdev - Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit

Message-ID: <20171114221531.GA8783@outlook.office365.com>
Date:   Tue, 14 Nov 2017 14:15:32 -0800
From:   Andrei Vagin <avagin@...tuozzo.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     Kirill Tkhai <ktkhai@...tuozzo.com>, davem@...emloft.net,
        vyasevic@...hat.com, kstewart@...uxfoundation.org,
        pombredanne@...b.com, vyasevich@...il.com, mark.rutland@....com,
        gregkh@...uxfoundation.org, adobriyan@...il.com, fw@...len.de,
        nicolas.dichtel@...nd.com, xiyou.wangcong@...il.com,
        roman.kapl@...go.com, paul@...l-moore.com, dsahern@...il.com,
        daniel@...earbox.net, lucien.xin@...il.com,
        mschiffer@...verse-factory.net, rshearma@...cade.com,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        ebiederm@...ssion.com, gorcunov@...tuozzo.com
Subject: Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read
 it on net->init/->exit

On Tue, Nov 14, 2017 at 10:00:59AM -0800, Eric Dumazet wrote:
> On Tue, 2017-11-14 at 09:44 -0800, Andrei Vagin wrote:
> > On Tue, Nov 14, 2017 at 04:53:33PM +0300, Kirill Tkhai wrote:
> > > Currently a mutex is used to protect the pernet operations list. It
> > > makes cleanup_net() execute the ->exit methods of the same set of
> > > operations that was used at the time of ->init, even after the net
> > > namespace is unlinked from net_namespace_list.
> > > 
> > > But the problem is that cleanup_net() needs to call synchronize_rcu()
> > > after the net is removed from net_namespace_list:
> > > 
> > > Destroy net_ns:
> > > cleanup_net()
> > >   mutex_lock(&net_mutex)
> > >   list_del_rcu(&net->list)
> > >   synchronize_rcu()                                  <--- Sleep there for ages
> > >   list_for_each_entry_reverse(ops, &pernet_list, list)
> > >     ops_exit_list(ops, &net_exit_list)
> > >   list_for_each_entry_reverse(ops, &pernet_list, list)
> > >     ops_free_list(ops, &net_exit_list)
> > >   mutex_unlock(&net_mutex)
> > > 
> > > This primitive is not fast, especially on systems with many processors
> > > and/or when preemptible RCU is enabled in the config. So, for the whole
> > > time cleanup_net() is waiting for the RCU grace period, new net
> > > namespaces cannot be created, because the tasks creating them are
> > > sleeping on the same mutex:
> > > 
> > > Create net_ns:
> > > copy_net_ns()
> > >   mutex_lock_killable(&net_mutex)                    <--- Sleep there for ages
> > > 
> > > The solution is to convert net_mutex to the rw_semaphore. Then,
> > > pernet_operations::init/::exit methods, modifying the net-related data,
> > > will require down_read() locking only, while down_write() will be used
> > > for changing pernet_list.
> > > 
> > > This gives a significant performance increase, as you can see below.
> > > The measurement is sequential net namespace creation in a loop, in a
> > > single thread, without other tasks running (single-user mode):
> > > 
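> > > 1)
> > > #define _GNU_SOURCE
> > > #include <sched.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >         unsigned nr;
> > >         if (argc < 2) {
> > >                 fprintf(stderr, "Provide nr iterations arg\n");
> > >                 return 1;
> > >         }
> > >         nr = atoi(argv[1]);
> > >         while (nr-- > 0) {
> > >                 /* Each call creates a new net namespace; the previous
> > >                  * one is released and destroyed asynchronously. */
> > >                 if (unshare(CLONE_NEWNET)) {
> > >                         perror("Can't unshare");
> > >                         return 1;
> > >                 }
> > >         }
> > >         return 0;
> > > }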
> > > 
> > > Original, 100000 unshare():
> > > 0.03user 23.14system 1:39.85elapsed 23%CPU
> > >
> > > Patched, 100000 unshare():
> > > 0.03user 67.49system 1:08.34elapsed 98%CPU
> > > 
> > > 2) for i in {1..10000}; do unshare -n bash -c exit; done
> > 
> > Hi Kirill,
> > 
> > This mutex has another role. You know that net namespaces are destroyed
> > asynchronously, and the net mutex guarantees that the backlog will not
> > get big. If we have something in the backlog, we know that it will be
> > handled before a new net ns is created.
> > 
> > As far as I remember, net namespaces are created much faster than
> > they are destroyed, so with these changes we can create a really big
> > backlog, can't we?
> 
> Please take a look at the recent patches I did:
> 
> 8ca712c373a462cfa1b62272870b6c2c74aa83f9 Merge branch 'net-speedup-netns-create-delete-time'
> 64bc17811b72758753e2b64cd8f2a63812c61fe1 ipv4: speedup ipv6 tunnels dismantle
> bb401caefe9d2c65e0c0fa23b21deecfbfa473fe ipv6: speedup ipv6 tunnels dismantle
> 789e6ddb0b2fb5d5024b760b178a47876e4de7a6 tcp: batch tcp_net_metrics_exit
> a90c9347e90ed1e9323d71402ed18023bc910cd8 ipv6: addrlabel: per netns list
> d464e84eed02993d40ad55fdc19f4523e4deee5b kobject: factorize skb setup in kobject_uevent_net_broadcast()
> 4a336a23d619e96aef37d4d054cfadcdd1b581ba kobject: copy env blob in one go
> 16dff336b33d87c15d9cbe933cfd275aae2a8251 kobject: add kobject_uevent_net_broadcast()
> 

Good job! Now it really works much faster. I tested these patches together
with Kirill's one and everything works well. I could not reproduce a
situation where the backlog starts growing.

Thanks Kirill and Eric.