Date:   Fri, 21 Apr 2017 19:50:55 +0200
From:   Andrey Konovalov <andreyknvl@...gle.com>
To:     Eric Dumazet <edumazet@...gle.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     Dmitry Vyukov <dvyukov@...gle.com>,
        Kostya Serebryany <kcc@...gle.com>,
        syzkaller <syzkaller@...glegroups.com>
Subject: net: cleanup_net is slow

Hi!

We're investigating some approaches to improve isolation of syzkaller
programs. One of the ideas is to run each program in its own user/net
namespace. However, while I was experimenting with this, I stumbled
upon a problem.

It seems that cleanup_net() might take a very long time to execute.

I've attached the reproducer and kernel .config that I used. Run as
"./a.out 1". The reproducer just forks and does unshare(CLONE_NEWNET)
in a loop. Note that I have a lot of network-related configs enabled,
which causes a few interfaces to be set up by default.
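
The attachment itself is not inlined in this message. A minimal sketch
of the kind of loop described above could look like the following. The
helper name run_unshare_loop() and the 1-second reporting threshold are
assumptions made for illustration, not taken from the attached file.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for unshare() and the CLONE_NEW* flags */
#endif
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock seconds read from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper (not the attached reproducer): fork `iterations`
 * children one after another; each child calls unshare(flags) and exits.
 * Returns the number of children whose unshare() call succeeded.
 */
int run_unshare_loop(int iterations, int flags)
{
    int ok = 0;

    for (int i = 0; i < iterations; i++) {
        double start = now_sec();
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: any namespace created here is released on exit. */
            _exit(unshare(flags) == 0 ? 0 : 1);
        }

        int status = 0;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            break;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;

        /* Report iterations that take suspiciously long (assumed threshold). */
        double elapsed = now_sec() - start;
        if (elapsed > 1.0)
            fprintf(stderr, "iteration %d took %.1f s\n", i, elapsed);
    }
    return ok;
}
```

Calling run_unshare_loop(1000, CLONE_NEWNET) from main() needs
CAP_SYS_ADMIN (e.g. run as root); passing CLONE_NEWUSER instead gives
the comparison case.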

What I see with this reproducer is that at first a large number
(~200-300) of net namespaces are created without any contention. But then
(probably when one of these namespaces gets destroyed) the program
hangs for a considerable amount of time (~100 seconds in my VM).
Nothing locks up inside the kernel and the CPU is mostly idle.

Adding debug printfs showed that the part that takes almost all of
that time is the lines between synchronize_rcu() and
mutex_unlock(&net_mutex) in cleanup_net(). Running perf showed that the
cause of this might be a lot of calls to synchronize_net() that happen
while executing those lines.

Is there any change that could be made to speed up the
creation/destruction of a huge number of net namespaces?

Running the reproducer with unshare(CLONE_NEWUSER) doesn't seem to
cause any delays.

Thanks!

Download attachment ".config" of type "application/octet-stream" (127147 bytes)

View attachment "unshare-newnet-slow-poc.c" of type "text/x-csrc" (1259 bytes)
