Message-Id: <20220207171756.1304544-1-eric.dumazet@gmail.com>
Date:   Mon,  7 Feb 2022 09:17:45 -0800
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     netdev <netdev@...r.kernel.org>, David Ahern <dsahern@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: [PATCH net-next 00/11] net: speedup netns dismantles

From: Eric Dumazet <edumazet@...gle.com>

This series makes network namespace deletion more scalable: about 4x
faster on the benchmark described in this cover letter.

- Remove a bottleneck in ipv6 addrconf by replacing a global
  hash table with a per-netns one.

- The rtnl mutex can be heavily contended, and it blocks the
  kernel thread responsible for cleaning up and freeing
  network namespaces.
  Convert eight (struct pernet_operations *)->exit() handlers to
  exit_batch() ones. This removes many rtnl acquisitions
  and gives cleanup_net() a kind of priority over rtnl
  ownership when processing batches.

Tested on a host with 24 cpus (48 HT)

Test script:

for nr in {1..10}
do
  (for i in {1..10000}; do unshare -n /bin/bash -c "ifconfig lo up"; done) &
done
wait

for i in {1..10}
do
  sleep 1 
  echo 3 >/proc/sys/vm/drop_caches
  grep net_namespace /proc/slabinfo
done

Before this series:
The host struggles to clean up the netns, even after no new ones
are being created. Memory cost is high, because each netns consumes
a good amount of memory.

time ./unshare10.sh
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0
net_namespace      37214  37792   3968    1    1 : tunables   24   12    8 : slabdata  37214  37792    192

real	6m57.766s
user	3m37.277s
sys	40m4.826s
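
To put the slabinfo lines above in perspective, here is a minimal sketch that
turns a net_namespace line into an approximate memory figure. The helper name
netns_slab_mib is hypothetical (not part of this series), and the figure only
covers the net_namespace slab objects themselves (num_objs * objsize), not the
other per-netns allocations, so treat it as a lower bound.

```shell
# Hypothetical helper: reads /proc/slabinfo-style lines on stdin and prints
# the approximate memory held by net_namespace slab objects.
# Fields used: $1 = cache name, $3 = num_objs, $4 = objsize (bytes).
netns_slab_mib() {
  awk '$1 == "net_namespace" { printf "%.1f MiB\n", $3 * $4 / (1024 * 1024) }'
}

# Sample line copied from the "before" output above.
echo "net_namespace      82634  82634   3968    1    1 : tunables   24   12    8 : slabdata  82634  82634      0" | netns_slab_mib
```

On a live system the same helper could be fed with
"netns_slab_mib < /proc/slabinfo" (reading that file usually requires root).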

After this series:
The script completes much faster, and the kernel thread
running cleanup_net() keeps up just fine.
Memory cost is not too big.

time ./unshare10.sh
net_namespace       9945   9945   4096    1    1 : tunables   24   12    8 : slabdata   9945   9945      0
net_namespace       4087   4665   4096    1    1 : tunables   24   12    8 : slabdata   4087   4665    192
net_namespace       4082   4607   4096    1    1 : tunables   24   12    8 : slabdata   4082   4607    192
net_namespace        234    761   4096    1    1 : tunables   24   12    8 : slabdata    234    761    192
net_namespace        224    751   4096    1    1 : tunables   24   12    8 : slabdata    224    751    192
net_namespace        218    745   4096    1    1 : tunables   24   12    8 : slabdata    218    745    192
net_namespace        193    667   4096    1    1 : tunables   24   12    8 : slabdata    193    667    172
net_namespace        167    609   4096    1    1 : tunables   24   12    8 : slabdata    167    609    152
net_namespace        167    609   4096    1    1 : tunables   24   12    8 : slabdata    167    609    152
net_namespace        157    609   4096    1    1 : tunables   24   12    8 : slabdata    157    609    152

real	1m43.876s
user	3m39.728s
sys	7m36.342s
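
The 4x figure can be checked from the two "real" timings above. The sketch
below is only an illustration of that arithmetic; the helper name to_seconds
is hypothetical, and it assumes the NmS.SSSs format printed by the shell's
time keyword.

```shell
# Hypothetical helper: converts a `time`-style value such as 6m57.766s
# into seconds (minutes * 60 + seconds).
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

before=$(to_seconds 6m57.766s)   # "real" before this series
after=$(to_seconds 1m43.876s)    # "real" after this series

# Ratio of wall-clock times: before / after.
awk -v b="$before" -v a="$after" 'BEGIN { printf "wall-clock speedup: %.2fx\n", b / a }'
```

The same helper applied to the "sys" lines (40m4.826s and 7m36.342s) shows
the reduction in kernel CPU time.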


Eric Dumazet (11):
  ipv6/addrconf: allocate a per netns hash table
  ipv6/addrconf: use one delayed work per netns
  ipv6/addrconf: switch to per netns inet6_addr_lst hash table
  nexthop: change nexthop_net_exit() to nexthop_net_exit_batch()
  ipv4: add fib_net_exit_batch()
  ipv6: change fib6_rules_net_exit() to batch mode
  ip6mr: introduce ip6mr_net_exit_batch()
  ipmr: introduce ipmr_net_exit_batch()
  can: gw: switch cangw_pernet_exit() to batch mode
  bonding: switch bond_net_exit() to batch mode
  net: remove default_device_exit()

 drivers/net/bonding/bond_main.c   |  27 ++++--
 drivers/net/bonding/bond_procfs.c |   1 -
 include/net/netns/ipv6.h          |   5 ++
 net/can/gw.c                      |   9 +-
 net/core/dev.c                    |  22 +++--
 net/ipv4/fib_frontend.c           |  19 +++-
 net/ipv4/ipmr.c                   |  20 +++--
 net/ipv4/nexthop.c                |  12 ++-
 net/ipv6/addrconf.c               | 139 ++++++++++++++----------------
 net/ipv6/fib6_rules.c             |  11 ++-
 net/ipv6/ip6mr.c                  |  20 +++--
 11 files changed, 172 insertions(+), 113 deletions(-)

-- 
2.35.0.263.gb82422642f-goog
