Message-Id: <20180416152255.2256-1-dsahern@gmail.com>
Date: Mon, 16 Apr 2018 08:22:34 -0700
From: David Ahern <dsahern@...il.com>
To: netdev@...r.kernel.org
Cc: davem@...emloft.net, idosch@...sch.org, roopa@...ulusnetworks.com,
eric.dumazet@...il.com, weiwan@...gle.com, kafai@...com,
yoshfuji@...ux-ipv6.org, David Ahern <dsahern@...il.com>
Subject: [PATCH net-next 00/21] net/ipv6: Separate data structures for FIB and data path
IPv6 uses the same data struct for both control plane (FIB entries) and
data path (dst entries). This struct has elements needed for both paths,
adding memory overhead and complexity (taking a dst hold in most places
but an additional reference on rt6i_ref in a few). Furthermore, because
of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.
This patch set separates FIB entries from dst entries, better aligning
IPv6 code with IPv4, simplifying the reference counting and allowing
FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
the first step toward a number of performance and scalability changes.
The end result of this patch set:
- FIB entries (fib6_info):
      /* size: 208, cachelines: 4, members: 25 */
      /* sum members: 207, holes: 1, sum holes: 1 */
- dst entries (rt6_info):
      /* size: 240, cachelines: 4, members: 11 */
Versus the single rt6_info struct today for both paths:
      /* size: 320, cachelines: 5, members: 28 */
This amounts to a 35% reduction in memory use for FIB entries and a
25% reduction for dst entries.
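
As a quick check of the percentages above, here is a small sketch that
derives them from the pahole sizes quoted in this message. The helper name
percent_reduction and the enum names are made up for illustration; only the
three byte counts come from the numbers above.

```c
/* Struct sizes in bytes, taken from the pahole output quoted above. */
enum {
    OLD_RT6_INFO_SIZE = 320, /* single rt6_info used for both paths today */
    FIB6_INFO_SIZE    = 208, /* new FIB entry (fib6_info) */
    NEW_RT6_INFO_SIZE = 240  /* new dst entry (rt6_info) */
};

/*
 * Hypothetical helper: percentage saved when a struct shrinks from
 * old_size to new_size bytes. Assumes old_size > 0 and new_size <= old_size.
 */
static double percent_reduction(unsigned int old_size, unsigned int new_size)
{
    return 100.0 * (double)(old_size - new_size) / (double)old_size;
}
```

Calling percent_reduction(OLD_RT6_INFO_SIZE, FIB6_INFO_SIZE) gives the FIB
entry figure and percent_reduction(OLD_RT6_INFO_SIZE, NEW_RT6_INFO_SIZE)
gives the dst entry figure. Both are based on struct size alone, before any
allocator rounding.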
With respect to locking, FIB entries use RCU and a single atomic
counter with fib6_info_hold and fib6_info_release helpers to manage
the reference counting. dst entries use only the traditional dst
refcounts with dst_hold and dst_release.
FIB entries for host routes are referenced by inet6_ifaddr and
ifacaddr6. In both cases, additional holds are taken -- similar to
what is done for devices.
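
To make the reference counting described above concrete, here is a minimal
userspace sketch of the single-atomic-counter pattern. It is not the kernel
code: struct demo_fib_entry and the demo_* functions are hypothetical
stand-ins for fib6_info, fib6_info_hold and fib6_info_release, and it uses
C11 atomics in place of the kernel atomic API. There is no RCU here, so the
last release frees the entry immediately, whereas with RCU the free would
typically be deferred until readers are done.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a FIB entry; not the kernel's struct fib6_info. */
struct demo_fib_entry {
    atomic_int ref;    /* the single reference counter */
    int        metric; /* placeholder payload */
};

/* Allocate an entry that starts with one reference owned by the caller. */
static struct demo_fib_entry *demo_fib_entry_alloc(int metric)
{
    struct demo_fib_entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    atomic_init(&e->ref, 1);
    e->metric = metric;
    return e;
}

/* Take an additional reference, e.g. when an address object points at it. */
static void demo_fib_entry_hold(struct demo_fib_entry *e)
{
    atomic_fetch_add(&e->ref, 1);
}

/*
 * Drop a reference. Returns true if this call dropped the last reference
 * and freed the entry, false otherwise (including for a NULL entry).
 */
static bool demo_fib_entry_release(struct demo_fib_entry *e)
{
    if (e && atomic_fetch_sub(&e->ref, 1) == 1) {
        free(e);
        return true;
    }
    return false;
}
```

In this model, an object that keeps a pointer to a host route (the role
inet6_ifaddr and ifacaddr6 play above) would call demo_fib_entry_hold when
it stores the pointer and demo_fib_entry_release when it drops it.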
This set is the first of many changes to improve the scalability of the
IPv6 code. Follow-on changes include:
- consolidating duplicate fib6_info references like IPv4 does with
  duplicate fib_info
- moving fib6_info into a slab cache to avoid allocation roundups to
  power of 2 (the 208 size becomes a 256 actual allocation)
- allowing FIB lookups without generating a dst (e.g., most rt6_lookup
  users just want to verify the egress device). This means moving dst
  allocation to the other side of fib6_rule_lookup, which again aligns
  with IPv4 behavior
- using separate standalone nexthop objects, which have performance
  benefits beyond fib_info consolidation
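
The power-of-2 roundup mentioned in the slab cache item can be illustrated
with a simplified model. The sketch below just rounds a request up to the
next power of two and reports the bytes lost per object. It is an assumption
for illustration only: the function names are made up, and the real
allocator's size classes are not modeled beyond what is stated above.

```c
#include <stddef.h>

/*
 * Round n up to the next power of two. Simplified model of a
 * power-of-two size-class allocator; assumes 1 <= n and that the
 * result fits in size_t.
 */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes lost per object when a size-byte object lands in that bucket. */
static size_t roundup_waste(size_t size)
{
    return roundup_pow2(size) - size;
}
```

For the 208-byte fib6_info, roundup_pow2(208) is 256 and roundup_waste(208)
is 48 bytes per entry, which is the overhead a dedicated slab cache would
avoid.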
At this point I am not seeing any refcount leaks or underflows, no
oops or bug_ons, or warnings from kasan, so I think it is ready for
others to beat up on it finding errors in code paths I have missed.
v1 changes
- rebased to top of tree
- fix memory leak of metrics as noted by Ido
- MTU fixes based on pmtu tests (thanks Stefano Brivio for writing)
RFC v2 changes
- improved commit messages
- move common metrics code from dst.c to net/ipv4/metrics.c (comment
  from DaveM)
- address comments from Wei Wang and Martin KaFai Lau (let me know if
  I missed something)
- fixes detected by kernel test robots
  + added fib6_metric_set to change metric on a FIB entry which could
    be pointing to read-only dst_default_metrics
  + 0day testing found a problem with an intermediate patch; added
    dst_hold_safe on rt->from. Code is removed 3 patches later
- allow cacheinfo to handle NULL dst; means only expires is pushed to
  userspace
David Ahern (21):
  net: Move fib_convert_metrics to metrics file
  net: Handle null dst in rtnl_put_cacheinfo
  vrf: Move fib6_table into net_vrf
  net/ipv6: Pass net to fib6_update_sernum
  net/ipv6: Pass net namespace to route functions
  net/ipv6: Move support functions up in route.c
  net/ipv6: Save route type in rt6_info
  net/ipv6: Move nexthop data to fib6_nh
  net/ipv6: Defer initialization of dst to data path
  net/ipv6: move metrics from dst to rt6_info
  net/ipv6: move expires into rt6_info
  net/ipv6: Add fib6_null_entry
  net/ipv6: Add rt6_info create function for ip6_pol_route_lookup
  net/ipv6: Move dst flags to booleans in fib entries
  net/ipv6: Create a neigh_lookup for FIB entries
  net/ipv6: Add gfp_flags to route add functions
  net/ipv6: Cleanup exception and cache route handling
  net/ipv6: introduce fib6_info struct and helpers
  net/ipv6: separate handling of FIB entries from dst based routes
  net/ipv6: Flip FIB entries to fib6_info
  net/ipv6: Remove unused code and variables for rt6_info
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c |   96 +-
 drivers/net/vrf.c                                 |   25 +-
 include/net/if_inet6.h                            |    4 +-
 include/net/ip.h                                  |    3 +
 include/net/ip6_fib.h                             |  151 ++-
 include/net/ip6_route.h                           |   45 +-
 include/net/netns/ipv6.h                          |    3 +-
 net/core/dst.c                                    |    1 +
 net/core/rtnetlink.c                              |    8 +-
 net/ipv4/Makefile                                 |    3 +-
 net/ipv4/fib_semantics.c                          |   43 +-
 net/ipv4/metrics.c                                |   53 +
 net/ipv6/addrconf.c                               |  118 +-
 net/ipv6/anycast.c                                |   21 +-
 net/ipv6/ip6_fib.c                                |  366 +++---
 net/ipv6/ip6_output.c                             |    3 +-
 net/ipv6/ndisc.c                                  |   40 +-
 net/ipv6/route.c                                  | 1369 ++++++++++----------
 net/ipv6/xfrm6_policy.c                           |    2 -
 19 files changed, 1218 insertions(+), 1136 deletions(-)
 create mode 100644 net/ipv4/metrics.c
--
2.11.0