[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <587CEB67.3040807@cumulusnetworks.com>
Date: Mon, 16 Jan 2017 07:48:55 -0800
From: Roopa Prabhu <roopa@...ulusnetworks.com>
To: David Ahern <dsa@...ulusnetworks.com>
CC: netdev@...r.kernel.org, ddutt@...ulusnetworks.com
Subject: Re: [PATCH net-next v2 1/3] net: ipv6: Allow shorthand delete of
all nexthops in multipath route
On 1/15/17, 12:07 PM, David Ahern wrote:
> IPv4 allows multipath routes to be deleted using just the prefix and
> length. For example:
> $ ip ro ls vrf red
> unreachable default metric 8192
> 1.1.1.0/24
> nexthop via 10.100.1.254 dev eth1 weight 1
> nexthop via 10.11.200.2 dev eth11.200 weight 1
> 10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
> 10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
>
> $ ip ro del 1.1.1.0/24 vrf red
>
> $ ip ro ls vrf red
> unreachable default metric 8192
> 10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
> 10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
>
> The same notation does not work with IPv6 because of how multipath routes
> are implemented for IPv6. For IPv6 only the first nexthop of a multipath
> route is deleted if the request contains only a prefix and length. This
> leads to unnecessary complexity in userspace dealing with IPv6 multipath
> routes.
>
> This patch allows all nexthops to be deleted without specifying each one
> in the delete request by passing a new flag, RTM_F_ALL_NEXTHOPS, in
> rtm_flags. Internally, this is done by walking the sibling list of the
> route matching the specifications given (prefix, length, metric, protocol,
> etc).
>
> With this patch (and an updated iproute2 command):
> $ ip -6 ro ls vrf red
> 2001:db8::/120 via 2001:db8:1::62 dev eth1 metric 256 pref medium
> 2001:db8::/120 via 2001:db8:1::61 dev eth1 metric 256 pref medium
> 2001:db8::/120 via 2001:db8:1::60 dev eth1 metric 256 pref medium
> 2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium
> ...
>
> $ ip -6 ro del vrf red 1111::1/120
> $ ip -6 ro ls vrf red
> 2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium
> ...
>
> The flag is added to fib6_config by converting fc_type to a u16 (as
> noted fc_type only uses 8 bits). The new u16 hole is a bitmap with
> fc_delete_all_nexthop as the first bit.
>
> Suggested-by: Dinesh Dutt <ddutt@...ulusnetworks.com>
> Signed-off-by: David Ahern <dsa@...ulusnetworks.com>
> ---
> v2
> - switched example to rfc 3849 documentation address per request
> - changed delete loop to explicitly look at siblings list for
> first route matching specs given (metric, protocol, etc)
>
> include/net/ip6_fib.h | 4 +++-
> include/uapi/linux/rtnetlink.h | 1 +
> net/ipv6/route.c | 28 +++++++++++++++++++++++++---
> 3 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> index a74e2aa40ef4..11ab99e87c5f 100644
> --- a/include/net/ip6_fib.h
> +++ b/include/net/ip6_fib.h
> @@ -37,7 +37,9 @@ struct fib6_config {
> int fc_ifindex;
> u32 fc_flags;
> u32 fc_protocol;
> - u32 fc_type; /* only 8 bits are used */
> + u16 fc_type; /* only 8 bits are used */
> + u16 fc_delete_all_nexthop : 1,
> + __unused : 15;
>
> struct in6_addr fc_dst;
> struct in6_addr fc_src;
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 8c93ad1ef9ab..7fb206bc42f9 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -276,6 +276,7 @@ enum rt_scope_t {
> #define RTM_F_EQUALIZE 0x400 /* Multipath equalizer: NI */
> #define RTM_F_PREFIX 0x800 /* Prefix addresses */
> #define RTM_F_LOOKUP_TABLE 0x1000 /* set rtm_table to FIB lookup result */
> +#define RTM_F_ALL_NEXTHOPS 0x2000 /* delete all nexthops (IPv6) */
>
Do we really need the flag ?. It seems like delete with just prefix should delete all the routes in a multipath
route by default... (understand that you have it there to preserve existing behavior...for people who maybe relying on it. But this seems more like a bug fix. route replace went through a few such bug fixes "ipv6: fix ECMP route replacement"). ok with either approach.
Powered by blists - more mailing lists