[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180103174025.GA6584@splinter>
Date: Wed, 3 Jan 2018 19:40:25 +0200
From: Ido Schimmel <idosch@...sch.org>
To: David Ahern <dsahern@...il.com>
Cc: Ido Schimmel <idosch@...lanox.com>, netdev@...r.kernel.org,
davem@...emloft.net, roopa@...ulusnetworks.com,
nicolas.dichtel@...nd.com, mlxsw@...lanox.com
Subject: Re: [RFC PATCH net-next 03/19] ipv6: Clear nexthop flags upon netdev
up
On Wed, Jan 03, 2018 at 09:56:02AM -0700, David Ahern wrote:
> On 1/3/18 9:43 AM, Ido Schimmel wrote:
> > On Wed, Jan 03, 2018 at 08:32:51AM -0700, David Ahern wrote:
> >> On 1/3/18 12:44 AM, Ido Schimmel wrote:
> >>>>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >>>>> index ed06b1190f05..b6405568ed7b 100644
> >>>>> --- a/net/ipv6/addrconf.c
> >>>>> +++ b/net/ipv6/addrconf.c
> >>>>> @@ -3484,6 +3484,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
> >>>>> if (run_pending)
> >>>>> addrconf_dad_run(idev);
> >>>>>
> >>>>> + /* Device has an address by now */
> >>>>> + rt6_sync_up(dev, RTNH_F_DEAD);
> >>>>> +
> >>>>
> >>>> Seems like this should be in the NETDEV_UP section, say after
> >>>> addrconf_permanent_addr.
> >>>
> >>> Unless the `keep_addr_on_down` sysctl is set, then at this stage the
> >>> netdev doesn't have an IP address and we shouldn't clear the dead flag
> >>> just yet.
> >>>
> >>> This is consistent with IPv4 that clears the dead flag from nexthops in
> >>> a multipath route only if the nexthop device has an IP address. When the
> >>> last IPv4 address is removed from a netdev all the routes using it are
> >>> flushed and there's nothing to clear upon NETDEV_UP.
> >>
> >> I have a bug about that IPv4 handling from the FRR team:
> >>
> >> $ ip link add dummy1 type dummy
> >> $ ip li set dummy1 up
> >> $ ip route add 1.1.1.0/24 dev dummy1
> >>
> >> $ ip addr add dev dummy1 2.2.2.1/24
> >> $ ip ro ls | grep dummy1
> >> 1.1.1.0/24 dev dummy1 scope link
> >> 2.2.2.0/24 dev dummy1 proto kernel scope link src 2.2.2.1
> >>
> >> $ ip addr del dev dummy1 2.2.2.1/24
> >> $ ip ro ls | grep dummy1
> >> <no outpu>
> >>
> >> The 1.1.1.0/24 route was removed as well the 2.2.2.0 connected route.
> >
> > If you're going to skip the flushing in this case, at least mark the
> > nexthops as dead.
>
> On a down event, yes. If the device is still up then a route such as:
> $ ip route add 1.1.1.0/24 dev dummy1
> should still be usable even without an address on it.
mlxsw will trap all the packets hitting the route until you assign an IP
address to dummy1.
> > And this is my second reason to have rt6_sync_up() where I put it. I'm
> > preparing another set which sends FIB_EVENT_NH_ADD events from
> > rt6_sync_up() similar to what we've in fib_sync_up(). When mlxsw (others
>
> On a tangent here, but I have been meaning to ask why you have
> FIB_EVENT_NH_ADD events as opposed to handling netdev events. What does
> a FIB_EVENT_NH_ADD provide that you can't do from a netdev event handler?
It'll make switch drivers more complex than they already are. Why every
driver needs to duplicate the logic in call_fib_nh_notifiers()?
> > in the future) processes the event it needs to add the nexthop back to
> > the forwarding plane. To do that, it needs to have a RIF for the
> > nexthop device. For the nexthop device to have a RIF, it needs at least
> > one IP address configured on the netdev.
>
> Why is that?
> $ ip addr sh dev swp1s0.51
> 44: swp1s0.51@...1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue master vrf1101 state UP group default qlen 1000
> link/ether 7c:fe:90:e8:3a:7d brd ff:ff:ff:ff:ff:ff
>
> $ ip ro add vrf vrf1101 1.1.1.0/24 dev swp1s0.51
>
> $ ip ro ls vrf vrf1101
> unreachable default metric 8192
> 1.1.1.0/24 dev swp1s0.51 scope link offload
>
> In this case, I take it mlxsw allocates a rif because of the vlan. The
> above does not work on just swp1s0 -- ie., that route is not offloaded:
>
> $ # ip ro ls
> ...
> 1.1.1.0/24 dev swp1s0 scope link
> ...
>
> Interesting.
It allocates the RIF because of the enslavement to a VRF, which is an
explicit indication the user wants to use the interface for L3
forwarding.
David, can we please get back to the issue at hand? What's the problem
with the location of the call to rt6_sync_up()?
Powered by blists - more mailing lists