[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1479920345.4035504.797158425.2C10AA0C@webmail.messagingengine.com>
Date:   Wed, 23 Nov 2016 17:59:05 +0100
From:   Hannes Frederic Sowa <hannes@...essinduktion.org>
To:     Jiri Pirko <jiri@...nulli.us>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, idosch@...lanox.com,
        eladr@...lanox.com, yotamg@...lanox.com, nogahf@...lanox.com,
        arkadis@...lanox.com, ogerlitz@...lanox.com,
        roopa@...ulusnetworks.com, dsa@...ulusnetworks.com,
        nikolay@...ulusnetworks.com, andy@...yhouse.net,
        vivien.didelot@...oirfairelinux.com, andrew@...n.ch,
        f.fainelli@...il.com, alexander.h.duyck@...el.com, kaber@...sh.net
Subject: Re: [patch net-next v2 10/11] mlxsw: spectrum_router: Request a dump of
 FIB tables during init
On Wed, Nov 23, 2016, at 17:04, Jiri Pirko wrote:
> Wed, Nov 23, 2016 at 05:00:00PM CET, hannes@...essinduktion.org wrote:
> >On Wed, Nov 23, 2016, at 15:48, Jiri Pirko wrote:
> >> From: Ido Schimmel <idosch@...lanox.com>
> >> 
> >> Make sure the device has a complete view of the FIB tables by invoking
> >> their dump during module init.
> >> 
> >> Signed-off-by: Ido Schimmel <idosch@...lanox.com>
> >> Signed-off-by: Jiri Pirko <jiri@...lanox.com>
> >> ---
> >>  drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 16
> >>  ++++++++++++++++
> >>  1 file changed, 16 insertions(+)
> >> 
> >> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> >> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> >> index 14bed1d..36a71d2 100644
> >> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> >> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> >> @@ -2027,6 +2027,21 @@ static int mlxsw_sp_router_fib_event(struct
> >> notifier_block *nb,
> >>  	return NOTIFY_DONE;
> >>  }
> >>  
> >> +static void mlxsw_sp_router_fib_dump(struct mlxsw_sp *mlxsw_sp)
> >> +{
> >> +       while (!fib_notifier_dump(&mlxsw_sp->fib_nb)) {
> >> +               /* Flush pending FIB notifications and then flush the
> >> +                * device's table before requesting another dump. Do
> >> +                * that with RTNL held, as FIB notification block is
> >> +                * already registered.
> >> +                */
> >> +               mlxsw_core_flush_owq();
> >> +               rtnl_lock();
> >> +               mlxsw_sp_router_fib_flush(mlxsw_sp);
> >> +               rtnl_unlock();
> >> +       }
> >> +}
> >
> >I think it is fine to use this kind of synchronization.
> >
> >But I think that this part of the logic still belongs into the core
> 
> Core does not know how driver handles the offloaded fibs. So only driver
> knows how/if he needs to do flush in case of retry.
Sure, but an abort function can be provided to the kernel anyway and the
driver can care about that.
> >kernel. I still think it could happen that we will loop here
> >indefinitely because of a lot of routing updates and as such would need
> >to abort this loop after a number of tries.
> 
> In theory, it is possible, howevery quite unlikely.
I think the "quite unlikely" already got us down the path to not using
rtnl_lock in the first place.
As I said, I am not sure about this as I didn't try any hardware
offloading before and delays how long it needs to be transferred to
hardware, but having a fail case for that seems like a nice improvement.
At the same time I know of Linux boxes running in internet exchanges
having several peers. The high update rates actually led to bgp
implementation specifying flap damping which is actually nowadays
considered harmful.
Seriously, while most of the time convergence in routing protocols is
good and most updates only hit the BGP user space table anyway and the
change is suppressed because recursive routing lookup idempotence, quite
unlikely events happen to the internet now and then:
http://research.dyn.com/2009/02/longer-is-not-better/, which caused *a
lot* of flapping and ongoing events on BGP routers throughout the world.
I agree it is unlikely that you have to refresh your hw dump during this
time, but who knows what customers do and what admins do in case
something like this happens. I just don't favor to looping endlessly
trying to sync up and getting into a stable state but tell the admin to
detach the control plane from the forwarding plane and sync up then.
That said, I think a sysctl for a maximum number of loops respected by
drivers that needs to do so, should be enough for the time being.
Bye,
Hannes
Powered by blists - more mailing lists
 
