Message-Id: <1473163300-2045-1-git-send-email-jiri@resnulli.us>
Date: Tue, 6 Sep 2016 14:01:38 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: netdev@...r.kernel.org
Cc: davem@...emloft.net, idosch@...lanox.com, eladr@...lanox.com,
yotamg@...lanox.com, nogahf@...lanox.com, ogerlitz@...lanox.com,
roopa@...ulusnetworks.com, nikolay@...ulusnetworks.com,
linville@...driver.com, tgraf@...g.ch, gospo@...ulusnetworks.com,
sfeldma@...il.com, ast@...mgrid.com, edumazet@...gle.com,
hannes@...essinduktion.org, f.fainelli@...il.com,
dsa@...ulusnetworks.com, jhs@...atatu.com,
vivien.didelot@...oirfairelinux.com, john.fastabend@...el.com,
andrew@...n.ch, ivecera@...hat.com
Subject: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
From: Jiri Pirko <jiri@...lanox.com>
This is an RFC, unfinished. I came across some issues in the process, so I would
like to share those and restart the fib offload discussion in order to make it
really usable.
So the goal of this patchset is to allow the driver to propagate all prefixes
configured in the kernel down to HW. This is necessary for routing to work
as expected. If we don't do that, HW might incorrectly forward packets for
prefixes known to the kernel. Take an example where a default route is set in
switch HW and there is an IP address set on a management (non-switch) port:
packets destined to that address would follow the default route in HW instead
of being delivered to the kernel.
Currently, only fibs related to the switch port netdev are offloaded using
switchdev ops. This model is not extendable, so the first patch introduces
a replacement: a notifier to propagate fib additions and removals to whoever
is interested. The second patch makes mlxsw adopt this new way, registering
one notifier block for each mlxsw (ASIC) instance.
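To make the notifier idea concrete, here is a minimal user-space sketch of
the pattern, not the code from the patches. All names in it (fib_nb,
fib_notify, asic_sketch and so on) are made up for illustration; in the
kernel this would be built on struct notifier_block and the existing
notifier chain helpers. It only shows that every registered listener, one
per ASIC instance, sees every fib addition and removal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types; illustrative names, not taken from the patch. */
enum fib_event { FIB_EVENT_ADD, FIB_EVENT_DEL };

/* Hypothetical, heavily reduced description of a fib entry. */
struct fib_info_sketch {
	uint32_t dst;     /* prefix, host byte order */
	uint8_t dst_len;  /* prefix length */
};

struct fib_nb;
typedef int (*fib_nb_fn)(struct fib_nb *nb, enum fib_event ev,
			 const struct fib_info_sketch *fi);

/* One of these is registered per listener (e.g. per ASIC instance). */
struct fib_nb {
	fib_nb_fn call;
	struct fib_nb *next;
	void *priv;       /* per-instance driver state */
};

static struct fib_nb *fib_chain;

void fib_nb_register(struct fib_nb *nb)
{
	assert(nb && nb->call);
	nb->next = fib_chain;
	fib_chain = nb;
}

void fib_nb_unregister(struct fib_nb *nb)
{
	struct fib_nb **p;

	for (p = &fib_chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Deliver the event to every listener; returns how many were called. */
int fib_notify(enum fib_event ev, const struct fib_info_sketch *fi)
{
	struct fib_nb *nb;
	int n = 0;

	for (nb = fib_chain; nb; nb = nb->next) {
		nb->call(nb, ev, fi);
		n++;
	}
	return n;
}

/* Example listener: a fake ASIC that just counts the routes it was told about. */
struct asic_sketch { int routes; };

int asic_fib_event(struct fib_nb *nb, enum fib_event ev,
		   const struct fib_info_sketch *fi)
{
	struct asic_sketch *asic = nb->priv;

	(void)fi;
	if (ev == FIB_EVENT_ADD)
		asic->routes++;
	else if (asic->routes > 0)
		asic->routes--;
	return 0;
}
```

Note that the listener is called for every prefix, including ones that do
not point at its own ports, which is the whole point of the change.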
Using switchdev ops, "abort" is called by switchdev core whenever there is
an error during fib add offload. This leads to the removal of all offloaded
fibs on the system by fib_trie code.
Now the new notifier assumes the driver takes care of the abort action.
Here's why:
1) The fact that one HW cannot offload a fib does not mean that the others can't
do it. So let only that one entity abort and leave the rest to work happily.
2) The driver knows what to do in order to properly abort. For example, currently
abort is broken for mlxsw, as for Spectrum there is a need to set a 0.0.0.0/0
trap in the RALUE register.
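The two points above can be sketched in plain C as follows. This is a toy
model under stated assumptions, not driver code: hw_sketch, hw_abort,
hw_fib_add and fib_add_all are hypothetical names, and "capacity" stands in
for whatever makes a real device fail to offload an entry. The abort is
local to the device that failed; the other device keeps its offloaded
entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one offload-capable device. */
struct hw_sketch {
	int capacity;   /* how many routes fit in this device's table */
	int used;       /* how many routes are currently offloaded */
	bool aborted;   /* once set, the device no longer offloads anything */
};

/*
 * Device-specific abort. A real driver would do whatever its HW needs
 * here (for Spectrum, per the text above, installing a 0.0.0.0/0 trap).
 */
void hw_abort(struct hw_sketch *hw)
{
	hw->used = 0;        /* flush this device's offloaded routes */
	hw->aborted = true;  /* from now on, everything goes to the CPU */
}

/* Try to offload one route; returns true if the device took it. */
bool hw_fib_add(struct hw_sketch *hw)
{
	assert(hw);
	if (hw->aborted)
		return false;
	if (hw->used >= hw->capacity) {
		hw_abort(hw);    /* only this device aborts */
		return false;
	}
	hw->used++;
	return true;
}

/* Offer the same route to every device; returns how many offloaded it. */
int fib_add_all(struct hw_sketch *devs, int ndevs)
{
	int i, offloaded = 0;

	for (i = 0; i < ndevs; i++)
		if (hw_fib_add(&devs[i]))
			offloaded++;
	return offloaded;
}
```

With two devices of capacity 1 and 2, the second route makes only the first
device abort, while the second one still holds both routes.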
Issues:
1) RTNH_F_OFFLOAD is originally set in switchdev core. There, the assumption is
that only one offload device exists. But for the fib notifier, we assume
multiple offload devices. When should the offload flag be set and by whom?
I think that it would make sense to have a per-fib reference counter
for this:
0 means RTNH_F_OFFLOAD is not set, no device offloads this entry
n means RTNH_F_OFFLOAD is set and the fib entry is offloaded by n devices
2) Unabort? Would be nice. Currently, when add_failure->abort happens, the
user's only option is to reboot the machine. I would like to make this
nicer for the fib notifier implementation. Perhaps provide some button in
devlink which would tell the driver to try to offload entries again? Not sure.
3) Policies. Not directly connected to this patchset, but this is an issue
we have discussed a couple of times and I still believe that
the current state is not good.
Software-only forwarding now happens in case of abort and makes the ASIC
ports act like dumb separate NICs. In case of Spectrum, the bandwidth
of the CPU port is something around 4Gbit. For 32x100Gbit ports this is
simply not possible to handle. In case of abort, the system is broken,
as it cannot forward packets at a speed anywhere close to the expected
one.
Here the policies come into the picture, allowing the user to set the
system to behave according to his expectations. For example, rather
fail to add the route than abort to software forwarding.
This policy could be per-ASIC, configurable by devlink.
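Going back to issue 1, the per-fib reference counter could look roughly like
the sketch below. It is only an illustration: fib_entry_sketch and the
get/put helpers are hypothetical names, the flag is redefined locally with
the value RTNH_F_OFFLOAD has in the uapi rtnetlink header, and the code
assumes updates are serialized by the caller (no atomics or locking shown).

```c
#include <assert.h>

/* Local stand-in for RTNH_F_OFFLOAD so the sketch builds in user space. */
#define SKETCH_RTNH_F_OFFLOAD 8u

/* Hypothetical, reduced fib entry: just the flags and the new counter. */
struct fib_entry_sketch {
	unsigned int flags;
	unsigned int offload_cnt;  /* number of devices offloading this entry */
};

/* Called by a device once it has offloaded the entry. */
void fib_entry_offload_get(struct fib_entry_sketch *e)
{
	assert(e);
	if (e->offload_cnt++ == 0)
		e->flags |= SKETCH_RTNH_F_OFFLOAD;   /* 0 -> 1: set the flag */
}

/* Called by a device once it has removed the entry from its HW. */
void fib_entry_offload_put(struct fib_entry_sketch *e)
{
	assert(e && e->offload_cnt > 0);
	if (--e->offload_cnt == 0)
		e->flags &= ~SKETCH_RTNH_F_OFFLOAD;  /* 1 -> 0: clear the flag */
}
```

With this, the flag is visible to the user as long as at least one device
offloads the entry, and no single driver owns the decision to clear it.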
Thoughts please?
Jiri Pirko (2):
  fib: introduce fib notification infrastructure
  mlxsw: spectrum_router: Use FIB notifications instead of switchdev
    calls

 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   8 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 257 ++++++++++-----------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   9 -
 include/net/ip_fib.h                               |  19 ++
 net/ipv4/fib_trie.c                                |  43 ++++
 5 files changed, 181 insertions(+), 155 deletions(-)

--
2.5.5