[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab507096-3757-431c-b040-1353e762f9bd@blackwall.org>
Date: Sat, 8 Jun 2024 13:39:09 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Petr Machata <petrm@...dia.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Cc: Ido Schimmel <idosch@...dia.com>, David Ahern <dsahern@...nel.org>,
mlxsw@...dia.com
Subject: Re: [PATCH net-next v2 0/5] Allow configuration of multipath hash
seed
On 6/7/24 18:13, Petr Machata wrote:
> Let me just quote the commit message of patch #2 here to inform the
> motivation and some of the implementation:
>
> When calculating hashes for the purpose of multipath forwarding,
> both IPv4 and IPv6 code currently fall back on
> flow_hash_from_keys(). That uses a randomly-generated seed. That's a
> fine choice by default, but unfortunately some deployments may need
> a tighter control over the seed used.
>
> In this patchset, make the seed configurable by adding a new sysctl
> key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
> is used specifically for multipath forwarding and not for the other
> concerns that flow_hash_from_keys() is used for, such as queue
> selection. Expose the knob as sysctl because other such settings,
> such as headers to hash, are also handled that way.
>
> Despite being placed in the net.ipv4 namespace, the multipath seed
> sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
> TCP variables. Like those, the multipath hash seed is a per-netns
> variable.
>
> The seed used by flow_hash_from_keys() is a 128-bit quantity.
> However it seems that usually the seed is a much more modest value.
> 32 bits seem typical (Cisco, Cumulus), some systems go even lower.
> For that reason, and to decouple the user interface from
> implementation details, go with a 32-bit quantity, which is then
> quadruplicated to form the siphash key.
>
> One example of use of this interface is avoiding hash polarization,
> where two ECMP routers, one behind the other, happen to make consistent
> hashing decisions, and as a result, part of the ECMP space of the latter
> router is never used. Another is a load balancer where several machines
> forward traffic to one of a number of leaves, and the forwarding
> decisions need to be made consistently. (This is a case of a desired
> hash polarization, mentioned e.g. in chapter 6.3 of [0].)
>
> There has already been a proposal to include a hash seed control
> interface in the past[1].
>
> - Patches #1-#2 contain the substance of the work
> - Patch #3 is an mlxsw offload
> - Patches #4 and #5 are a selftest
>
> [0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
> [1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
>
> v2:
> - Patch #2:
> - Instead of precomputing the siphash key, construct it in place
> of use thus obviating the need to use RCU.
> - Instead of dispatching to the flow dissector for cases where
> user seed is 0, maintain a separate random seed. Initialize it
> early so that we can avoid a branch at the seed reader.
> - In documentation, s/only valid/only present/ (when
> CONFIG_IP_ROUTE_MULTIPATH). Also mention the algorithm is
> unspecified and unstable in principle.
> - Patch #3:
> - Update to match changes in patch #2.
> - Patch #4:
> - New patch.
> - Patch #5:
> - Do not set seed on test init and run the stability tests first to catch
> the cases of a missed pernet seed init.
>
> Petr Machata (5):
> net: ipv4,ipv6: Pass multipath hash computation through a helper
> net: ipv4: Add a sysctl to set multipath hash seed
> mlxsw: spectrum_router: Apply user-defined multipath hash seed
> selftests: forwarding: lib: Split sysctl_save() out of sysctl_set()
> selftests: forwarding: router_mpath_hash: Add a new selftest
>
> Documentation/networking/ip-sysctl.rst | 14 +
> .../ethernet/mellanox/mlxsw/spectrum_router.c | 6 +-
> include/net/flow_dissector.h | 2 +
> include/net/ip_fib.h | 28 ++
> include/net/netns/ipv4.h | 8 +
> net/core/flow_dissector.c | 7 +
> net/ipv4/route.c | 12 +-
> net/ipv4/sysctl_net_ipv4.c | 66 ++++
> net/ipv6/route.c | 12 +-
> .../testing/selftests/net/forwarding/Makefile | 1 +
> tools/testing/selftests/net/forwarding/lib.sh | 9 +-
> .../net/forwarding/router_mpath_seed.sh | 333 ++++++++++++++++++
> 12 files changed, 484 insertions(+), 14 deletions(-)
> create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh
>
Looks good to me, thank you!
For the set,
Reviewed-by: Nikolay Aleksandrov <razor@...ckwall.org>
Powered by blists - more mailing lists