[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878d1248-a710-4b02-b9c7-70937328c939@blackwall.org>
Date: Wed, 29 May 2024 22:57:29 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Petr Machata <petrm@...dia.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Cc: Ido Schimmel <idosch@...dia.com>, David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH net-next 0/4] Allow configuration of multipath hash seed
On 5/29/24 14:18, Petr Machata wrote:
> Let me just quote the commit message of patch #2 here to inform the
> motivation and some of the implementation:
>
> When calculating hashes for the purpose of multipath forwarding,
> both IPv4 and IPv6 code currently fall back on
> flow_hash_from_keys(). That uses a randomly-generated seed. That's a
> fine choice by default, but unfortunately some deployments may need
> a tighter control over the seed used.
>
> In this patchset, make the seed configurable by adding a new sysctl
> key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
> is used specifically for multipath forwarding and not for the other
> concerns that flow_hash_from_keys() is used for, such as queue
> selection. Expose the knob as sysctl because other such settings,
> such as headers to hash, are also handled that way.
>
> Despite being placed in the net.ipv4 namespace, the multipath seed
> sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
> TCP variables. Like those, the multipath hash seed is a per-netns
> variable.
>
> The new sysctl is added with permissions 0600 so that the hash is
> only readable and writable by root.
>
> The seed used by flow_hash_from_keys() is a 128-bit quantity.
> However it seems that usually the seed is a much more modest value.
> 32 bits seem typical (Cisco, Cumulus), some systems go even lower.
> For that reason, and to decouple the user interface from
> implementation details, go with a 32-bit quantity, which is then
> quadruplicated to form the siphash key.
>
> One example of use of this interface is avoiding hash polarization,
> where two ECMP routers, one behind the other, happen to make consistent
> hashing decisions, and as a result, part of the ECMP space of the latter
> router is never used. Another is a load balancer where several machines
> forward traffic to one of a number of leaves, and the forwarding
> decisions need to be made consistently. (This is a case of a desired
> hash polarization, mentioned e.g. in chapter 6.3 of [0].)
>
> There has already been a proposal to include a hash seed control
> interface in the past[1]. This patchset uses broadly the same ideas, but
> limits the externally visible seed size to 32 bits.
>
> - Patches #1-#2 contain the substance of the work
> - Patch #3 is a mlxsw offload
> - Patch #4 is a selftest
>
> [0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
> [1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
>
> Petr Machata (4):
> net: ipv4,ipv6: Pass multipath hash computation through a helper
> net: ipv4: Add a sysctl to set multipath hash seed
> mlxsw: spectrum_router: Apply user-defined multipath hash seed
> selftests: forwarding: router_mpath_hash: Add a new selftest
>
> Documentation/networking/ip-sysctl.rst | 10 +
> .../ethernet/mellanox/mlxsw/spectrum_router.c | 14 +-
> include/net/flow_dissector.h | 2 +
> include/net/ip_fib.h | 24 ++
> include/net/netns/ipv4.h | 10 +
> net/core/flow_dissector.c | 7 +
> net/ipv4/route.c | 12 +-
> net/ipv4/sysctl_net_ipv4.c | 82 +++++
> net/ipv6/route.c | 12 +-
> .../testing/selftests/net/forwarding/Makefile | 1 +
> .../net/forwarding/router_mpath_seed.sh | 322 ++++++++++++++++++
> 11 files changed, 482 insertions(+), 14 deletions(-)
> create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh
>
Hi,
I think that using memory management for such simple task is an
overkill. Would it be simpler to define 2 x 4 byte seed variables
in netns_ipv4 (e.g. user_seed, mp_seed). One is set only by the
user through the sysctl, which would also set mp_seed. Then you
can use mp_seed in the fast-path to construct that siphash key.
If the user_seed is set to 0 then you reset to some static init
hash value that is generated on init_net's creation. The idea
is to avoid leaking that initial seed, to have the same seed
for all netns (known behaviour), be able to recognize when a
seed was set and if the user sets a seed then overwrite it for
that ns, but to be able to reset it as well.
Since 32 bits are enough I don't see why we should be using
the flow hash seed, note that init_net's initialization already
uses get_random_bytes() for hashes. This seems like a simpler
scheme that doesn't require memory management for a 32 bit seed.
Also it has the benefit that it will remove the test when generating
a hash because in the initial/non-user-set case we just have the
initial seed in mp_seed which is used to generate the siphash key,
i.e. we always use that internal seed for the hash, regardless if
it was set by the user or it's the initial seed.
That's just one suggestion, if you decide to use more memory you
can keep the whole key in netns_ipv4 instead, the point is I don't
think we need memory management for this value.
Cheers,
Nik
Powered by blists - more mailing lists