Message-ID: <20200802144959.GA2483264@shredder>
Date: Sun, 2 Aug 2020 17:49:59 +0300
From: Ido Schimmel <idosch@...sch.org>
To: David Ahern <dsahern@...il.com>
Cc: Yi Yang (杨燚)-云服务集团
<yangyi01@...pur.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"nikolay@...ulusnetworks.com" <nikolay@...ulusnetworks.com>
Subject: Re: Reply: [PATCH] can current ECMP implementation support consistent hashing for next hop?
On Thu, Jun 11, 2020 at 10:36:59PM -0600, David Ahern wrote:
> On 6/11/20 6:32 PM, Yi Yang (杨燚)-云服务集团 wrote:
> > David, thank you so much for confirming it can't, I did read your cumulus document before, resilient hashing is ok for next hop remove, but it still has the same issue there if add new next hop. I know most of kernel code in Cumulus Linux has been in upstream kernel, I'm wondering why you didn't push resilient hashing to upstream kernel.
> >
> > I think consistent hashing is a must-have for a commercial load balancing solution, otherwise it is basically nonsense. Does Cumulus Linux have a consistent hashing solution?
> >
> > Is "- replacing nexthop entries as LB's come and go" the stuff https://docs.cumulusnetworks.com/cumulus-linux/Layer-3/Equal-Cost-Multipath-Load-Sharing-Hardware-ECMP/#resilient-hashing is showing? It can't ensure the flow is distributed to the right backend server if a new next hop is added.
>
> I do not believe it is a problem to be solved in the kernel.
>
> If you follow the *intent* of the Cumulus document: what is the maximum
> number of load balancers you expect to have? 16? 32? 64? Define an ECMP
> route with that number of nexthops and fill in the weighting that meets
> your needs. When an LB is added or removed, you decide what the new set
> of paths is that maintains N-total paths with the distribution that
> meets your needs.
I recently started looking into consistent hashing and I wonder if it
can be done with the new nexthop API while keeping all the logic in user
space (e.g., FRR).
The only extension that might be required from the kernel is a new
nexthop attribute that indicates when a nexthop was last used.
User space can then use it to understand which nexthops to replace when
a new nexthop is added and when to perform the replacement. In case the
nexthops are offloaded, it is possible for the driver to periodically
update the nexthop code about their activity.
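A minimal sketch of how user space might choose which nexthops to replace, assuming the proposed attribute were available as an idle time in seconds. The input format (one `<nexthop id> <idle seconds>` pair per line), the function name `pick_idle_buckets` and the idle threshold are all hypothetical; they stand in for the proposed attribute and are not existing `ip nexthop` output.

```bash
#!/bin/bash
# Hypothetical input on stdin, one nexthop per line:
#   <nexthop id> <idle seconds>
# The idle time is an assumption standing in for the proposed
# "last used" attribute.

# Print up to $1 nexthop ids that have been idle for at least $2
# seconds, most idle first (ties broken by lowest id).
pick_idle_buckets()
{
    local count=$1 min_idle=$2

    awk -v min="$min_idle" '$2 >= min' |
        sort -k2,2nr -k1,1n |
        head -n "$count" |
        awk '{ print $1 }'
}
```

For example, `printf '1 30\n2 0\n3 120\n4 45\n' | pick_idle_buckets 2 10` selects nexthops 3 and 4, leaving the recently used nexthop 2 untouched. If fewer idle nexthops exist than requested, user space could retry later, which is the "when to perform the replacement" part.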
Below is a script that demonstrates the concept with the example in the
Cumulus documentation. I chose to replace the individual nexthops
instead of creating new ones and then replacing the group.
It is obviously possible to create larger groups to reduce the impact on
existing flows when a new nexthop is added.
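A larger group does not have to be typed out by hand. The sketch below prints (it does not run) the `ip nexthop replace` commands for a group of any size, using the same syntax as the script in this message. The helper name `gen_group` and the `gateway:device` argument format are made up for illustration.

```bash
#!/bin/bash
# Print the commands that fill a group with $1 nexthops and group id $2,
# assigning the "gateway:device" pairs given as the remaining arguments
# in round-robin order. Nothing is executed; the output is plain text.
gen_group()
{
    local buckets=$1 group_id=$2
    shift 2
    local -a nhs=("$@")
    local id nh group=""

    # At least one gateway:device pair is required.
    (( ${#nhs[@]} > 0 )) || return 1

    for ((id = 1; id <= buckets; id++)); do
        nh=${nhs[(id - 1) % ${#nhs[@]}]}
        # Split on the last colon: gateway before it, device after it.
        echo "ip nexthop replace id $id via ${nh%:*} dev ${nh##*:}"
        group+="${group:+/}$id"
    done
    echo "ip nexthop replace id $group_id group $group"
}
```

Calling `gen_group 12 10000 1.1.1.1:dummy_a 2.2.2.2:dummy_b 3.3.3.3:dummy_c 4.4.4.4:dummy_d` prints the same twelve nexthops and group as the initial configuration in the script; a larger first argument gives a proportionally larger group.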
WDYT?
```
#!/bin/bash
### Setup ####
IP="ip -n testns"
ip netns add testns
$IP link add name dummy_a up type dummy
$IP link add name dummy_b up type dummy
$IP link add name dummy_c up type dummy
$IP link add name dummy_d up type dummy
$IP link add name dummy_e up type dummy
$IP route add 1.1.1.0/24 dev dummy_a
$IP route add 2.2.2.0/24 dev dummy_b
$IP route add 3.3.3.0/24 dev dummy_c
$IP route add 4.4.4.0/24 dev dummy_d
$IP route add 5.5.5.0/24 dev dummy_e
### Initial nexthop configuration ####
# According to:
# https://docs.cumulusnetworks.com/cumulus-linux-42/Layer-3/Equal-Cost-Multipath-Load-Sharing-Hardware-ECMP/#resilient-hash-buckets
$IP nexthop replace id 1 via 1.1.1.1 dev dummy_a
$IP nexthop replace id 2 via 2.2.2.2 dev dummy_b
$IP nexthop replace id 3 via 3.3.3.3 dev dummy_c
$IP nexthop replace id 4 via 4.4.4.4 dev dummy_d
$IP nexthop replace id 5 via 1.1.1.1 dev dummy_a
$IP nexthop replace id 6 via 2.2.2.2 dev dummy_b
$IP nexthop replace id 7 via 3.3.3.3 dev dummy_c
$IP nexthop replace id 8 via 4.4.4.4 dev dummy_d
$IP nexthop replace id 9 via 1.1.1.1 dev dummy_a
$IP nexthop replace id 10 via 2.2.2.2 dev dummy_b
$IP nexthop replace id 11 via 3.3.3.3 dev dummy_c
$IP nexthop replace id 12 via 4.4.4.4 dev dummy_d
$IP nexthop replace id 10000 group 1/2/3/4/5/6/7/8/9/10/11/12
echo
echo "Initial state:"
echo
$IP nexthop show
### Nexthop B is removed ###
# According to:
# https://docs.cumulusnetworks.com/cumulus-linux-42/Layer-3/Equal-Cost-Multipath-Load-Sharing-Hardware-ECMP/#remove-next-hops
$IP nexthop replace id 2 via 1.1.1.1 dev dummy_a
$IP nexthop replace id 6 via 3.3.3.3 dev dummy_c
$IP nexthop replace id 10 via 4.4.4.4 dev dummy_d
echo
echo "After nexthop B was removed:"
echo
$IP nexthop show
### Initial state restored ####
$IP nexthop replace id 2 via 2.2.2.2 dev dummy_b
$IP nexthop replace id 6 via 2.2.2.2 dev dummy_b
$IP nexthop replace id 10 via 2.2.2.2 dev dummy_b
echo
echo "After initial state was restored:"
echo
$IP nexthop show
### Nexthop E is added ####
# According to:
# https://docs.cumulusnetworks.com/cumulus-linux-42/Layer-3/Equal-Cost-Multipath-Load-Sharing-Hardware-ECMP/#add-next-hops
# Nexthop 2, 5, 8 are active. Replace in a way that minimizes
# interruptions.
$IP nexthop replace id 1 via 2.2.2.2 dev dummy_b
$IP nexthop replace id 2 via 3.3.3.3 dev dummy_c
$IP nexthop replace id 3 via 4.4.4.4 dev dummy_d
$IP nexthop replace id 4 via 5.5.5.5 dev dummy_e
# Nexthop 5 remains the same
# Nexthop 6 remains the same
# Nexthop 7 remains the same
# Nexthop 8 remains the same
$IP nexthop replace id 9 via 5.5.5.5 dev dummy_e
$IP nexthop replace id 10 via 1.1.1.1 dev dummy_a
$IP nexthop replace id 11 via 2.2.2.2 dev dummy_b
$IP nexthop replace id 12 via 3.3.3.3 dev dummy_c
echo
echo "After nexthop E was added:"
echo
$IP nexthop show
ip netns del testns
```
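To see how many nexthops each step of the script rewrites, two layouts can be compared position by position. This is only a sketch: the helper `count_moved` and the letter notation (A to E for the gateways 1.1.1.1 to 5.5.5.5, listed in nexthop id order 1 to 12) are illustrative and not part of any tool.

```bash
#!/bin/bash
# Count the positions at which two space-separated layouts differ.
# $1 is the layout before the change, $2 the layout after it.
count_moved()
{
    local -a before after
    local i moved=0

    read -ra before <<< "$1"
    read -ra after <<< "$2"

    for i in "${!before[@]}"; do
        if [[ ${before[i]} != "${after[i]}" ]]; then
            moved=$((moved + 1))
        fi
    done
    echo "$moved"
}

initial="A B C D A B C D A B C D"
count_moved "$initial" "A A C D A C C D A D C D"   # nexthop B removed
count_moved "$initial" "B C D E A B C D E A B C"   # nexthop E added
```

With the layouts from the script, the removal of B rewrites 3 of the 12 nexthops and the addition of E rewrites 8 of them.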
>
> I just sent patches for active-backup nexthops that allows an automatic
> fallback when one is removed to address the redistribution problem, but
> it still requires userspace to decide what the active-backup pairs are
> as well as the maximum number of paths.