lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ee4c868-0bd4-6df2-aaef-07efed2812f6@gmail.com>
Date:   Thu, 28 Jan 2021 20:24:33 -0700
From:   David Ahern <dsahern@...il.com>
To:     Petr Machata <petrm@...dia.com>, netdev@...r.kernel.org
Cc:     David Ahern <dsahern@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Ido Schimmel <idosch@...dia.com>
Subject: Re: [PATCH net-next 00/12] nexthop: Preparations for resilient
 next-hop groups

On 1/28/21 5:49 AM, Petr Machata wrote:
> At this moment, there is only one type of next-hop group: an mpath group.
> Mpath groups implement the hash-threshold algorithm, described in RFC
> 2992[1].
> 
> To select a next hop, hash-threshold algorithm first assigns a range of
> hashes to each next hop in the group, and then selects the next hop by
> comparing the SKB hash with the individual ranges. When a next hop is
> removed from the group, the ranges are recomputed, which leads to
> reassignment of parts of hash space from one next hop to another. RFC 2992
> illustrates it thus:
> 
>              +-------+-------+-------+-------+-------+
>              |   1   |   2   |   3   |   4   |   5   |
>              +-------+-+-----+---+---+-----+-+-------+
>              |    1    |    2    |    4    |    5    |
>              +---------+---------+---------+---------+
> 
>               Before and after deletion of next hop 3
> 	      under the hash-threshold algorithm.
> 
> Note how next hop 2 gave up part of the hash space in favor of next hop 1,
> and 4 in favor of 5. While there will usually be some overlap between the
> previous and the new distribution, some traffic flows change the next hop
> that they resolve to.
> 
> If a multipath group is used for load-balancing between multiple servers,
> this hash space reassignment causes an issue that packets from a single
> flow suddenly end up arriving at a server that does not expect them, which
> may lead to TCP reset.
> 
> If a multipath group is used for load-balancing among available paths to
> the same server, the issue is that different latencies and reordering along
> the way causes the packets to arrive in wrong order.
> 
> Resilient hashing is a technique to address the above problem. Resilient
> next-hop group has another layer of indirection between the group itself
> and its constituent next hops: a hash table. The selection algorithm uses a
> straightforward modulo operation to choose a hash bucket, and then reads
> the next hop that this bucket contains, and forwards traffic there.
> 
> This indirection brings an important feature. In the hash-threshold
> algorithm, the range of hashes associated with a next hop must be
> continuous. With a hash table, mapping between the hash table buckets and
> the individual next hops is arbitrary. Therefore when a next hop is deleted
> the buckets that held it are simply reassigned to other next hops:
> 
>              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>              |1|1|1|1|2|2|2|2|3|3|3|3|4|4|4|4|5|5|5|5|
>              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> 	                      v v v v
>              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>              |1|1|1|1|2|2|2|2|1|2|4|5|4|4|4|4|5|5|5|5|
>              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> 
>               Before and after deletion of next hop 3
> 	      under the resilient hashing algorithm.
> 
> When weights of next hops in a group are altered, it may be possible to
> choose a subset of buckets that are currently not used for forwarding
> traffic, and use those to satisfy the new next-hop distribution demands,
> keeping the "busy" buckets intact. This way, established flows are ideally
> kept being forwarded to the same endpoints through the same paths as before
> the next-hop group change.
> 
> This patchset prepares the next-hop code for eventual introduction of
> resilient hashing groups.
> 
> - Patches #1-#4 carry otherwise disjoint changes that just remove certain
>   assumptions in the next-hop code.
> 
> - Patches #5-#6 extend the in-kernel next-hop notifiers to support more
>   next-hop group types.
> 
> - Patches #7-#12 refactor RTNL message handlers. Resilient next-hop groups
>   will introduce a new logical object, a hash table bucket. It turns out
>   that handling bucket-related messages is similar to how next-hop messages
>   are handled. These patches extract the commonalities into reusable
>   components.
> 
> The plan is to contribute approximately the following patchsets:
> 
> 1) Nexthop policy refactoring (already pushed)
> 2) Preparations for resilient next hop groups (this patchset)
> 3) Implementation of resilient next hop group
> 4) Netdevsim offload plus a suite of selftests
> 5) Preparations for mlxsw offload of resilient next-hop groups
> 6) mlxsw offload including selftests
> 
> Interested parties can look at the current state of the code at [2] and
> [3].
> 
> [1] https://tools.ietf.org/html/rfc2992
> [2] https://github.com/idosch/linux/commits/submit/res_integ_v1
> [3] https://github.com/idosch/iproute2/commits/submit/res_v1
> 

Very easy to review patchset. Thank you for that and for this cover
letter with the end goal and progress.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ