netdev - Re: [RFC net 0/6] hsr: Implement more robust duplicate discard algorithm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aV0chBkc20PCn-Is@horms.kernel.org>
Date: Tue, 6 Jan 2026 14:30:28 +0000
From: Simon Horman <horms@...nel.org>
To: Felix Maurer <fmaurer@...hat.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
	kuba@...nel.org, pabeni@...hat.com, jkarrenpalo@...il.com,
	tglx@...utronix.de, mingo@...nel.org, allison.henderson@...cle.com,
	matttbe@...nel.org, petrm@...dia.com, bigeasy@...utronix.de
Subject: Re: [RFC net 0/6] hsr: Implement more robust duplicate discard
 algorithm

On Mon, Dec 22, 2025 at 09:57:30PM +0100, Felix Maurer wrote:
> The PRP duplicate discard algorithm does not work reliably with certain
> link faults. Especially with packet loss on one link, the duplicate
> discard algorithm drops valid packets. For a more thorough description
> see patch 5.
> 
> My suggestion is to replace the current, drop window-based algorithm
> with a new one that tracks the received sequence numbers individually
> (description again in patch 5). I am sending this as an RFC to gather
> feedback mainly on two points:
> 
> 1. Is the design generally acceptable? Of course, this change leads to
>    higher memory usage and more work to do for each packet. But I argue
>    that this is an acceptable trade-off to make for a more robust PRP
>    behavior with faulty links. After all, PRP is to be used in
>    environments where redundancy is needed and people are ready to
>    maintain two duplicate networks to achieve it.
> 2. As the tests added in patch 6 show, HSR is subject to similar
>    problems. I do not see a reason not to use a very similar algorithm
>    for HSR as well (with a bitmap for each port). Any objections to
>    doing that (in a later patch series)? This will make the trade-off
>    with memory usage more pronounced, as the hsr_seq_block will grow by
>    three more bitmaps, at least for each HSR node (of which we do not
>    expect too many, as an HSR ring can not be infinitely large).

Hi Felix,

Happy New Year!

We have spoken about this offline before and I agree that the situation
should be improved.

IMHO the trade-offs you are making here seem reasonable.  And I wonder if
it helps to think in terms of the expected usage of this code: Is it
expected to scale to a point where the memory and CPU overhead becomes
unreasonable; or do, as I think you imply above, we expect deployments to
be on systems where the trade-offs are acceptable?

> 
> Most of the patches in this series are for the selftests. This is mainly
> to demonstrate the problems with the current duplicate discard
> algorithms, not so much about gathering feedback. Especially patch 1 and
> 2 are rather preparatory cleanups that do not have much to do with the
> actual problems the new algorithm tries to solve.
> 
> A few points I know not yet addressed are:
> - HSR duplicate discard (see above).
> - The KUnit test is not updated for the new algorithm. I will work on
>   that before actual patch submission.

FTR, the KUnit tests no longer compiles. But probably you already knew that.

> - Merging the sequence number blocks when two entries in the node table
>   are merged because they belong to the same node.
> 
> Thank you for your feedback already!

Some slightly more specific feedback:

* These patches are probably for net-next rather than net

* Please run checkpatch.pl --max-line-length=80 --codespell (on each patch)
  - And fix the line lengths where it doesn't reduce readability.
    E.g. don't split strings

* Please also run shellcheck on the selftests
  - As much as is reasonable please address the warnings
  - In general new .sh files should be shellcheck-clean
  - To aid this, use "# shellcheck disable=CASE", for cases that don't match
    the way selftests are written , e.g. SC2154 and SC2034

* I was curious to see LANG=C in at least one of the selftests.
  And I do see limited precedence for that. I'm just mentioning
  that I was surprised as I'd always thought it was an implied requirement.