Message-ID: <aV0chBkc20PCn-Is@horms.kernel.org>
Date: Tue, 6 Jan 2026 14:30:28 +0000
From: Simon Horman <horms@...nel.org>
To: Felix Maurer <fmaurer@...hat.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, jkarrenpalo@...il.com,
tglx@...utronix.de, mingo@...nel.org, allison.henderson@...cle.com,
matttbe@...nel.org, petrm@...dia.com, bigeasy@...utronix.de
Subject: Re: [RFC net 0/6] hsr: Implement more robust duplicate discard
algorithm
On Mon, Dec 22, 2025 at 09:57:30PM +0100, Felix Maurer wrote:
> The PRP duplicate discard algorithm does not work reliably with certain
> link faults. Especially with packet loss on one link, the duplicate
> discard algorithm drops valid packets. For a more thorough description
> see patch 5.
>
> My suggestion is to replace the current, drop window-based algorithm
> with a new one that tracks the received sequence numbers individually
> (description again in patch 5). I am sending this as an RFC to gather
> feedback mainly on two points:
>
> 1. Is the design generally acceptable? Of course, this change leads to
> higher memory usage and more work to do for each packet. But I argue
> that this is an acceptable trade-off to make for a more robust PRP
> behavior with faulty links. After all, PRP is to be used in
> environments where redundancy is needed and people are ready to
> maintain two duplicate networks to achieve it.
> 2. As the tests added in patch 6 show, HSR is subject to similar
> problems. I do not see a reason not to use a very similar algorithm
> for HSR as well (with a bitmap for each port). Any objections to
> doing that (in a later patch series)? This will make the trade-off
> with memory usage more pronounced, as the hsr_seq_block will grow by
> three more bitmaps, at least for each HSR node (of which we do not
> expect too many, as an HSR ring can not be infinitely large).
Hi Felix,
Happy New Year!
We have spoken about this offline before and I agree that the situation
should be improved.
IMHO the trade-offs you are making here seem reasonable. And I wonder if
it helps to think in terms of the expected usage of this code: Is it
expected to scale to a point where the memory and CPU overhead becomes
unreasonable; or do we, as I think you imply above, expect deployments to
be on systems where the trade-offs are acceptable?
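To check that I'm reading the proposal the same way you intend it, here is
a rough userspace sketch of how I picture the per-node tracking: only
sequence numbers that have actually been accepted before are treated as
duplicates, rather than everything behind a drop window. To be clear, the
names, the window size and the wrap handling below are all mine and
hypothetical, not taken from your patches.

/*
 * Rough sketch only, to check my understanding; not based on the
 * patches. Track which sequence numbers in a sliding window have
 * already been accepted, and only drop exact repeats.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WINDOW		4096	/* how many sequence numbers we remember */
#define SLOT_BITS	(8 * sizeof(unsigned long))

struct seq_track {
	uint16_t newest;	/* highest sequence number seen so far */
	unsigned long seen[WINDOW / SLOT_BITS];
};

static void slot_clear(struct seq_track *t, uint16_t seq)
{
	unsigned int slot = seq % WINDOW;

	t->seen[slot / SLOT_BITS] &= ~(1UL << (slot % SLOT_BITS));
}

static bool slot_test_and_set(struct seq_track *t, uint16_t seq)
{
	unsigned int slot = seq % WINDOW;
	unsigned long mask = 1UL << (slot % SLOT_BITS);
	bool was_set = t->seen[slot / SLOT_BITS] & mask;

	t->seen[slot / SLOT_BITS] |= mask;
	return was_set;
}

/* Return true if the frame carrying @seq is a duplicate and can be dropped. */
static bool seq_is_duplicate(struct seq_track *t, uint16_t seq)
{
	int16_t delta = (int16_t)(seq - t->newest);

	if (delta > 0) {
		if (delta >= WINDOW) {
			/* Jumped past the whole window: forget everything. */
			memset(t->seen, 0, sizeof(t->seen));
		} else {
			/* Clear only the slots we are about to reuse. */
			for (int16_t i = 1; i <= delta; i++)
				slot_clear(t, (uint16_t)(t->newest + i));
		}
		t->newest = seq;
	} else if (delta <= -WINDOW) {
		/* Older than anything we remember: accept rather than
		 * risk dropping a valid frame (my assumption, not yours). */
		return false;
	}

	return slot_test_and_set(t, seq);
}

If that is roughly the shape of it, then the per-node cost is a bitmap of
a few hundred bytes (512 in this sketch) plus one such bitmap per port for
HSR, which seems comfortably within the trade-off you describe.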
>
> Most of the patches in this series are for the selftests. This is mainly
> to demonstrate the problems with the current duplicate discard
> algorithms, not so much about gathering feedback. Especially patches 1 and
> 2 are rather preparatory cleanups that do not have much to do with the
> actual problems the new algorithm tries to solve.
>
> A few points I know are not yet addressed:
> - HSR duplicate discard (see above).
> - The KUnit test is not updated for the new algorithm. I will work on
> that before actual patch submission.
FTR, the KUnit test no longer compiles. But you probably already knew that.
> - Merging the sequence number blocks when two entries in the node table
> are merged because they belong to the same node.
>
> Thank you for your feedback already!
Some slightly more specific feedback:
* These patches are probably for net-next rather than net
* Please run checkpatch.pl --max-line-length=80 --codespell (on each patch)
- And fix the line lengths where it doesn't reduce readability.
E.g. don't split strings
* Please also run shellcheck on the selftests
- As much as is reasonable please address the warnings
- In general new .sh files should be shellcheck-clean
- To aid this, use "# shellcheck disable=CASE" for cases that don't match
the way selftests are written, e.g. SC2154 and SC2034
* I was curious to see LANG=C in at least one of the selftests.
And I do see limited precedent for that. I'm just mentioning
that I was surprised as I'd always thought it was an implied requirement.