Message-ID: <ad56f0fa-df33-45b8-aae5-7efac963ce2c@redhat.com>
Date: Thu, 8 Jan 2026 14:44:02 +0100
From: Felix Maurer <fmaurer@...hat.com>
To: Simon Horman <horms@...nel.org>
Cc: netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
 kuba@...nel.org, pabeni@...hat.com, jkarrenpalo@...il.com,
 tglx@...utronix.de, mingo@...nel.org, allison.henderson@...cle.com,
 matttbe@...nel.org, petrm@...dia.com, bigeasy@...utronix.de
Subject: Re: [RFC net 0/6] hsr: Implement more robust duplicate discard
 algorithm

On 06.01.26 15:30, Simon Horman wrote:
> On Mon, Dec 22, 2025 at 09:57:30PM +0100, Felix Maurer wrote:
>> The PRP duplicate discard algorithm does not work reliably with certain
>> link faults. Especially with packet loss on one link, the duplicate
>> discard algorithm drops valid packets. For a more thorough description
>> see patch 5.
>>
>> My suggestion is to replace the current drop-window-based algorithm
>> with a new one that tracks the received sequence numbers individually
>> (description again in patch 5). I am sending this as an RFC to gather
>> feedback mainly on two points:
>>
>> 1. Is the design generally acceptable? Of course, this change leads to
>>    higher memory usage and more work per packet. But I argue that this
>>    is an acceptable trade-off for more robust PRP behavior with faulty
>>    links. After all, PRP is meant to be used in environments where
>>    redundancy is needed and people are prepared to maintain two
>>    duplicate networks to achieve it.
>> 2. As the tests added in patch 6 show, HSR is subject to similar
>>    problems. I do not see a reason not to use a very similar algorithm
>>    for HSR as well (with a bitmap for each port). Any objections to
>>    doing that (in a later patch series)? This will make the trade-off
>>    with memory usage more pronounced, as the hsr_seq_block will grow by
>>    three more bitmaps, at least for each HSR node (of which we do not
>>    expect too many, as an HSR ring cannot be infinitely large).
> 
> Hi Felix,
> 
> Happy New Year!
> 
> We have spoken about this offline before and I agree that the situation
> should be improved.
> 
> IMHO the trade-offs you are making here seem reasonable. And I wonder if
> it helps to think in terms of the expected usage of this code: is it
> expected to scale to a point where the memory and CPU overhead become
> unreasonable, or do we, as I think you imply above, expect deployments to
> be on systems where the trade-offs are acceptable?

Happy new year to you as well!

As for the expected scale, there are two dimensions: the number of nodes
in the network and the data rate at which they send.

The number of nodes in the network affects the memory usage because each
node now has the block buffer. For PRP, that's 64 blocks * 32 byte =
2 kbyte for each node in the node table. A PRP network doesn't have an
explicit limit on the number of nodes. However, the whole network is a
single layer-2 segment, which shouldn't grow too large anyway. Even if
one really tries to put 1000 nodes into the PRP network, the memory
overhead (2 Mbyte) is acceptable in my opinion.
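
To make the arithmetic concrete, a block could look roughly like the
sketch below. This is only an illustration of where the 32 byte come
from; the names (SEQ_BLOCK_SIZE, prp_seq_block, last_use) and the exact
layout are mine, not necessarily what patch 5 uses.

#include <linux/types.h>	/* u16, DECLARE_BITMAP() */

#define SEQ_BLOCK_SIZE	128	/* sequence numbers tracked per block */

struct prp_seq_block {		/* stand-in for the block in patch 5 */
	u16 base_seq;				/* first seqnr of the block */
	DECLARE_BITMAP(seen, SEQ_BLOCK_SIZE);	/* 128 bits = 16 byte */
	unsigned long last_use;			/* to pick a block to recycle */
};
/* 2 + 6 (padding) + 16 + 8 = 32 byte on 64-bit;
 * 64 blocks * 32 byte = 2 kbyte per node table entry. */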

For HSR, the blocks would be larger because we need to track the
sequence numbers per port. I expect 64 blocks * 80 byte = 5 kbyte per
node in the node table. There is no explicit limit on the size of an
HSR ring either, but I expect rings to be of limited size because the
forwarding delays add up around the ring. I've seen vendors limit the
ring size to 50 nodes with 100 Mbit/s links and 300 with 1 Gbit/s
links. In both cases I consider the memory overhead acceptable.
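
The 80 byte come from giving each port its own bitmap. Reusing the
definitions from the sketch above, the grown block could look roughly
like this (which ports get a bitmap is my guess here, the point is only
the size):

struct hsr_seq_block {		/* layout again only illustrative */
	u16 base_seq;
	DECLARE_BITMAP(seen_a, SEQ_BLOCK_SIZE);	/* slave A */
	DECLARE_BITMAP(seen_b, SEQ_BLOCK_SIZE);	/* slave B */
	DECLARE_BITMAP(seen_i, SEQ_BLOCK_SIZE);	/* interlink */
	DECLARE_BITMAP(fwd, SEQ_BLOCK_SIZE);	/* already forwarded */
	unsigned long last_use;
};
/* 8 + 4 * 16 + 8 = 80 byte on 64-bit;
 * 64 blocks * 80 byte = 5 kbyte per node table entry. */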

The data rates are harder to reason about. In general, the data rates
for HSR and PRP are limited because very high packet rates would lead to
very fast reuse of the 16-bit sequence numbers. IEC 62439-3:2021
mentions 100 Mbit/s and 1 Gbit/s links. I don't expect HSR or PRP
networks to scale out to, e.g., 10 Gbit/s links with the current
specification, as this would mean that sequence numbers could repeat as
often as every ~4ms. The default constants in the IEC standard, which we
also use, are geared toward a 100 Mbit/s network.
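
For reference, the ~4ms comes from a back-of-the-envelope calculation
like the one below (userspace, assuming back-to-back minimum-size
frames, i.e. 84 byte on the wire including preamble and interframe gap):

#include <stdio.h>

int main(void)
{
	const double link_bps = 10e9;			/* 10 Gbit/s */
	const double frame_bits = 84 * 8;		/* min. frame incl. overhead */
	const double pps = link_bps / frame_bits;	/* ~14.9 Mpps */
	const double wrap_s = 65536.0 / pps;		/* 16-bit seqnr space */

	printf("seqnrs wrap every %.1f ms\n", wrap_s * 1e3);	/* ~4.4 ms */
	return 0;
}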

In my tests with veth pairs, the CPU overhead didn't lead to
significantly lower data rates. The main factor limiting the data rate
at the moment, I assume, is the per-node spinlock that is taken for each
received packet. IMHO, there is a lot more to gain in terms of CPU
overhead from making this lock smaller or getting rid of it than we
lose with the more accurate duplicate discard algorithm in this patchset.

The CPU overhead of the algorithm benefits from the fact that in
high-packet-rate scenarios (where it really matters) many packets will
have sequence numbers in already-initialized blocks. These packets only
add one xarray lookup, one comparison, and one bit set. If a block needs
to be initialized (once every 128 packets, plus their 128 duplicates if
all sequence numbers are seen), we additionally have one xa_erase, a
bunch of memory writes, and one xa_store.
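
To sketch what I mean (again only an illustration reusing the
hypothetical prp_seq_block from above; prp_node, the seq_blocks xarray
and prp_seq_replace_block() are made-up names, not the code from the
patches):

/* Returns true if seq was already seen, i.e. the frame is a duplicate. */
static bool prp_seq_is_duplicate(struct prp_node *node, u16 seq)
{
	unsigned long idx = seq / SEQ_BLOCK_SIZE;
	struct prp_seq_block *blk;

	/* fast path: a block covering this range is already initialized */
	blk = xa_load(&node->seq_blocks, idx);		/* one xarray lookup */
	if (blk)					/* one comparison */
		return test_and_set_bit(seq % SEQ_BLOCK_SIZE,	/* one bit set */
					blk->seen);

	/*
	 * Slow path, roughly once per 128 new sequence numbers: recycle the
	 * least recently used of the node's 64 blocks: xa_erase() it from
	 * its old index, reinitialize it for this range (memory writes),
	 * xa_store() it at idx, and record seq in it.
	 */
	return prp_seq_replace_block(node, idx, seq);
}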

In theory, all packets could end up in the slow path if a node only
sends us every 128th sequence number. But if a well-behaved node sends
that way, the packet rate wouldn't be an issue anymore anyway.

>> Most of the patches in this series are for the selftests. This is mainly
>> to demonstrate the problems with the current duplicate discard
>> algorithms, not so much about gathering feedback. In particular,
>> patches 1 and 2 are rather preparatory cleanups that do not have much
>> to do with the actual problems the new algorithm tries to solve.
>>
>> A few points I know are not yet addressed:
>> - HSR duplicate discard (see above).
>> - The KUnit test is not updated for the new algorithm. I will work on
>>   that before actual patch submission.
> 
> FTR, the KUnit tests no longer compile. But you probably already knew that.
> 
>> - Merging the sequence number blocks when two entries in the node table
>>   are merged because they belong to the same node.
>>
>> Thank you for your feedback already!
> 
> Some slightly more specific feedback:
> 
> * These patches are probably for net-next rather than net

They are, that was just a mistake when I sent them.

> * Please run checkpatch.pl --max-line-length=80 --codespell (on each patch)
>   - And fix the line lengths where it doesn't reduce readability.
>     E.g. don't split strings
> 
> * Please also run shellcheck on the selftests
>   - As much as is reasonable please address the warnings
>   - In general new .sh files should be shellcheck-clean
>   - To aid this, use "# shellcheck disable=CASE" for cases that don't match
>     the way selftests are written, e.g. SC2154 and SC2034
> 
> * I was curious to see LANG=C in at least one of the selftests.
>   And I do see limited precedent for that. I'm just mentioning
>   that I was surprised, as I'd always thought it was an implied requirement.

Thanks for the feedback, I'll work on it for the next submission. The
LANG=C came from another hsr selftest; it can probably be removed. I
agree that a lot of selftests would probably fail with any other LANG.

Thanks,
   Felix

