Message-ID: <cover.1769001553.git.fmaurer@redhat.com>
Date: Wed, 21 Jan 2026 14:32:59 +0100
From: Felix Maurer <fmaurer@...hat.com>
To: netdev@...r.kernel.org
Cc: davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org,
jkarrenpalo@...il.com,
tglx@...utronix.de,
mingo@...nel.org,
allison.henderson@...cle.com,
petrm@...dia.com,
antonio@...nvpn.net,
bigeasy@...utronix.de
Subject: [PATCH net-next 0/9] hsr: Implement more robust duplicate discard algorithm
The duplicate discard algorithms for PRP and HSR do not work reliably
with certain link faults. In particular, with packet loss on one link,
they drop valid packets. For a more thorough description, see patches 4
(for PRP) and 6 (for HSR).
This patchset replaces the current algorithms (based on a drop window
for PRP and highest seen sequence number for HSR) with a single new one
that tracks the received sequence numbers individually (descriptions
again in patches 4 and 6).
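To illustrate the idea, here is a minimal userspace sketch of such
per-sequence-number tracking. It is an illustration only, not the code
from the patches: the block geometry (128 consecutive sequence numbers
per block, one bitmap per port) and the direct-mapped table of 64 slots
per node are simplifying assumptions for this sketch; the actual
patches keep the blocks per node in an xarray, as described in the
quoted discussion below.

/*
 * Toy model of per-sequence-number duplicate tracking; not the code
 * from the patches. Block geometry and the direct-mapped slot table
 * are simplifying assumptions.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SEQ_PER_BLOCK   128
#define BLOCKS_PER_NODE 64
#define NUM_PORTS       2       /* PRP: LAN A and LAN B */

struct seq_block {
        bool valid;
        uint16_t index;         /* seqnr / SEQ_PER_BLOCK */
        uint64_t seen[NUM_PORTS][SEQ_PER_BLOCK / 64];
};

struct node_state {
        struct seq_block blocks[BLOCKS_PER_NODE];
};

/* Record @seqnr as seen on @port; return true if it is a duplicate. */
static bool seq_seen_before(struct node_state *node, uint16_t seqnr, int port)
{
        uint16_t idx = seqnr / SEQ_PER_BLOCK;
        struct seq_block *blk = &node->blocks[idx % BLOCKS_PER_NODE];
        unsigned int bit = seqnr % SEQ_PER_BLOCK;
        bool dup = false;
        int p;

        if (!blk->valid || blk->index != idx) {
                /* Slow path: recycle the slot for the new block. */
                memset(blk, 0, sizeof(*blk));
                blk->valid = true;
                blk->index = idx;
        } else {
                /* Fast path: one lookup, one comparison, one bit test. */
                for (p = 0; p < NUM_PORTS; p++)
                        dup |= !!(blk->seen[p][bit / 64] &
                                  (1ULL << (bit % 64)));
        }

        blk->seen[port][bit / 64] |= 1ULL << (bit % 64);
        return dup;
}

int main(void)
{
        struct node_state node = { 0 };

        printf("seq 5 on port 0: %s\n",
               seq_seen_before(&node, 5, 0) ? "duplicate" : "forward");
        printf("seq 5 on port 1: %s\n",
               seq_seen_before(&node, 5, 1) ? "duplicate" : "forward");
        return 0;
}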
The changes will lead to higher memory usage and more work for each
packet. But I argue that this is an acceptable trade-off for more
robust PRP and HSR behavior with faulty links. After all, both
protocols are meant to be used in environments where redundancy is
needed and people are willing to set up special network topologies to
achieve it.
Some more reasoning on the overhead and expected scale of the deployment
from the RFC discussion:
> As for the expected scale, there are two dimensions: the number of nodes
> in the network and the data rate with which they send.
>
> The number of nodes in the network affects the memory usage because
> each node now has the block buffer. For PRP, that is 64 blocks *
> 32 bytes = 2 kbytes per node in the node table. A PRP network doesn't
> have an explicit limit on the number of nodes. However, the whole
> network is a single layer-2 segment which shouldn't grow too large
> anyway. Even if one really tries to put 1000 nodes into the PRP
> network, the memory overhead (2 Mbytes) is acceptable in my opinion.
>
> For HSR, the blocks would be larger because we need to track the
> sequence numbers per port. I expect 64 blocks * 80 bytes = 5 kbytes
> per node in the node table. There is no explicit limit on the size of
> an HSR ring either, but I expect rings to stay small because the
> forwarding delays add up around the ring. I've seen vendors limit the
> ring size to 50 nodes with 100Mbit/s links and 300 nodes with 1Gbit/s
> links. In both cases I consider the memory overhead acceptable.
>
> The data rates are harder to reason about. In general, the data rates
> for HSR and PRP are limited because excessive packet rates would lead
> to very fast re-use of the 16-bit sequence numbers. IEC 62439-3:2021
> mentions 100Mbit/s and 1Gbit/s links. I don't expect HSR or PRP
> networks to scale out to, e.g., 10Gbit/s links with the current
> specification, as this would mean that sequence numbers could repeat
> as often as every ~4ms. The default constants in the IEC standard,
> which we also use, are geared toward a 100Mbit/s network.
>
> In my tests with veth pairs, the CPU overhead didn't lead to
> significantly lower data rates. The main factor limiting the data rate
> at the moment, I assume, is the per-node spinlock that is taken for
> each received packet. IMHO, there is a lot more to gain in terms of
> CPU overhead from making this lock smaller or getting rid of it than
> we lose with the more accurate duplicate discard algorithm in this
> patchset.
>
> The CPU overhead of the algorithm benefits from the fact that in high
> packet rate scenarios (where it really matters) many packets will have
> sequence numbers in already initialized blocks. These packets only add
> one xarray lookup, one comparison, and one bit set. If a block needs
> to be initialized (once every 128 packets, plus their 128 duplicates
> if all sequence numbers are seen), we have one xa_erase, a bunch of
> memory writes, and one xa_store.
>
> In theory, all packets could end up in the slow path if a node sends
> us only every 128th packet. Coming from a well-behaved node, though,
> such a packet rate wouldn't be an issue anymore.
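For reference, the arithmetic from the quoted discussion can be
reproduced with a small standalone program. The 84-byte minimum frame
on the wire (64-byte frame plus preamble, SFD and inter-frame gap) is
my assumption for the 10Gbit/s estimate; the block sizes and node
counts are the ones quoted above.

#include <stdio.h>

int main(void)
{
        double rate_bps = 10e9;                 /* 10Gbit/s link */
        double min_frame_bits = 84 * 8;         /* minimum frame incl. overhead */
        double pps = rate_bps / min_frame_bits;
        double wrap_s = 65536.0 / pps;          /* 16-bit sequence number space */

        printf("sequence numbers wrap every %.1f ms at 10Gbit/s\n",
               wrap_s * 1e3);

        /* PRP: 64 blocks * 32 bytes per node, 1000 nodes. */
        printf("PRP node table: %d kbytes for 1000 nodes\n",
               64 * 32 * 1000 / 1024);
        /* HSR: 64 blocks * 80 bytes per node, 300 nodes (1Gbit/s ring). */
        printf("HSR node table: %d kbytes for 300 nodes\n",
               64 * 80 * 300 / 1024);
        return 0;
}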
Thanks,
Felix
Signed-off-by: Felix Maurer <fmaurer@...hat.com>
---
Changes since the RFC:
- link: https://lore.kernel.org/netdev/cover.1766433800.git.fmaurer@redhat.com/
- Extended the new algorithm to HSR
- shellcheck'ing and checkpatch'ing
- Updated the KUnit test
Felix Maurer (9):
selftests: hsr: Add ping test for PRP
selftests: hsr: Check duplicates on HSR with VLAN
selftests: hsr: Add tests for faulty links
hsr: Implement more robust duplicate discard for PRP
selftests: hsr: Add tests for more link faults with PRP
hsr: Implement more robust duplicate discard for HSR
selftests: hsr: Add more link fault tests for HSR
hsr: Update PRP duplicate discard KUnit test for new algorithm
MAINTAINERS: Assign hsr selftests to HSR
MAINTAINERS | 1 +
net/hsr/hsr_framereg.c | 357 ++++++++++-------
net/hsr/hsr_framereg.h | 39 +-
net/hsr/prp_dup_discard_test.c | 154 ++++---
tools/testing/selftests/net/hsr/Makefile | 2 +
tools/testing/selftests/net/hsr/hsr_ping.sh | 207 +++-------
.../testing/selftests/net/hsr/link_faults.sh | 376 ++++++++++++++++++
tools/testing/selftests/net/hsr/prp_ping.sh | 146 +++++++
tools/testing/selftests/net/hsr/settings | 2 +-
9 files changed, 903 insertions(+), 381 deletions(-)
create mode 100755 tools/testing/selftests/net/hsr/link_faults.sh
create mode 100755 tools/testing/selftests/net/hsr/prp_ping.sh
--
2.52.0