Message-ID: <ccad3f52-e46a-46b5-9113-153a31b3ddd3@redhat.com>
Date: Thu, 5 Feb 2026 13:25:01 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Felix Maurer <fmaurer@...hat.com>, netdev@...r.kernel.org
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
horms@...nel.org, jkarrenpalo@...il.com, tglx@...nel.org, mingo@...nel.org,
bigeasy@...utronix.de, matttbe@...nel.org, allison.henderson@...cle.com,
petrm@...dia.com, antonio@...nvpn.net,
Steffen Lindner <steffen.lindner@...abb.com>
Subject: Re: [PATCH net-next v3 4/8] hsr: Implement more robust duplicate
discard for PRP
On 2/2/26 3:19 PM, Felix Maurer wrote:
> The PRP duplicate discard algorithm does not work reliably with certain
> link faults. Especially with packet loss on one link, the duplicate discard
> algorithm drops valid packets which leads to packet loss on the PRP
> interface where the link fault should in theory be perfectly recoverable by
> PRP. This happens because the algorithm opens the drop window on the lossy
> link, covering received and lost sequence numbers. If the other, non-lossy
> link receives the duplicate for a lost frame, it is within the drop window
> of the lossy link and therefore dropped.
>
> Since IEC 62439-3:2012, a node has one sequence number counter for frames
> it sends, instead of one sequence number counter for each destination.
> Therefore, a node cannot expect to receive contiguous sequence numbers
> from a sender. A missing sequence number can be totally normal (if the
> sender intermittently communicates with another node) or mean a frame was
> lost.
>
> The algorithm, as previously implemented in commit 05fd00e5e7b1 ("net: hsr:
> Fix PRP duplicate detection"), was part of IEC 62439-3:2010 (HSRv0/PRPv0)
> but was removed with IEC 62439-3:2012 (HSRv1/PRPv1). Since then, no
> algorithm has been specified; it is left to implementers. It should be "designed such
> that it never rejects a legitimate frame, while occasional acceptance of a
> duplicate can be tolerated" (IEC 62439-3:2021).
>
> For the duplicate discard algorithm, this means that 1) we need to track
> the sequence numbers individually to account for non-contiguous sequence
> numbers, and 2) we should always err on the side of accepting a duplicate
> rather than dropping a valid frame.
>
> The idea of the new algorithm is to store the seen sequence numbers in a
> bitmap. To keep the size of the bitmap under control, we store it as a "sparse
> bitmap" where the bitmap is split into blocks and not all blocks exist at
> the same time. The sparse bitmap is implemented using an xarray that keeps
> the references to the individual blocks and a backing ring buffer that
> stores the actual blocks. New blocks are initialized in the buffer and
> added to the xarray as needed when new frames arrive. Existing blocks are
> removed in two conditions:
> 1. The block found for an arriving sequence number is old and therefore not
> relevant to the duplicate discard algorithm anymore, i.e., it has been
> added more than the entry forget time ago. In this case, the block is
> removed from the xarray and marked as forgotten (by setting its
> timestamp to 0).
> 2. Space is needed in the ring buffer for a new block. In this case, the
> block is removed from the xarray, if it hasn't already been forgotten
> (by 1.). Afterwards, the new block is initialized in its place.
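The two removal conditions above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch's implementation: a flat pointer array stands in for the kernel xarray, the block size, ring size and forget time are made-up values, `seq_tracker`, `get_block` and `seq_check_and_set` are hypothetical names, and the locking done under seq_out_lock is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; the patch's actual constants may differ. */
#define BLOCK_BITS	128u
#define NUM_BLOCKS	64u
#define INDEX_SLOTS	(65536u / BLOCK_BITS)	/* block indices for u16 seqnrs */
#define FORGET_TIME	400u			/* in the caller's time unit */

struct seq_block {
	uint64_t time;		/* creation time; 0 = forgotten or unused */
	uint16_t block_idx;
	uint64_t bits[BLOCK_BITS / 64];
};

struct seq_tracker {
	/* Flat array standing in for the kernel xarray: block_idx -> block. */
	struct seq_block *index[INDEX_SLOTS];
	struct seq_block ring[NUM_BLOCKS];	/* backing ring buffer */
	unsigned int next;			/* next ring slot to reuse */
};

static void tracker_init(struct seq_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

static struct seq_block *get_block(struct seq_tracker *t, uint16_t block_idx,
				   uint64_t now)
{
	struct seq_block *b = t->index[block_idx];

	/* Condition 1: the block is older than the forget time. */
	if (b && now - b->time > FORGET_TIME) {
		t->index[block_idx] = NULL;
		b->time = 0;
		b = NULL;
	}
	if (b)
		return b;

	/* Condition 2: take the next ring slot, unlinking its block if live. */
	b = &t->ring[t->next];
	t->next = (t->next + 1) % NUM_BLOCKS;
	if (b->time != 0)
		t->index[b->block_idx] = NULL;

	memset(b->bits, 0, sizeof(b->bits));
	b->block_idx = block_idx;
	b->time = now;
	t->index[block_idx] = b;
	return b;
}

/* Returns true to deliver the frame, false to discard it as a duplicate. */
static bool seq_check_and_set(struct seq_tracker *t, uint16_t seq, uint64_t now)
{
	struct seq_block *b;
	unsigned int bit = seq % BLOCK_BITS;
	uint64_t mask = UINT64_C(1) << (bit % 64);

	assert(now != 0);	/* 0 is reserved as the "forgotten" marker */
	b = get_block(t, (uint16_t)(seq / BLOCK_BITS), now);
	if (b->bits[bit / 64] & mask)
		return false;
	b->bits[bit / 64] |= mask;
	return true;
}
```

In this sketch a live block (nonzero timestamp) is always the one indexed for its block number, so a reused ring slot only needs unlinking when its timestamp is nonzero. A block already forgotten under condition 1 may have been replaced in the index by a newer block for the same number and must not be unlinked.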
>
> This has the nice property that we can reliably track sequence numbers in
> low-traffic situations (where they expire based on their timestamp) and
> forget sequence numbers more quickly in high-traffic situations, before
> they can wrap around and repeat while still unexpired.
>
> When nodes are merged, the blocks are merged as well. The timestamp of a
> merged block is set to the minimum of the two timestamps to never keep
> around a seen sequence number for too long. The bitmaps are or'd to mark
> all seen sequence numbers as seen.
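The merge rule above, sketched for a single pair of blocks with the same block index. Both blocks are assumed live (nonzero timestamp); `merge_block` and the 128-bit block layout are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_WORDS 2	/* hypothetical: 128-bit blocks */

struct seq_block {
	uint64_t time;		/* creation time; 0 would mean forgotten */
	uint64_t bits[BLOCK_WORDS];
};

/* Merge src into dst: keep the older timestamp so that no sequence number
 * is remembered longer than the forget time, and OR the bitmaps so every
 * number seen by either node stays marked as seen.
 */
static void merge_block(struct seq_block *dst, const struct seq_block *src)
{
	assert(dst->time != 0 && src->time != 0);
	if (src->time < dst->time)
		dst->time = src->time;
	for (int i = 0; i < BLOCK_WORDS; i++)
		dst->bits[i] |= src->bits[i];
}
```

Taking the minimum errs toward forgetting early, which at worst lets an occasional duplicate through, in line with the "never reject a legitimate frame" requirement.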
>
> All of this still happens under seq_out_lock, to prevent concurrent
> access to the blocks.
>
> The KUnit test for the algorithm is updated as well. The updates are done
> in a way that closely matches the original intent. Currently, there is
> much knowledge about the actual algorithm baked into the tests (especially
> the expectations) which may need some redesign in the future.
>
> Reported-by: Steffen Lindner <steffen.lindner@...abb.com>
> Fixes: 05fd00e5e7b1 ("net: hsr: Fix PRP duplicate detection")
I'm sorry for nitpicking, but it looks like the current guidance is to
avoid the Fixes tag for this kind of resilience-improving refactor:
https://lore.kernel.org/netdev/20260121171051.039110c3@kernel.org/
> @@ -526,18 +613,21 @@ int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
> */
> int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
> {
> - enum hsr_port_type other_port;
> - enum hsr_port_type rcv_port;
> + u16 sequence_nr, seq_bit, block_idx;
> + struct hsr_seq_block *block;
> struct hsr_node *node;
> - u16 sequence_diff;
> - u16 sequence_exp;
> - u16 sequence_nr;
>
> - /* out-going frames are always in order
> - * and can be checked the same way as for HSR
> - */
> - if (frame->port_rcv->type == HSR_PT_MASTER)
> - return hsr_register_frame_out(port, frame);
> + node = frame->node_src;
> + sequence_nr = frame->sequence_nr;
> +
> + // out-going frames are always in order
Please use /* */ for comments.
/P