netdev - Re: [PATCH net-next v3 4/8] hsr: Implement more robust duplicate discard for PRP

Message-ID: <ccad3f52-e46a-46b5-9113-153a31b3ddd3@redhat.com>
Date: Thu, 5 Feb 2026 13:25:01 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Felix Maurer <fmaurer@...hat.com>, netdev@...r.kernel.org
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 horms@...nel.org, jkarrenpalo@...il.com, tglx@...nel.org, mingo@...nel.org,
 bigeasy@...utronix.de, matttbe@...nel.org, allison.henderson@...cle.com,
 petrm@...dia.com, antonio@...nvpn.net,
 Steffen Lindner <steffen.lindner@...abb.com>
Subject: Re: [PATCH net-next v3 4/8] hsr: Implement more robust duplicate
 discard for PRP

On 2/2/26 3:19 PM, Felix Maurer wrote:
> The PRP duplicate discard algorithm does not work reliably with certain
> link faults. Especially with packet loss on one link, the duplicate discard
> algorithm drops valid packets which leads to packet loss on the PRP
> interface where the link fault should in theory be perfectly recoverable by
> PRP. This happens because the algorithm opens the drop window on the lossy
> link, covering received and lost sequence numbers. If the other, non-lossy
> link receives the duplicate for a lost frame, it is within the drop window
> of the lossy link and therefore dropped.
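The failure described above can be reproduced with a deliberately simplified, hypothetical model of a per-port drop window. This is not the code from commit 05fd00e5e7b1. `struct window_model` and `window_rcv()` are made-up names, and the window never shrinks, which keeps the illustration short. The model keeps only the essential flaw: the window on the lossy link extends over sequence numbers that were never received there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a per-port drop window; NOT the
 * algorithm from commit 05fd00e5e7b1. Port 0 = link A, port 1 = link B.
 */
struct window_model {
	bool open[2];		/* is a drop window open on this port? */
	uint16_t lo[2];		/* first sequence number covered */
	uint16_t hi[2];		/* last sequence number covered */
};

/* Returns true if the frame is accepted, false if dropped as a duplicate. */
static bool window_rcv(struct window_model *w, int port, uint16_t seq)
{
	int other = !port;

	assert(port == 0 || port == 1);
	/* Drop if the other port's window covers seq (modulo 2^16). */
	if (w->open[other] &&
	    (uint16_t)(seq - w->lo[other]) <=
	    (uint16_t)(w->hi[other] - w->lo[other]))
		return false;
	/* Accept and extend this port's window up to seq, gaps included. */
	if (!w->open[port]) {
		w->open[port] = true;
		w->lo[port] = seq;
	}
	w->hi[port] = seq;
	return true;
}
```

With port 0 as the lossy link, accepting frames 1 and 3 there opens the window [1, 3]. The copy of frame 2 arriving on port 1, which is the only copy that survived, then falls inside that window and is dropped.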
> 
> Since IEC 62439-3:2012, a node has one sequence number counter for frames
> it sends, instead of one sequence number counter for each destination.
> Therefore, a node cannot expect to receive contiguous sequence numbers
> from a sender. A missing sequence number can be totally normal (if the
> sender intermittently communicates with another node) or mean a frame was
> lost.
> 
> The algorithm, as previously implemented in commit 05fd00e5e7b1 ("net: hsr:
> Fix PRP duplicate detection"), was part of IEC 62439-3:2010 (HSRv0/PRPv0)
> but was removed with IEC 62439-3:2012 (HSRv1/PRPv1). Since then, no
> algorithm has been specified; it is left to implementers. It should be "designed such
> that it never rejects a legitimate frame, while occasional acceptance of a
> duplicate can be tolerated" (IEC 62439-3:2021).
> 
> For the duplicate discard algorithm, this means that 1) we need to track
> the sequence numbers individually to account for non-contiguous sequence
> numbers, and 2) we should always err on the side of accepting a duplicate
> rather than dropping a valid frame.
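Requirement 1) can be illustrated with a hypothetical baseline that is not part of the patch: a dense bitmap with one bit per 16-bit sequence number of a single source node. `struct dense_seen`, `dense_init()` and `dense_seen_before()` are made-up names. Gaps in the received sequence are harmless because only numbers that were actually seen get marked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical baseline, not from the patch: one bit per 16-bit
 * sequence number of a single source node.
 */
struct dense_seen {
	uint8_t bits[65536 / 8];
};

/* 65536 bits = 8 KiB for every tracked node. */
static_assert(sizeof(struct dense_seen) == 8192, "one bit per sequence number");

static void dense_init(struct dense_seen *d)
{
	memset(d, 0, sizeof(*d));
}

/* Returns true if seq was seen before (duplicate); otherwise marks it. */
static bool dense_seen_before(struct dense_seen *d, uint16_t seq)
{
	uint8_t mask = (uint8_t)(1u << (seq % 8));
	bool seen = (d->bits[seq / 8] & mask) != 0;

	d->bits[seq / 8] |= mask;
	return seen;
}
```

This baseline never forgets a bit, so once the sender's counter wraps, legitimate frames reusing old sequence numbers would be rejected. That conflicts with requirement 2) and is why the new algorithm bounds memory and expires entries.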
> 
> The idea of the new algorithm is to store the seen sequence numbers in a
> bitmap. To keep the size of the bitmap in control, we store it as a "sparse
> bitmap" where the bitmap is split into blocks and not all blocks exist at
> the same time. The sparse bitmap is implemented using an xarray that keeps
> the references to the individual blocks and a backing ring buffer that
> stores the actual blocks. New blocks are initialized in the buffer and
> added to the xarray as needed when new frames arrive. Existing blocks are
> removed in two conditions:
> 1. The block found for an arriving sequence number is old and therefore not
>    relevant to the duplicate discard algorithm anymore, i.e., it has been
>    added more than the entry forget time ago. In this case, the block is
>    removed from the xarray and marked as forgotten (by setting its
>    timestamp to 0).
> 2. Space is needed in the ring buffer for a new block. In this case, the
>    block is removed from the xarray, if it hasn't already been forgotten
>    (by 1.). Afterwards, the new block is initialized in its place.
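The two removal conditions can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the patch's implementation: the block size, ring size and forget time are made-up values, time is an abstract nonzero tick counter, and a linear scan over the ring stands in for the kernel xarray that maps block indices to blocks. All identifiers (`struct seq_tracker`, `tracker_check()`, `BLOCK_BITS`, `RING_BLOCKS`, `FORGET_TIME`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BITS	128u	/* hypothetical: sequence numbers per block */
#define RING_BLOCKS	8u	/* hypothetical: blocks kept per source node */
#define FORGET_TIME	400u	/* hypothetical "entry forget time", in ticks */

struct seq_block {
	uint16_t block_idx;		/* seq / BLOCK_BITS */
	uint32_t time;			/* time the block was added; 0 = forgotten */
	uint8_t bits[BLOCK_BITS / 8];
};

struct seq_tracker {
	struct seq_block ring[RING_BLOCKS];
	unsigned int next;		/* least recently allocated ring slot */
};

static void tracker_init(struct seq_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/* Stand-in for the xarray lookup: find the live block for block_idx. */
static struct seq_block *find_block(struct seq_tracker *t, uint16_t block_idx)
{
	for (unsigned int i = 0; i < RING_BLOCKS; i++) {
		struct seq_block *b = &t->ring[i];

		if (b->time != 0 && b->block_idx == block_idx)
			return b;
	}
	return NULL;
}

/* Returns true if the frame should be dropped as a duplicate. */
static bool tracker_check(struct seq_tracker *t, uint16_t seq, uint32_t now)
{
	uint16_t block_idx = (uint16_t)(seq / BLOCK_BITS);
	unsigned int bit = seq % BLOCK_BITS;
	struct seq_block *b;
	uint8_t mask;

	assert(now != 0);		/* 0 is reserved for "forgotten" */
	b = find_block(t, block_idx);
	/* Condition 1: block older than the forget time, mark it forgotten. */
	if (b && now - b->time > FORGET_TIME) {
		b->time = 0;
		b = NULL;
	}
	if (!b) {
		/* Condition 2: recycle the least recently allocated slot. */
		b = &t->ring[t->next];
		t->next = (t->next + 1) % RING_BLOCKS;
		b->block_idx = block_idx;
		b->time = now;
		memset(b->bits, 0, sizeof(b->bits));
	}
	mask = (uint8_t)(1u << (bit % 8));
	if (b->bits[bit / 8] & mask)
		return true;
	b->bits[bit / 8] |= mask;
	return false;
}
```

In this model, a quiet sender's blocks expire through condition 1, while a busy sender's blocks are recycled by condition 2 once RING_BLOCKS newer blocks have been allocated, regardless of their age.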
> 
> This has the nice property that we can reliably track sequence numbers in
> low-traffic situations (where they expire based on their timestamp) and
> more quickly forget sequence numbers in high-traffic situations, where they
> could otherwise wrap around and repeat before they have expired.
> 
> When nodes are merged, the blocks are merged as well. The timestamp of a
> merged block is set to the minimum of the two timestamps to never keep
> around a seen sequence number for too long. The bitmaps are or'd to mark
> all seen sequence numbers as seen.
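Merging two blocks that cover the same range of sequence numbers might look like this. It is a hypothetical sketch, not the patch's code: `merge_block()`, `struct seq_block` and `BLOCK_BYTES` are made-up names, and it only covers the case where both nodes hold a live block (nonzero timestamp) for the same block index.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_BYTES 16	/* hypothetical: 128 sequence numbers per block */

struct seq_block {
	uint16_t block_idx;
	uint32_t time;			/* 0 = forgotten */
	uint8_t bits[BLOCK_BYTES];
};

/* Merge src into dst; both must be live blocks for the same index. */
static void merge_block(struct seq_block *dst, const struct seq_block *src)
{
	assert(dst->block_idx == src->block_idx);
	assert(dst->time != 0 && src->time != 0);
	/* Keep the older timestamp so no seen bit outlives the forget time. */
	if (src->time < dst->time)
		dst->time = src->time;
	/* A sequence number seen by either node counts as seen. */
	for (unsigned int i = 0; i < BLOCK_BYTES; i++)
		dst->bits[i] |= src->bits[i];
}
```

Taking the minimum timestamp errs toward forgetting early, which matches the stated preference for occasionally accepting a duplicate over rejecting a legitimate frame.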
> 
> All of this still happens under seq_out_lock, to prevent concurrent
> access to the blocks.
> 
> The KUnit test for the algorithm is updated as well. The updates are done
> in a way to match the original intent pretty closely. Currently, there is
> much knowledge about the actual algorithm baked into the tests (especially
> the expectations) which may need some redesign in the future.
> 
> Reported-by: Steffen Lindner <steffen.lindner@...abb.com>
> Fixes: 05fd00e5e7b1 ("net: hsr: Fix PRP duplicate detection")

I'm sorry for nitpicking, but it looks like the current guidance is to
avoid the Fixes tag for this kind of resilience-improving refactor:

https://lore.kernel.org/netdev/20260121171051.039110c3@kernel.org/

> @@ -526,18 +613,21 @@ int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
>   */
>  int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
>  {
> -	enum hsr_port_type other_port;
> -	enum hsr_port_type rcv_port;
> +	u16 sequence_nr, seq_bit, block_idx;
> +	struct hsr_seq_block *block;
>  	struct hsr_node *node;
> -	u16 sequence_diff;
> -	u16 sequence_exp;
> -	u16 sequence_nr;
>  
> -	/* out-going frames are always in order
> -	 * and can be checked the same way as for HSR
> -	 */
> -	if (frame->port_rcv->type == HSR_PT_MASTER)
> -		return hsr_register_frame_out(port, frame);
> +	node = frame->node_src;
> +	sequence_nr = frame->sequence_nr;
> +
> +	// out-going frames are always in order

Please use /* */ for comments.

/P