[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <11291f9b05764307b660049e2290dd10@EXCH-SVR2013.eberle.local>
Date: Mon, 15 Feb 2021 12:29:48 +0000
From: "Wenzel, Marco" <Marco.Wenzel@...berle.de>
To: George McCollister <george.mccollister@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: AW: HSR/PRP sequence counter issue with Cisco Redbox
> On Wed, Jan 27, 2021 at 6:32 AM Wenzel, Marco <Marco.Wenzel@a-
> eberle.de> wrote:
> >
> > Hi,
> >
> > we have figured out an issue with the current PRP driver when trying to
> communicate with Cisco IE 2000 industrial Ethernet switches in Redbox
> mode. The Cisco always resets the HSR/PRP sequence counter to "1" at low
> traffic (<= 1 frame in 400 ms). It can be reproduced by a simple ICMP echo
> request with 1 s interval between a Linux box running with PRP and a VDAN
> behind the Cisco Redbox. The Linux box then always receives frames with
> sequence counter "1" and drops them. The behavior is not configurable at
> the Cisco Redbox.
> >
> > I fixed it by ignoring sequence counters with value "1" at the sequence
> counter check in hsr_register_frame_out ():
> >
> > diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c index
> > 5c97de459905..630c238e81f0 100644
> > --- a/net/hsr/hsr_framereg.c
> > +++ b/net/hsr/hsr_framereg.c
> > @@ -411,7 +411,7 @@ void hsr_register_frame_in(struct hsr_node *node,
> > struct hsr_port *port, int hsr_register_frame_out(struct hsr_port *port,
> struct hsr_node *node,
> > u16 sequence_nr) {
> > - if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]))
> > + if (seq_nr_before_or_eq(sequence_nr,
> > + node->seq_out[port->type]) && (sequence_nr != 1))
> > return 1;
> >
> > node->seq_out[port->type] = sequence_nr;
> >
> >
> > Do you think this could be a solution? Should this patch be officially applied
> in order to avoid other users running into these communication issues?
>
> This isn't the correct way to solve the problem. IEC 62439-3 defines
> EntryForgetTime as "Time after which an entry is removed from the duplicate
> table" with a value of 400ms and states devices should usually be configured
> to keep entries in the table for a much shorter time. hsr_framereg.c needs to
> be reworked to handle this according to the specification.
Sorry for the delay but I did not have the time to take a closer look at the problem until now.
My suggestion for the EntryForgetTime feature would be the following: A time_out element will be added to the hsr_node structure, which always stores the current time when entering hsr_register_frame_out(). If the last stored time is older than EntryForgetTime (400 ms) the sequence number check will be ignored.
diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 5c97de459905..a97bffbd2581 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -164,8 +164,10 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
* as initialization. (0 could trigger an spurious ring error warning).
*/
now = jiffies;
- for (i = 0; i < HSR_PT_PORTS; i++)
+ for (i = 0; i < HSR_PT_PORTS; i++) {
new_node->time_in[i] = now;
+ new_node->time_out[i] = now;
+ }
for (i = 0; i < HSR_PT_PORTS; i++)
new_node->seq_out[i] = seq_out;
@@ -411,9 +413,12 @@ void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
u16 sequence_nr)
{
- if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]))
+ if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]) &&
+ time_is_after_jiffies(node->time_out[port->type] + msecs_to_jiffies(HSR_ENTRY_FORGET_TIME))) {
return 1;
+ }
+ node->time_out[port->type] = jiffies;
node->seq_out[port->type] = sequence_nr;
return 0;
}
diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h
index 86b43f539f2c..d9628e7a5f05 100644
--- a/net/hsr/hsr_framereg.h
+++ b/net/hsr/hsr_framereg.h
@@ -75,6 +75,7 @@ struct hsr_node {
enum hsr_port_type addr_B_port;
unsigned long time_in[HSR_PT_PORTS];
bool time_in_stale[HSR_PT_PORTS];
+ unsigned long time_out[HSR_PT_PORTS];
/* if the node is a SAN */
bool san_a;
bool san_b;
diff --git a/net/hsr/hsr_main.h b/net/hsr/hsr_main.h
index 7dc92ce5a134..f79ca55d6986 100644
--- a/net/hsr/hsr_main.h
+++ b/net/hsr/hsr_main.h
@@ -21,6 +21,7 @@
#define HSR_LIFE_CHECK_INTERVAL 2000 /* ms */
#define HSR_NODE_FORGET_TIME 60000 /* ms */
#define HSR_ANNOUNCE_INTERVAL 100 /* ms */
+#define HSR_ENTRY_FORGET_TIME 400 /* ms */
/* By how much may slave1 and slave2 timestamps of latest received frame from
* each node differ before we notify of communication problem?
This approach works fine with the Cisco IE 2000 and I think it implements the correct way to handle sequence numbers as defined in IEC 62439-3.
Regards,
Marco Wenzel
> >
> > Thanks
> > Marco Wenzel
>
> Regards,
> George McCollister
Powered by blists - more mailing lists