Message-Id: <20180929.114645.1219366490102910355.davem@davemloft.net>
Date: Sat, 29 Sep 2018 11:46:45 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: jon.maloy@...csson.com
Cc: netdev@...r.kernel.org, gordan.mihaljevic@...tech.com.au,
tung.q.nguyen@...tech.com.au, hoang.h.le@...tech.com.au,
canh.d.luu@...tech.com.au, ying.xue@...driver.com,
tipc-discussion@...ts.sourceforge.net
Subject: Re: [net 1/1] tipc: fix failover problem
From: Jon Maloy <jon.maloy@...csson.com>
Date: Wed, 26 Sep 2018 21:00:54 +0200
> From: LUU Duc Canh <canh.d.luu@...tech.com.au>
>
> We see the following scenario:
> 1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
> Since there is a second working link, failover procedure is started.
> 2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
> A on node 2. The node item 1->2 goes to state FAILINGOVER.
> 3) Link endpoint A/2 receives the FAILOVER message, and is supposed to
> take down its parallel link endpoint B/2, while producing a FAILOVER
> message to send back to A/1.
> 4) However, B/2 has already been deleted, so no FAILOVER message can
> be created.
> 5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
> any messages that can bring B/1 up again. We are left with a non-
> redundant link between node 1 and 2.
>
> We fix this by letting endpoint A/2 build a dummy FAILOVER message
> to send back to A/1, so that the situation can be resolved.
>
> Signed-off-by: LUU Duc Canh <canh.d.luu@...tech.com.au>
> Signed-off-by: Jon Maloy <jon.maloy@...csson.com>
Applied.