netdev - Re: [net-next] tipc: fix missing Name entries due to half-failover

Message-Id: <20190504.012006.508656003426228400.davem@davemloft.net>
Date:   Sat, 04 May 2019 01:20:06 -0400 (EDT)
From:   David Miller <davem@...emloft.net>
To:     tuong.t.lien@...tech.com.au
Cc:     jon.maloy@...csson.com, maloy@...jonn.com, ying.xue@...driver.com,
        netdev@...r.kernel.org, tipc-discussion@...ts.sourceforge.net
Subject: Re: [net-next] tipc: fix missing Name entries due to half-failover

From: Tuong Lien <tuong.t.lien@...tech.com.au>
Date: Thu,  2 May 2019 17:23:23 +0700

> A TIPC link can temporarily fall into a "half-established" state, in
> which only one of the link endpoints is ESTABLISHED and starts to send
> traffic (PROTOCOL messages), whereas the other link endpoint is not up
> (e.g. the network interface goes down immediately after the endpoint
> receives an ACTIVATE_MSG).
> 
> This is a normal situation and it settles by itself, because the link
> endpoint will eventually be brought down once the link tolerance time
> expires.
> 
> However, the situation becomes worse when the second link is
> established before the first link endpoint goes down. For example:
> 
>    1. Both links <1A-2A>, <1B-2B> down
>    2. Link endpoint 2A up, but 1A still down (e.g. due to network
>       disturbance, wrong session, etc.)
>    3. Link <1B-2B> up
>    4. Link endpoint 2A down (e.g. due to link tolerance timeout)
>    5. Node B starts failover onto link <1B-2B>
> 
>    ==> Node A never starts link failover.
> 
> When the "half-failover" situation happens, two consequences have been
> observed:
> 
> a) The peer link/node gets stuck in the FAILINGOVER state;
> b) Traffic or user messages that the peer node is trying to fail over
> onto the second link can be partially or completely dropped by this
> node.
> 
> Consequence a) was actually solved by commit c140eb166d68 ("tipc:
> fix failover problem"), but that commit didn't cover b). This is
> because the tunnel link endpoint has never been prepared for a
> failover, so 'l->drop_point' (and the other failover data) is not set
> correctly. When a TUNNEL_MSG from the peer node arrives on the link,
> the message is either dropped (treated as a duplicate) or processed,
> depending on the inner message's seqno and the current
> 'l->drop_point' value.
> At this early stage, the traffic messages from the peer are likely to
> be NAME_DISTRIBUTORs, which means some name table entries will be
> missing on the node forever!
> 
> This commit resolves the issue by starting the FAILOVER process on
> this node as well. Another benefit of this solution is that it ensures
> the link will not be re-established until the failover ends.
> 
> Acked-by: Jon Maloy <jon.maloy@...csson.com>
> Signed-off-by: Tuong Lien <tuong.t.lien@...tech.com.au>

Applied, thank you.