Message-Id: <20200818.155824.2292310502481809055.davem@davemloft.net>
Date: Tue, 18 Aug 2020 15:58:24 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: jwiesner@...e.com
Cc: netdev@...r.kernel.org, j.vosburgh@...il.com, vfalico@...il.com,
andy@...yhouse.net, kuba@...nel.org, Andreas.Taschner@...e.com,
mkubecek@...e.cz
Subject: Re: [PATCH net] bonding: fix active-backup failover for current
ARP slave
From: Jiri Wiesner <jwiesner@...e.com>
Date: Sun, 16 Aug 2020 20:52:44 +0200
> When the ARP monitor is used for link detection, ARP replies are
> validated for all slaves (arp_validate=3) and fail_over_mac is set to
> active, two slaves of an active-backup bond may get stuck in a state
> where both of them are active and pass packets that they receive to
> the bond. This state makes IPv6 duplicate address detection fail. The
> state is reached thus:
> 1. The current active slave goes down because the ARP target
> is not reachable.
> 2. The current ARP slave is chosen and made active.
> 3. A new slave is enslaved. This new slave becomes the current active
> slave and can reach the ARP target.
> As a result, the current ARP slave stays active after the enslave
> action has finished and the log is littered with "PROBE BAD" messages:
>> bond0: PROBE: c_arp ens10 && cas ens11 BAD
> The workaround is to remove the slave with "going back" status from
> the bond and re-enslave it. This issue was encountered when DPDK PMD
> interfaces were being enslaved to an active-backup bond.
>
> It would be possible to fix the issue in bond_enslave() or
> bond_change_active_slave() but the ARP monitor was fixed instead to
> keep most of the actions changing the current ARP slave in the ARP
> monitor code. The current ARP slave is set as inactive and backup
> during the commit phase. A new state, BOND_LINK_FAIL, has been
> introduced for slaves in the context of the ARP monitor. This allows
> administrators to see how slaves are rotated for sending ARP requests
> and how attempts are made to find a new active slave.
>
> Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor")
> Signed-off-by: Jiri Wiesner <jwiesner@...e.com>
Applied and queued up for -stable, thanks Jiri.
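
For readers following the thread, the three-step sequence from the patch description can be replayed in a small toy model. This is only an illustrative sketch, not the bonding driver code: the Slave and Bond classes, the reproduce() helper and the commit_fix flag are hypothetical names made up for this example. The interface names ens10 and ens11 are taken from the quoted log line, and the model assumes that "active" simply means the slave passes received packets to the bond.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slave:
    name: str
    active: bool = False  # True when the slave passes received packets to the bond


@dataclass
class Bond:
    slaves: List[Slave] = field(default_factory=list)
    curr_active: Optional[Slave] = None  # current active slave ("cas" in the log)
    curr_arp: Optional[Slave] = None     # current ARP slave ("c_arp" in the log)


def active_names(bond: Bond) -> List[str]:
    """Names of all slaves that currently pass packets to the bond."""
    return [s.name for s in bond.slaves if s.active]


def reproduce(commit_fix: bool) -> Bond:
    """Replay the three steps; commit_fix toggles the (modelled) commit-phase fix."""
    ens10 = Slave("ens10", active=True)
    bond = Bond(slaves=[ens10], curr_active=ens10)

    # 1. The current active slave goes down: the ARP target is not reachable.
    ens10.active = False
    bond.curr_active = None

    # 2. The current ARP slave is chosen and made active.
    bond.curr_arp = ens10
    ens10.active = True

    # 3. A new slave is enslaved and becomes the current active slave.
    ens11 = Slave("ens11", active=True)
    bond.slaves.append(ens11)
    bond.curr_active = ens11

    # Modelled fix: during the commit phase the current ARP slave is set
    # inactive once a different current active slave exists. The remaining
    # bookkeeping of the real patch is not modelled here.
    if commit_fix and bond.curr_arp is not None and bond.curr_active is not bond.curr_arp:
        bond.curr_arp.active = False

    return bond


if __name__ == "__main__":
    print("without fix:", active_names(reproduce(commit_fix=False)))
    print("with fix:   ", active_names(reproduce(commit_fix=True)))
```

Without the modelled fix both slaves end up active, which is the stuck state described in the patch; with it only the newly enslaved slave stays active.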