netdev - bonding driver issue when configured for active/backup and using ARP monitoring

Message-ID: <MN2PR03MB47526B686EF6E0F8D81A9397B7F50@MN2PR03MB4752.namprd03.prod.outlook.com>
Date:   Mon, 30 Nov 2020 18:05:23 +0000
From:   "Finer, Howard" <hfiner@...n.com>
To:     "j.vosburgh@...il.com" <j.vosburgh@...il.com>,
        "andy@...yhouse.net" <andy@...yhouse.net>,
        "vfalico@...il.com" <vfalico@...il.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: bonding driver issue when configured for active/backup and using ARP
 monitoring

We use the bonding driver in an active-backup configuration with ARP monitoring, and we run the TIPC protocol over the bond device. On both the 3.16 and 4.19 kernels we consistently see that when the bond switches slaves, TIPC is notified of the change instead of the switch happening silently. When the active slave fails, a NETDEV_CHANGE event is sent to the TIPC driver to notify it that the link is down. TIPC then resets its bearers, which breaks communication between the clustered nodes.
With some additional instrumentation in the driver, I see this in /var/log/syslog:
<6> 1 2020-11-20T18:14:19.159524+01:00 LABNBS5B kernel - - - [65818.378287] bond0: link status definitely down for interface eth0, disabling it
<6> 1 2020-11-20T18:14:19.159536+01:00 LABNBS5B kernel - - - [65818.378296] bond0: now running without any active interface!
<6> 1 2020-11-20T18:14:19.159537+01:00 LABNBS5B kernel - - - [65818.378304] bond0: bond_activebackup_arp_mon: notify_rtnl, slave state notify/slave link notify
<6> 1 2020-11-20T18:14:19.159538+01:00 LABNBS5B kernel - - - [65818.378835] netdev change bearer <eth:bond0>
<6> 1 2020-11-20T18:14:19.263523+01:00 LABNBS5B kernel - - - [65818.482384] bond0: link status definitely up for interface eth1
<6> 1 2020-11-20T18:14:19.263534+01:00 LABNBS5B kernel - - - [65818.482387] bond0: making interface eth1 the new active one
<6> 1 2020-11-20T18:14:19.263536+01:00 LABNBS5B kernel - - - [65818.482633] bond0: first active interface up!
<6> 1 2020-11-20T18:14:19.263537+01:00 LABNBS5B kernel - - - [65818.482671] netdev change bearer <eth:bond0>
<6> 1 2020-11-20T18:14:19.367523+01:00 LABNBS5B kernel - - - [65818.586228] bond0: bond_activebackup_arp_mon: call_netdevice_notifiers NETDEV_NOTIFY_PEERS

There is no issue when using MII monitoring instead of ARP monitoring: when the active slave is detected as down, the driver immediately switches to the backup, because it already sees that slave as up and ready. With ARP monitoring, however, only one of the slaves is 'up'. When the active slave goes down, the bonding driver sees no active slaves until it brings up the backup slave on the next call to bond_activebackup_arp_mon. Bringing up the backup slave has to be attempted before any peers are notified of a change, or else they will see the outage. In this case it seems the should_notify_rtnl flag has to be set to false. However, I also question whether the switch to the backup slave should happen immediately, as it does for MII, so that the backup is 'brought up/switched to' right away rather than on the next iteration.

static void bond_activebackup_arp_mon(struct bonding *bond)
{
	bool should_notify_peers = false;
	bool should_notify_rtnl = false;
	int delta_in_ticks;

	delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);

	if (!bond_has_slaves(bond))
		goto re_arm;

	rcu_read_lock();

	should_notify_peers = bond_should_notify_peers(bond);

	if (bond_ab_arp_inspect(bond)) {
		rcu_read_unlock();

		/* Race avoidance with bond_close flush of workqueue */
		if (!rtnl_trylock()) {
			delta_in_ticks = 1;
			should_notify_peers = false;
			goto re_arm;
		}

		bond_ab_arp_commit(bond);

		rtnl_unlock();
		rcu_read_lock();
	}

	should_notify_rtnl = bond_ab_arp_probe(bond);
	rcu_read_unlock();

re_arm:
	if (bond->params.arp_interval)
		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);

	if (should_notify_peers || should_notify_rtnl) {
		if (!rtnl_trylock())
			return;

		if (should_notify_peers) {
			netdev_info(bond->dev, "bond_activebackup_arp_mon: call_netdevice_notifiers NETDEV_NOTIFY_PEERS\n");

			call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
						 bond->dev);
		}
		if (should_notify_rtnl) {
			netdev_info(bond->dev, "bond_activebackup_arp_mon: notify_rtnl, slave state notify/slave link notify\n");

			bond_slave_state_notify(bond);
			bond_slave_link_notify(bond);
		}

		rtnl_unlock();
	}
}
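
To illustrate the timing window described above, here is a minimal userspace sketch. It is a toy model, not the bonding driver's logic: struct toy_bond, toy_monitor_tick(), toy_failover_events() and the immediate_failover switch are all hypothetical names, and the model assumes the backup slave would already answer ARP probes. It only counts how many carrier transitions a listener such as TIPC would observe when the backup is selected one monitor iteration after the active slave is dropped, versus in the same iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names exist in the kernel. */
enum { SLAVE_COUNT = 2, NO_ACTIVE = -1 };

struct toy_bond {
	bool arp_ok[SLAVE_COUNT];	/* did the slave pass its last ARP check? */
	int active;			/* index of the active slave, or NO_ACTIVE */
	int carrier_changes;		/* carrier transitions a listener would see */
};

/* Return the first slave that passes its ARP check, or NO_ACTIVE. */
static int toy_find_usable(const struct toy_bond *b)
{
	for (int i = 0; i < SLAVE_COUNT; i++)
		if (b->arp_ok[i])
			return i;
	return NO_ACTIVE;
}

/* One monitor iteration of the toy model. */
static void toy_monitor_tick(struct toy_bond *b, bool immediate_failover)
{
	bool had_carrier = b->active != NO_ACTIVE;
	bool dropped = false;

	/* Drop an active slave that failed its ARP check. */
	if (b->active != NO_ACTIVE && !b->arp_ok[b->active]) {
		b->active = NO_ACTIVE;
		dropped = true;
	}

	/*
	 * Deferred mode: a slave dropped in this tick is only replaced on
	 * the next tick. Immediate mode: replace it in the same tick.
	 */
	if (b->active == NO_ACTIVE && (immediate_failover || !dropped))
		b->active = toy_find_usable(b);

	/* A listener only notices when the bond's carrier state flips. */
	if ((b->active != NO_ACTIVE) != had_carrier)
		b->carrier_changes++;
}

/* Fail slave 0, run two monitor ticks, report carrier transitions seen. */
static int toy_failover_events(bool immediate_failover)
{
	struct toy_bond b = {
		.arp_ok = { true, true },
		.active = 0,
		.carrier_changes = 0,
	};

	b.arp_ok[0] = false;		/* active slave stops answering ARP */
	toy_monitor_tick(&b, immediate_failover);
	toy_monitor_tick(&b, immediate_failover);

	assert(b.active == 1);		/* both variants end on the backup */
	return b.carrier_changes;
}
```

In this model toy_failover_events(false) reports two transitions (carrier off, then carrier on), which corresponds to the two "netdev change bearer" lines in the log above, while toy_failover_events(true) reports none.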

As it currently behaves, there is no way to run TIPC over an active-backup, ARP-monitored bond device. I suspect other situations/uses would likewise have an issue with the 'erroneous' NETDEV_CHANGE being issued. Since TIPC (and others) have no idea what the dev is, it is not possible to ignore the event, nor should it be ignored. It therefore seems the event shouldn't be sent in this situation. Please confirm the analysis above and provide a path forward, since as currently implemented the functionality is broken.

Thanks,
Howard Finer
hfiner@...n.com