lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 11 Dec 2019 22:38:06 -0800
From:   Jay Vosburgh <jay.vosburgh@...onical.com>
To:     =?UTF-8?B?TWFoZXNoIEJhbmRld2FyICjgpK7gpLngpYfgpLYg4KSs4KSC4KSh4KWH4KS14KS+4KSwKQ==?= 
        <maheshb@...gle.com>
cc:     Andy Gospodarek <andy@...yhouse.net>,
        Veaceslav Falico <vfalico@...il.com>,
        David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        Mahesh Bandewar <mahesh@...dewar.net>
Subject: Re: [PATCH net] bonding: fix active-backup transition after link failure

Mahesh Bandewar (महेश बंडेवार) wrote:

>On Sat, Dec 7, 2019 at 2:09 PM Jay Vosburgh <jay.vosburgh@...onical.com> wrote:
>>
>> Mahesh Bandewar <maheshb@...gle.com> wrote:
>>
>> >After the recent fix 1899bb325149 ("bonding: fix state transition
>> >issue in link monitoring"), the active-backup mode with miimon
>> >initially come-up fine but after a link-failure, both members
>> >transition into backup state.
>> >
>> >Following steps to reproduce the scenario (eth1 and eth2 are the
>> >slaves of the bond):
>> >
>> >    ip link set eth1 up
>> >    ip link set eth2 down
>> >    sleep 1
>> >    ip link set eth2 up
>> >    ip link set eth1 down
>> >    cat /sys/class/net/eth1/bonding_slave/state
>> >    cat /sys/class/net/eth2/bonding_slave/state
>> >
>> >Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring")
>> >CC: Jay Vosburgh <jay.vosburgh@...onical.com>
>> >Signed-off-by: Mahesh Bandewar <maheshb@...gle.com>
>> >---
>> > drivers/net/bonding/bond_main.c | 3 ---
>> > 1 file changed, 3 deletions(-)
>> >
>> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> >index fcb7c2f7f001..ad9906c102b4 100644
>> >--- a/drivers/net/bonding/bond_main.c
>> >+++ b/drivers/net/bonding/bond_main.c
>> >@@ -2272,9 +2272,6 @@ static void bond_miimon_commit(struct bonding *bond)
>> >                       } else if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
>> >                               /* make it immediately active */
>> >                               bond_set_active_slave(slave);
>> >-                      } else if (slave != primary) {
>> >-                              /* prevent it from being the active one */
>> >-                              bond_set_backup_slave(slave);
>>
>>         How does this fix things?  Doesn't bond_select_active_slave() ->
>> bond_change_active_slave() set the backup flag correctly via a call to
>> bond_set_slave_active_flags() when it sets a slave to be the active
>> slave?  If this change resolves the problem, I'm not sure how this ever
>> worked correctly, even prior to 1899bb325149.
>>
>Hi Jay, I used kprobes to figure out the brokenness this patch fixes.
>Prior to your patch this call would not happen but with the patch,
>this extra call will put the master into the backup mode erroneously
>(in fact both members would be in backup state). The mechanics you
>have mentioned works correctly except that in the prior case, the
>switch statement was using new_link which was not same as
>link_new_state. The miimon_inspect will update new_link which is what
>was used in miimon_commit code. The link_new_state was used only to
>mitigate the rtnl-lock issue which would update the "link". Hence in
>the prior code, this path would never get executed.

	I'm looking at the old code (prior to 1899bb325149), and I don't
see a path to what you're describing for the down to up transition in
active-backup mode.

bond_miimon_inspect enters switch, slave->link == BOND_LINK_DOWN.

link_state is nonzero, call bond_propose_link_state(BOND_LINK_BACK),
which sets slave->link_new_state to _BACK.

Fall through to BOND_LINK_BACK case, set slave->new_link = BOND_LINK_UP

bond_mii_monitor then calls bond_commit_link_state, which sets
slave->link to BOND_LINK_BACK

Enter bond_miimon_commit switch (new_link), which is BOND_LINK_UP

In "case BOND_LINK_UP:" there is no way out of this block, and it should
proceed to call bond_set_backup_slave for active-backup mode every time.

>The steps to reproduce this issue is straightforward and happens 100%
>of the time (I used two mlx interfaces but that shouldn't matter).

	Yes, I've been able to reproduce it locally (with igb, FWIW).  I
think the patch is likely ok, I'm just mystified as to how the backup
setting could have worked prior to 1899bb325149, so perhaps the Fixes
tag doesn't go back far enough.

	-J

>thanks,
>--mahesh..
>>         -J
>>
>> >                       }
>> >
>> >                       slave_info(bond->dev, slave->dev, "link status definitely up, %u Mbps %s duplex\n",
>> >--
>> >2.24.0.393.g34dc348eaf-goog

---
	-Jay Vosburgh, jay.vosburgh@...onical.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ