[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF2d9jgjeky0eMgwFZKHO_RLTBNstH1gCq4hn1FfO=TtrMP1ow@mail.gmail.com>
Date: Mon, 9 Dec 2019 10:41:28 -0800
From: Mahesh Bandewar (महेश बंडेवार)
<maheshb@...gle.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: Andy Gospodarek <andy@...yhouse.net>,
Veaceslav Falico <vfalico@...il.com>,
David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Mahesh Bandewar <mahesh@...dewar.net>
Subject: Re: [PATCH net] bonding: fix active-backup transition after link failure
On Sat, Dec 7, 2019 at 2:09 PM Jay Vosburgh <jay.vosburgh@...onical.com> wrote:
>
> Mahesh Bandewar <maheshb@...gle.com> wrote:
>
> >After the recent fix 1899bb325149 ("bonding: fix state transition
> >issue in link monitoring"), the active-backup mode with miimon
> >initially come-up fine but after a link-failure, both members
> >transition into backup state.
> >
> >Following steps to reproduce the scenario (eth1 and eth2 are the
> >slaves of the bond):
> >
> > ip link set eth1 up
> > ip link set eth2 down
> > sleep 1
> > ip link set eth2 up
> > ip link set eth1 down
> > cat /sys/class/net/eth1/bonding_slave/state
> > cat /sys/class/net/eth2/bonding_slave/state
> >
> >Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring")
> >CC: Jay Vosburgh <jay.vosburgh@...onical.com>
> >Signed-off-by: Mahesh Bandewar <maheshb@...gle.com>
> >---
> > drivers/net/bonding/bond_main.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index fcb7c2f7f001..ad9906c102b4 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -2272,9 +2272,6 @@ static void bond_miimon_commit(struct bonding *bond)
> > } else if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
> > /* make it immediately active */
> > bond_set_active_slave(slave);
> >- } else if (slave != primary) {
> >- /* prevent it from being the active one */
> >- bond_set_backup_slave(slave);
>
> How does this fix things? Doesn't bond_select_active_slave() ->
> bond_change_active_slave() set the backup flag correctly via a call to
> bond_set_slave_active_flags() when it sets a slave to be the active
> slave? If this change resolves the problem, I'm not sure how this ever
> worked correctly, even prior to 1899bb325149.
>
Hi Jay, I used kprobes to figure out the brokenness this patch fixes.
Prior to your patch this call would not happen but with the patch,
this extra call will put the master into the backup mode erroneously
(in fact both members would be in backup state). The mechanics you
have mentioned works correctly except that in the prior case, the
switch statement was using new_link which was not same as
link_new_state. The miimon_inspect will update new_link which is what
was used in miimon_commit code. The link_new_state was used only to
mitigate the rtnl-lock issue which would update the "link". Hence in
the prior code, this path would never get executed.
The steps to reproduce this issue is straightforward and happens 100%
of the time (I used two mlx interfaces but that shouldn't matter).
thanks,
--mahesh..
> -J
>
> > }
> >
> > slave_info(bond->dev, slave->dev, "link status definitely up, %u Mbps %s duplex\n",
> >--
> >2.24.0.393.g34dc348eaf-goog
> >
>
> ---
> -Jay Vosburgh, jay.vosburgh@...onical.com
Powered by blists - more mailing lists