lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABAyFk4dzN_Oprod1LM_X1x+-bHg8HxnJ1OzhLRD092mY4ON3w@mail.gmail.com>
Date: Fri, 13 Sep 2024 08:06:06 +0530
From: suresh ks <suresh2514@...il.com>
To: Jay Vosburgh <jv@...sburgh.net>
Cc: andy@...yhouse.net, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: bonding: do not set force_primary if reselect is set
 to failure

Hi,

Thanks  a lot for reviewing.

I was working for a customer issue where they removed a primary NIC for switch
maintenance and when added back, it caused iscsi storage outage because they
did not expect the bod to do a failover.

So bonding is behaving as default. But then I thought maybe we can do
something to cater for such scenarios and came up with this idea.  But
I agree my
testing was not a failure as I see ""Link Failure Count: 0" there.  I
used the below
command from my kvm host to simulate a link down and up.

     virsh  detach-interface testvm1  --type network --mac 52:54:00:d7:a7:2a

and attached it back with:

    virsh  attach-interface testvm1 --type network --source default
--mac 52:54:00:d7:a7:2a
     --model e1000e --config --live

So what would be the best solution here if I want to take out a
primary NIC for maintenance,
and then add it back ?.  I was also  trying with 'ifenslave'  to first
make secondary NIC active
and then remove primary NIC.

   ifenslave -d bond0 enp1s0

The interface changed to 'down', but immediately it came back up and
became active again.
I don't know why. The journal logs suggest my NetworkManager is
autoactivating it again :)

Thanks a lot for your time again.

- Suresh

On Fri, Sep 13, 2024 at 5:36 AM Jay Vosburgh <jv@...sburgh.net> wrote:
>
> Suresh Kumar <suresh2514@...il.com> wrote:
>
> >when bond_enslave() is called, it sets bond->force_primary to true
> >without checking if primary_reselect is set to 'failure' or 'better'.
> >This can result in primary becoming active again when link is back which
> >is not what we want when primary_reselect is set to 'failure'
>
>         The current behavior is by design, and is documented in
> Documentation/networking/bonding.rst:
>
>
>         The primary_reselect setting is ignored in two cases:
>
>                 If no slaves are active, the first slave to recover is
>                 made the active slave.
>
>                 When initially enslaved, the primary slave is always made
>                 the active slave.
>
>
>         Your proposed change would cause the primary to never be made
> the active interface when added to the bond for the primary_reselect
> "better" and "failure" settings, unless the primary interface is added
> to the bond first or all other interfaces are down.
>
>         Also, your description above and the test example below use the
> phrases "link is back" and "primary link failure" but the patch and test
> context suggest that the primary interface is being removed from the
> bond and then later added back to the bond, which is not the same thing
> as a link failure.
>
>         -J
>
> >Test
> >====
> >Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
> >
> >Bonding Mode: fault-tolerance (active-backup)
> >Primary Slave: enp1s0 (primary_reselect failure)
> >Currently Active Slave: enp1s0
> >MII Status: up
> >MII Polling Interval (ms): 100
> >Up Delay (ms): 0
> >Down Delay (ms): 0
> >Peer Notification Delay (ms): 0
> >
> >Slave Interface: enp1s0
> >MII Status: up
> >Speed: 1000 Mbps
> >Duplex: full
> >Link Failure Count: 0
> >Permanent HW addr: 52:54:00:d7:a7:2a
> >Slave queue ID: 0
> >
> >Slave Interface: enp9s0
> >MII Status: up
> >Speed: 1000 Mbps
> >Duplex: full
> >Link Failure Count: 0
> >Permanent HW addr: 52:54:00:da:9a:f9
> >Slave queue ID: 0
> >
> >
> >After primary link failure:
> >
> >Bonding Mode: fault-tolerance (active-backup)
> >Primary Slave: None
> >Currently Active Slave: enp9s0 <---- secondary is active now
> >MII Status: up
> >MII Polling Interval (ms): 100
> >Up Delay (ms): 0
> >Down Delay (ms): 0
> >Peer Notification Delay (ms): 0
> >
> >Slave Interface: enp9s0
> >MII Status: up
> >Speed: 1000 Mbps
> >Duplex: full
> >Link Failure Count: 0
> >Permanent HW addr: 52:54:00:da:9a:f9
> >Slave queue ID: 0
> >
> >
> >Now add primary link back and check bond status:
> >
> >Bonding Mode: fault-tolerance (active-backup)
> >Primary Slave: enp1s0 (primary_reselect failure)
> >Currently Active Slave: enp1s0  <------------- primary is active again
> >MII Status: up
> >MII Polling Interval (ms): 100
> >Up Delay (ms): 0
> >Down Delay (ms): 0
> >Peer Notification Delay (ms): 0
> >
> >Slave Interface: enp9s0
> >MII Status: up
> >Speed: 1000 Mbps
> >Duplex: full
> >Link Failure Count: 0
> >Permanent HW addr: 52:54:00:da:9a:f9
> >Slave queue ID: 0
> >
> >Slave Interface: enp1s0
> >MII Status: up
> >Speed: 1000 Mbps
> >Duplex: full
> >Link Failure Count: 0
> >Permanent HW addr: 52:54:00:d7:a7:2a
> >Slave queue ID: 0
> >
> >Signed-off-by: Suresh Kumar <suresh2514@...il.com>
> >---
> > drivers/net/bonding/bond_main.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index bb9c3d6ef435..731256fbb996 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -2146,7 +2146,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> >               /* if there is a primary slave, remember it */
> >               if (strcmp(bond->params.primary, new_slave->dev->name) == 0) {
> >                       rcu_assign_pointer(bond->primary_slave, new_slave);
> >-                      bond->force_primary = true;
> >+            if (bond->params.primary_reselect != BOND_PRI_RESELECT_FAILURE  &&
> >+                bond->params.primary_reselect != BOND_PRI_RESELECT_BETTER)
> >+                          bond->force_primary = true;
> >               }
> >       }
> >
> >--
> >2.43.0
> >
>
> ---
>         -Jay Vosburgh, jv@...sburgh.net

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ