lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAF2d9jgJ7NWQaUAbB-hOhCk7BSwj+ApwWZLDVG3_-zhqCgnvzw@mail.gmail.com>
Date:   Thu, 27 Jan 2022 17:26:36 -0800
From:   Mahesh Bandewar (महेश बंडेवार) 
        <maheshb@...gle.com>
To:     Jay Vosburgh <jay.vosburgh@...onical.com>
Cc:     Netdev <netdev@...r.kernel.org>,
        Andy Gospodarek <andy@...yhouse.net>,
        Veaceslav Falico <vfalico@...il.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Mahesh Bandewar <mahesh@...dewar.net>
Subject: Re: [PATCH next] bonding: pair enable_port with slave_arr_updates

On Thu, Jan 27, 2022 at 3:54 PM Jay Vosburgh <jay.vosburgh@...onical.com> wrote:
>
> Mahesh Bandewar <maheshb@...gle.com> wrote:
>
> >When 803.2ad mode enables a participating port, it should update
> >the slave-array. I have observed that the member links are participating
> >and are part of the active aggregator while the traffic is egressing via
> >only one member link (in a case where two links are participating). Via
> >krpobes I discovered that that slave-arr has only one link added while
> >the other participating link wasn't part of the slave-arr.
> >
> >I couldn't see what caused that situation but the simple code-walk
> >through provided me hints that the enable_port wasn't always associated
> >with the slave-array update.
> >
> >Signed-off-by: Mahesh Bandewar <maheshb@...gle.com>
> >---
> > drivers/net/bonding/bond_3ad.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index 6006c2e8fa2b..f20bbc18a03f 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -1024,6 +1024,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
> >
> >                                       __enable_port(port);
> >                               }
> >+                              /* Slave array needs update */
> >+                              *update_slave_arr = true;
> >                       }
>
>         Shouldn't this be in the same block as the __enable_port() call?
absolutely! It's inefficient to have outside of that if-clause and
would unnecessarily update slave-arr when it's not even needed. My
bad, I'll fix it in v2

> If I'm reading the code correctly, as written this will trigger an
> update of the array on every pass of the state machine (every 100ms) if
> any port is in COLLECTING_DISTRIBUTING state, which is the usual case.
>
> >                       break;
> >               default:
> >@@ -1779,6 +1781,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
> >                            port = port->next_port_in_aggregator) {
> >                               __enable_port(port);
> >                       }
> >+                      /* Slave array needs update. */
> >+                      *update_slave_arr = true;
> >               }
>
>         I suspect this change would only affect your issue if the port
> in question was failing to partner (i.e., the peer wasn't running LACP
> or there was some failure in the LACP negotiation).  If the ports in
> your test were in the same aggregator, that shouldn't be the case, as I
> believe unpartnered ports are always individual (not in an aggregator).
The condition seems to manifest randomly on some machines and not
always. All links are part of the same aggregator but some transient
situation does break the bond and almost always it reforms but
occasionally it gets into this state I mentioned.

My primary motive behind this fix/patch is to update the slave-arr
when LACP state is changing (for whatever reasons). Enabling port
seems to be an event which must be associated with updating the array
and found these two locations in the code where there is a chance that
update_array may not happen when a port gets enabled.

>
>         Do you have a test?
>
I'm not sure what triggers it and hence I don't have the exact repro
steps / test.

>         -J
>
> >       }
> >
> >--
> >2.35.0.rc0.227.g00780c9af4-goog
> >
>
> ---
>         -Jay Vosburgh, jay.vosburgh@...onical.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ