Message-ID: <54C2D39B.7070104@cumulusnetworks.com>
Date: Fri, 23 Jan 2015 18:04:59 -0500
From: Jonathan Toppins <jtoppins@...ulusnetworks.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
CC: netdev@...r.kernel.org, Scott Feldman <sfeldma@...il.com>,
Andy Gospodarek <gospo@...ulusnetworks.com>,
Veaceslav Falico <vfalico@...il.com>,
Nikolay Aleksandrov <nikolay@...hat.com>
Subject: Re: [PATCH net-next 1/5] bonding: keep bond interface carrier off
until at least one active member
On 1/21/15 2:14 AM, Jay Vosburgh wrote:
> Jonathan Toppins <jtoppins@...ulusnetworks.com> wrote:
>
>> On 1/19/15 4:16 PM, Jay Vosburgh wrote:
>>> Jonathan Toppins <jtoppins@...ulusnetworks.com> wrote:
>>>
>>>> From: Scott Feldman <sfeldma@...ulusnetworks.com>
>>>>
>>>> Bonding driver parameter min_links is now used to signal upper-level
>>>> protocols of bond status. The way it works is if the total number of
>>>> active members in slaves drops below min_links, the bond link carrier
>>>> will go down, signaling upper levels that bond is inactive. When active
>>>> members return to >= min_links, bond link carrier will go up (RUNNING),
>>>> and protocols can resume. When bond is carrier down, member ports are
>>>> in stp fwd state blocked (rather than normal disabled state), so
>>>> low-level ctrl protocols (LACP) can still get in and be processed by
>>>> bonding driver.
>>>
>>> Presuming that "stp" is Spanning Tree, is the last sentence
>>> above actually describing the behavior of a bridge port when a bond is
>>> the member of the bridge? I'm not sure I understand what "member ports"
>>> refers to (bridge ports or bonding slaves).
>>
>> Ack, maybe replacing the last sentence with something like:
>> When bond is carrier down, the slave ports are only forwarding
>> low-level control protocols (e.g. LACP PDU) and discarding all other
>> packets.
>
> Ah, are you actually referring to the fact that slaves that are
> up will still deliver packets to listeners that bind directly to the
> slave or hook in through a rx_handler? This is, in part, the
> "RX_HANDLER_EXACT" business in bond_handle_frame and
> __netif_receive_skb_core.
>
> The decision for that has nothing to do with the protocol; I
> seem to recall that FCoE (or maybe it's iSCSI) does its regular traffic
> reception this way (although via dev_add_pack, not an rx_handler) so it
> can run traffic regardless of the bonding master's state.
I see; it seems you are basically saying that the slaves are up, but when
the logical bond interface is carrier down, no code was changed in
bond_handle_frame() to actually drop frames other than LACPDUs. So
having that statement in the changelog makes no sense until there is code
to actually drop those additional frames.
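
For illustration only (a rough, untested sketch, not something this patch
adds), the kind of early check in bond_handle_frame() that the changelog
statement implies would be roughly:

	/* sketch only: drop everything except slow-protocol (LACPDU)
	 * frames while the bond itself is carrier down
	 */
	if (!netif_carrier_ok(bond->dev) &&
	    skb->protocol != htons(ETH_P_SLOW)) {
		kfree_skb(skb);
		return RX_HANDLER_CONSUMED;
	}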
>
>>>> @@ -2381,10 +2386,15 @@ int bond_3ad_set_carrier(struct bonding *bond)
>>>> ret = 0;
>>>> goto out;
>>>> }
>>>> +
>>>> + bond_for_each_slave_rcu(bond, slave, iter)
>>>> + if (SLAVE_AD_INFO(slave)->aggregator.is_active)
>>>> + active_slaves++;
>>>> +
>>>> active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
>>>> - if (active) {
>>>> + if (active && __agg_has_partner(active)) {
>>>
>>> Why "__agg_has_partner"? Since the "else" of this clause is:
>>>
>>> } else if (netif_carrier_ok(bond->dev)) {
>>> netif_carrier_off(bond->dev);
>>> }
>>>
>>> I'm wondering if this will do the right thing for the case that
>>> there are no LACP partners at all (e.g., the switch ports do not have
>>> LACP enabled), in which case the active aggregator should be a single
>>> "individual" port as a fallback, but will not have a partner.
>>>
>>> -J
>>>
>>
>> I see your point. The initial thinking was that the logical bond carrier
>> should not be brought up until the bond has a partner and is ready to
>> pass traffic; otherwise we start blackholing frames. Looking over the
>> code, it seems the aggregator.is_individual flag is only set to true when
>> a slave is in half-duplex, which seems odd.
>
> The agg.is_individual flag and an "individual" aggregator are
> subtly different things.
>
> The is_individual flag is part of the implementation of the
> standard requirement that half duplex ports are not allowed to enable
> LACP (and thus cannot aggregate, and end up as "Individual" in
> standard-ese). The standard capitalizes "Individual" when it describes
> the cannot-aggregate property of a port (note that half duplex is only
> one reason of many for a port being Individual).
>
> An "individual" aggregator (my usage of 802.1AX 5.3.6 (b)) is an
> aggregator containing exactly one port that is Individual. A port can
> end up as Individual (for purposes of this discussion) either through
> the is_individual business, or because the bonding port does run LACP,
> but the link partner does not, and thus no LACPDUs are ever received.
>
> For either of the above cases (is_individual or no-LACP-partner),
> then the active aggregator will be an "individual" aggregator, but will
> not have a partner (__agg_has_partner() will be false). The standard has
> a bunch of verbiage about this in 802.1AX 5.3.5 - 5.3.9.
>
>> My initial thinking to alleviate the concern is something like the
>> following:
>>
>> if (active && !SLAVE_AD_INFO(slave)->aggregator.is_individual &&
>>     __agg_has_partner(active)) {
>> 	/* set carrier based on min_links */
>> } else if (active && SLAVE_AD_INFO(slave)->aggregator.is_individual) {
>> 	/* set bond carrier state according to carrier state of slave */
>> } else if (netif_carrier_ok(bond->dev)) {
>> 	netif_carrier_off(bond->dev);
>> }
>
> I'm not sure you need to care about is_individual or
> __agg_has_partner at all. If either of those conditions is true for the
> active aggregator, it will contain exactly one port, and so if min_links
> is 2, you'll have carrier off, and if min_links is 1 or less you'll have
> carrier on.
>
> If I'm reading the patch right, the real point (which isn't
> really described very well in the change log) is that you're changing
> the carrier decision to count only active ports in the active
> aggregator, not the total number of ports as is currently done.
>
> I'm not sure why this change is needed:
>
> @@ -2381,10 +2386,15 @@ int bond_3ad_set_carrier(struct bonding *bond)
> ret = 0;
> goto out;
> }
> +
> + bond_for_each_slave_rcu(bond, slave, iter)
> + if (SLAVE_AD_INFO(slave)->aggregator.is_active)
> + active_slaves++;
> +
> active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
> - if (active) {
> + if (active && __agg_has_partner(active)) {
> /* are enough slaves available to consider link up? */
> - if (active->num_of_ports < bond->params.min_links) {
> + if (active_slaves < bond->params.min_links) {
> if (netif_carrier_ok(bond->dev)) {
> netif_carrier_off(bond->dev);
> goto out;
>
> because a port (slave) that loses carrier or whose link partner
> becomes unresponsive to LACPDUs will be removed from the aggregator. As
> I recall, there are no "inactive" ports in an aggregator; all of them
> have to match in terms of capabilities.
>
> In other words, I'm unsure of when the count of is_active ports
> will not match active->num_of_ports.
>
> Also, the other parts of the patch add some extra updates to the
> carrier state when a port is enabled or disabled, e.g.,
>
> @@ -189,6 +189,7 @@ static inline int __agg_has_partner(struct aggregator *agg)
> static inline void __disable_port(struct port *port)
> {
> bond_set_slave_inactive_flags(port->slave, BOND_SLAVE_NOTIFY_LATER);
> + bond_3ad_set_carrier(port->slave->bond);
> }
>
> Again, I'm not sure why this is necessary, as the cases that
> disable or enable a port will eventually call bond_3ad_set_carrier. For
> example, ad_agg_selection_logic will, when changing active aggregator,
> individually disable all ports of the old active and then may
> individually enable ports of the new active if necessary, and then
> finally call bond_3ad_set_carrier.
>
> In what situations is the patch's behavior an improvement (i.e.,
> is there a situation I'm missing that doesn't do it right)?
I think the addition of bond_3ad_set_carrier() to both __enable_port()
and __disable_port() was an optimization so the bond carrier transition
would happen faster, though I am not certain.
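
To restate the existing flow you describe in sketch form (comments only,
nothing new here), the carrier is normally refreshed once after
reselection, so the hunks above would only make the transition happen a
little sooner:

	/* existing flow, per the description above (sketch, bodies elided):
	 *
	 *   ad_agg_selection_logic():
	 *     __disable_port() on each port of the old active aggregator
	 *     __enable_port() on ports of the new active aggregator
	 *     ...
	 *     bond_3ad_set_carrier()  <- carrier refreshed once, at the end
	 */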
>
> The last portion of the patch:
>
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -1181,6 +1181,7 @@ static int bond_option_min_links_set(struct bonding *bond,
> netdev_info(bond->dev, "Setting min links value to %llu\n",
> newval->value);
> bond->params.min_links = newval->value;
> + bond_set_carrier(bond);
>
> return 0;
> }
>
> does seem to fix a legitimate bug, in that when min_links is
> changed, it does not take effect in real time.
>
>> Maybe I am missing something and there is a simpler option.
>>
>> Thinking about how to validate this, it seems having a bond with two
>> slaves and both slaves in half-duplex will force an aggregator that is
>> individual to be selected.
>>
>> Thoughts?
>
> That's one way, yes. You'll also get an "individual" aggregator
> if none of the link partners enable LACP.
>
It seems it might be better to drop the changes to __enable/disable_port
and bond_3ad_set_carrier from this patch until I can do more testing,
especially if you agree the other changes in this series are of benefit.
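
For reference, the corner case that still needs testing, sketched against
the patched check (illustration only, not additional code for the patch):
with no LACP partner on the other end, the active aggregator is an
"individual" aggregator with exactly one port and no partner, so the
patched test keeps the carrier off no matter what min_links is set to:

	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
	/* no-LACP-partner fallback: active->num_of_ports == 1 and
	 * __agg_has_partner(active) is false
	 */
	if (active && __agg_has_partner(active)) {
		/* never reached in this scenario, even with min_links <= 1 */
	} else if (netif_carrier_ok(bond->dev)) {
		netif_carrier_off(bond->dev);	/* carrier stays down */
	}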