Message-ID: <CAF2d9jhRn6V5feSYiej9OcTsqhpLWnYVEi2ZvwHnyYH2gyWg=g@mail.gmail.com>
Date: Mon, 15 Sep 2014 11:05:13 -0700
From: Mahesh Bandewar <maheshb@...gle.com>
To: Nikolay Aleksandrov <nikolay@...hat.com>
Cc: Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...hat.com>,
Andy Gospodarek <andy@...yhouse.net>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Maciej Zenczykowski <maze@...gle.com>
Subject: Re: [PATCH net-next v3 2/2] bonding: Simplify the xmit function for
modes that use xmit_hash
Even though the patch is working correctly (Nikolay and I have
resolved the VM vs. non-VM issues), there is a race condition that
appeared once the lock-simplification patches were applied. We will
resolve this before this patch can be applied. The first patch in
this series is probably stuck unnecessarily, so I will separate that
patch from the series and send it on its own.
On Mon, Sep 15, 2014 at 11:00 AM, Mahesh Bandewar <maheshb@...gle.com> wrote:
> On Thu, Sep 11, 2014 at 3:44 PM, Nikolay Aleksandrov <nikolay@...hat.com>
> wrote:
>>
>> On 09/12/2014 12:08 AM, Mahesh Bandewar wrote:
>> > Somehow my earlier mail bounced back (formatting issues, I suppose!),
>> > so this is a resend.
>> >
>> > On Thu, Sep 11, 2014 at 2:27 PM, Mahesh Bandewar <maheshb@...gle.com>
>> > wrote:
>> >>
>> >> On Thu, Sep 11, 2014 at 2:39 AM, Nikolay Aleksandrov
>> >> <nikolay@...hat.com> wrote:
>> >>> On 11/09/14 06:16, Mahesh Bandewar wrote:
>> >>>>
>> >>>> An earlier change to use a usable-slave array for TLB mode had an
>> >>>> additional performance advantage, so this patch extends the same
>> >>>> logic to all other modes that use xmit-hash for slave selection
>> >>>> (viz. 802.3ad and XOR modes). It also consolidates this with the
>> >>>> earlier TLB change.
>> >>>>
>> >>>> The main idea is to build the usable slaves array in the control path
>> >>>> and use that array for slave selection during xmit operation.
>> >>>>
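As an illustration of the idea described above (not the kernel code), a minimal userspace sketch might look like the following. The demo_* names and types are hypothetical stand-ins for struct slave and struct bond_up_slave. The patch itself publishes the array with RCU and filters by aggregator in 802.3ad mode, both of which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct slave. */
struct demo_slave {
	const char *name;
	bool can_tx;	/* models bond_slave_can_tx() */
};

/* Hypothetical stand-in for struct bond_up_slave (no rcu_head here). */
struct demo_up_slaves {
	unsigned int count;
	struct demo_slave *arr[];
};

/* Control path: collect the usable slaves, skipping 'skip' if non-NULL.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct demo_up_slaves *demo_build_arr(struct demo_slave *all, size_t n,
					      const struct demo_slave *skip)
{
	struct demo_up_slaves *up;
	size_t i;

	assert(all != NULL || n == 0);
	up = calloc(1, sizeof(*up) + n * sizeof(up->arr[0]));
	if (!up)
		return NULL;
	for (i = 0; i < n; i++) {
		if (!all[i].can_tx || &all[i] == skip)
			continue;
		up->arr[up->count++] = &all[i];
	}
	return up;
}

/* Transmit path: index by hash modulo count. NULL means "drop the packet". */
static struct demo_slave *demo_select(const struct demo_up_slaves *up,
				      unsigned int hash)
{
	unsigned int count = up ? up->count : 0;

	return count ? up->arr[hash % count] : NULL;
}
```

With three slaves of which only the first and third can transmit, demo_build_arr() yields a two-entry array, and demo_select() alternates between those two entries as the hash changes. The per-packet cost is one modulo and one array load instead of a list walk.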
>> >>>> Measured performance in a setup with a bond of 4x1G NICs with 200
>> >>>> instances of netperf for the modes involved (3ad, xor, tlb)
>> >>>> cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5
>> >>>>
>> >>>> Mode         TPS-Before   TPS-After
>> >>>>
>> >>>> 802.3ad   :  468,694      493,101
>> >>>> TLB (lb=0):  392,583      392,965
>> >>>> XOR       :  475,696      484,517
>> >>>>
>> >>>> Signed-off-by: Mahesh Bandewar <maheshb@...gle.com>
>> >>>> ---
>> >>>> v1:
>> >>>> (a) If bond_update_slave_arr() fails to allocate memory, it will
>> >>>> overwrite the slave that needs to be removed.
>> >>>> (b) Freeing of array will assign NULL (to handle bond->down to
>> >>>> bond->up transition gracefully).
>> >>>> (c) Change from pr_debug() to pr_err() if bond_update_slave_arr()
>> >>>> returns failure.
>> >>>> (d) XOR: bond_update_slave_arr() will consider mii-mon, arp-mon
>> >>>> cases and will populate the array even if these parameters are
>> >>>> not used.
>> >>>> (e) 3AD: Should handle the ad_agg_selection_logic correctly.
>> >>>> v2:
>> >>>> (a) Removed rcu_read_{un}lock() calls from array manipulation
>> >>>> code.
>> >>>> (b) Slave link-events now refresh array for all these modes.
>> >>>> (c) Moved free-array call from bond_close() to bond_uninit().
>> >>>> v3:
>> >>>> (a) Fixed null pointer dereference.
>> >>>> (b) Removed bond->lock lockdep dependency.
>> >>>>
>> >>> Hello Mahesh,
>> >>> You should've given me time to respond. The reason I wrote this:
>> >>> "First a question, if a bond device in XOR mode is up and we enslave a
>> >>> single slave how would it start transmitting ? Same question, if we are
>> >>> enslaving a second device then the array will be rebuilt with only the
>> >>> first upon NETDEV_UP (of course all this is in the case miimon is 0).
>> >>> The NETDEV_UP upon enslave happens before the slave is linked in."
>> >>> was not because I wanted you to remove the slave rebuilding from the
>> >>> NETDEV_UP/DOWN events, but because I didn't see how a slave would start
>> >>> transmitting in XOR mode after enslaving, and I just tested it: it
>> >>> doesn't, since the slave array never gets rebuilt. The NETDEV_UP event
>> >>> is carried by the dev_open() done in bond_enslave() earlier, so the
>> >>> bond_set_carrier() at the end isn't of much importance in most cases.
>> >>> Simply do the following and you'll see:
>> >>> modprobe bonding mode=2
>> >>> ip link set bond0 up
>> >>> ifenslave bond0 eth0
>> >>>
>> >>> Try to transmit anything and watch on the other side; you won't be able
>> >>> to see anything as there's no slave array. My second question was, given
>> >>> all this, if you enslave any subsequent slaves, will it start
>> >>> transmitting ? But I just tested this scenario and it still doesn't, as
>> >>> the array doesn't get built.
>> >>
>> > OK I tried that and I don't see what you have observed -
>> >
>> > Here is the log -
>> >
>> > ~# modprobe bonding mode=2
>> > ~# ip link set bond0 up
>> > [ 133.537516] New slave count=0
>> > ~# [add ip addr]
>> > ~# [add default route]
>> > ~# ifenslave bond0 eth0
>> > ~# [ 211.044852] Adding slave=eth0
>> > [ 211.047826] New slave count=1
>> > ~# ifenslave bond0 eth1
>> > [ 723.795877] Adding slave=eth0
>> > [ 723.798853] Adding slave=eth1
>> > [ 723.801824] New slave count=2
>> >
>> > I have added some instrumentation to see what is happening and
>> > following are the couple of changes (basically printfs!) -
>> >
>> >
>> > @@ -3730,13 +3736,16 @@ int bond_update_slave_arr(struct bonding
>> > *bond, struct slave *skipslave)
>> > continue;
>> > if (skipslave == slave)
>> > continue;
>> > +pr_err("Adding slave=%s\n", slave->dev->name);
>> > new_arr->arr[new_arr->count++] = slave;
>> > }
>> >
>> > old_arr = rcu_dereference_protected(bond->slave_arr,
>> > lockdep_rtnl_is_held() ||
>> >
>> > lockdep_is_held(&bond->curr_slave_lock));
>> > rcu_assign_pointer(bond->slave_arr, new_arr);
>> > +pr_err("New slave count=%d\n", new_arr->count);
>> > if (old_arr)
>> > kfree_rcu(old_arr, rcu);
>> >
>> >
>> >
>> Something is wrong here: either you have some modified version or something
>> else is going on, because with your patch + the pr_err()s in
>> bond_update_slave_arr() I get the following:
>> # modprobe bonding mode=2
>> # ip l set bond0 up
>> [ 47.905891] Slave count: 0
>> # ip addr add 192.168.160.2/24 dev bond0
>> # ifenslave bond0 eth1
>> [ 78.056764] bond0: Enslaving eth1 as an active interface with an up link
>> [ 78.056789] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
>> *(no slave array messages)*
>> # ping ...
>> nothing.
>>
>> Doing a down/up cycle on the bond works as it should:
>> # ip l set bond0 down
>> # ip l set bond0 up
>> [ 686.786898] Adding slave: eth1
>> [ 686.787153] Adding slave: eth2
>> [ 686.787329] Slave count: 2
>> And now it begins to work.
>>
>> The same happens if I enslave a subsequent slave. I really don't see how
>> your arrays get rebuilt. Do you have a different patch which does array
>> rebuilding in bond_enslave() ? The NETDEV_UP for the slave is generated by
>> dev_open() and it's too early to be caught as a slave by the bond's
>> notifier. I'd be curious to see the stack trace on your slave rebuilding
>> after the first enslave. Could you insert a WARN_ON(1) in
>> bond_update_slave_arr(), for example, and see what's triggering the slave
>> rebuild ?
>>
>> In contrast the same module without your patches:
>> # modprobe bonding mode=2
>> # ip l set bond0 up
>> # ip addr add 192.168.160.2/24 dev bond0
>> # ifenslave bond0 eth1
>> # ping ...
>> works.
>>
>> I'm doing these tests in a VM with virtio_net devices.
>> Also my default miimon is 0, if that matters.
>>
>> My tree looks like:
>> 9afec9969e3b15a8d52acb07df48423c2f941e50 bonding: Simplify the xmit function for modes that use xmit_hash
>> d876c8ca5d46c3b4c201289f48011350efa545ce bonding: display xmit_hash_policy for non-dynamic-tlb mode
>> 9b5a8a12737ee9ff100e87ab7fdfdec4d0999f4e net: bpf: only build bpf_jit_binary_{alloc, free}() when jit selected
>>
>> >>> A few suggestions below, nothing serious though.
>> >>>
>> >>>
>> >>>> drivers/net/bonding/bond_3ad.c | 76 +++-----------------
>> >>>> drivers/net/bonding/bond_alb.c | 51 ++------------
>> >>>> drivers/net/bonding/bond_alb.h | 8 ---
>> >>>> drivers/net/bonding/bond_main.c | 152
>> >>>> ++++++++++++++++++++++++++++++++++++----
>> >>>> drivers/net/bonding/bonding.h | 8 +++
>> >>>> 5 files changed, 164 insertions(+), 131 deletions(-)
>> >>>>
>> >>>> diff --git a/drivers/net/bonding/bond_3ad.c
>> >>>> b/drivers/net/bonding/bond_3ad.c
>> >>>> index 5d27a6207384..516075f0a740 100644
>> >>>> --- a/drivers/net/bonding/bond_3ad.c
>> >>>> +++ b/drivers/net/bonding/bond_3ad.c
>> >>>> @@ -1579,6 +1579,8 @@ static void ad_agg_selection_logic(struct
>> >>>> aggregator
>> >>>> *agg)
>> >>>> __disable_port(port);
>> >>>> }
>> >>>> }
>> >>>> + if (bond_update_slave_arr(bond, NULL))
>> >>>> + pr_err("Failed to build slave-array for 3ad
>> >>>> mode.\n");
>> >>>> }
>> >>>>
>> >>>> /* if the selected aggregator is of join individuals
>> >>>> @@ -1717,6 +1719,8 @@ static void
>> >>>> ad_enable_collecting_distributing(struct
>> >>>> port *port)
>> >>>> port->actor_port_number,
>> >>>> port->aggregator->aggregator_identifier);
>> >>>> __enable_port(port);
>> >>>> + if (bond_update_slave_arr(port->slave->bond, NULL))
>> >>>> + pr_err("Failed to build slave-array for 3ad
>> >>>> mode.\n");
>> >>>> }
>> >>>> }
>> >>>>
>> >>>> @@ -1733,6 +1737,8 @@ static void
>> >>>> ad_disable_collecting_distributing(struct port *port)
>> >>>> port->actor_port_number,
>> >>>> port->aggregator->aggregator_identifier);
>> >>>> __disable_port(port);
>> >>>> + if (bond_update_slave_arr(port->slave->bond, NULL))
>> >>>> + pr_err("Failed to build slave-array for 3ad
>> >>>> mode.\n");
>> >>>> }
>> >>>> }
>> >>>>
>> >>>> @@ -2311,6 +2317,9 @@ void bond_3ad_handle_link_change(struct slave
>> >>>> *slave, char link)
>> >>>> */
>> >>>> port->sm_vars |= AD_PORT_BEGIN;
>> >>>>
>> >>>> + if (bond_update_slave_arr(slave->bond, NULL))
>> >>>> + pr_err("Failed to build slave-array for 3ad
>> >>>> mode.\n");
>> >>>> +
>> >>>> __release_state_machine_lock(port);
>> >>>> }
>> >>>>
>> >>>> @@ -2406,73 +2415,6 @@ int bond_3ad_get_active_agg_info(struct
>> >>>> bonding
>> >>>> *bond, struct ad_info *ad_info)
>> >>>> return ret;
>> >>>> }
>> >>>>
>> >>>> -int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>> >>>> -{
>> >>>> - struct bonding *bond = netdev_priv(dev);
>> >>>> - struct slave *slave, *first_ok_slave;
>> >>>> - struct aggregator *agg;
>> >>>> - struct ad_info ad_info;
>> >>>> - struct list_head *iter;
>> >>>> - int slaves_in_agg;
>> >>>> - int slave_agg_no;
>> >>>> - int agg_id;
>> >>>> -
>> >>>> - if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
>> >>>> - netdev_dbg(dev, "__bond_3ad_get_active_agg_info
>> >>>> failed\n");
>> >>>> - goto err_free;
>> >>>> - }
>> >>>> -
>> >>>> - slaves_in_agg = ad_info.ports;
>> >>>> - agg_id = ad_info.aggregator_id;
>> >>>> -
>> >>>> - if (slaves_in_agg == 0) {
>> >>>> - netdev_dbg(dev, "active aggregator is empty\n");
>> >>>> - goto err_free;
>> >>>> - }
>> >>>> -
>> >>>> - slave_agg_no = bond_xmit_hash(bond, skb) % slaves_in_agg;
>> >>>> - first_ok_slave = NULL;
>> >>>> -
>> >>>> - bond_for_each_slave_rcu(bond, slave, iter) {
>> >>>> - agg = SLAVE_AD_INFO(slave)->port.aggregator;
>> >>>> - if (!agg || agg->aggregator_identifier != agg_id)
>> >>>> - continue;
>> >>>> -
>> >>>> - if (slave_agg_no >= 0) {
>> >>>> - if (!first_ok_slave &&
>> >>>> bond_slave_can_tx(slave))
>> >>>> - first_ok_slave = slave;
>> >>>> - slave_agg_no--;
>> >>>> - continue;
>> >>>> - }
>> >>>> -
>> >>>> - if (bond_slave_can_tx(slave)) {
>> >>>> - bond_dev_queue_xmit(bond, skb, slave->dev);
>> >>>> - goto out;
>> >>>> - }
>> >>>> - }
>> >>>> -
>> >>>> - if (slave_agg_no >= 0) {
>> >>>> - netdev_err(dev, "Couldn't find a slave to tx on for
>> >>>> aggregator ID %d\n",
>> >>>> - agg_id);
>> >>>> - goto err_free;
>> >>>> - }
>> >>>> -
>> >>>> - /* we couldn't find any suitable slave after the agg_no, so
>> >>>> use
>> >>>> the
>> >>>> - * first suitable found, if found.
>> >>>> - */
>> >>>> - if (first_ok_slave)
>> >>>> - bond_dev_queue_xmit(bond, skb, first_ok_slave->dev);
>> >>>> - else
>> >>>> - goto err_free;
>> >>>> -
>> >>>> -out:
>> >>>> - return NETDEV_TX_OK;
>> >>>> -err_free:
>> >>>> - /* no suitable interface, frame not sent */
>> >>>> - dev_kfree_skb_any(skb);
>> >>>> - goto out;
>> >>>> -}
>> >>>> -
>> >>>> int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding
>> >>>> *bond,
>> >>>> struct slave *slave)
>> >>>> {
>> >>>> diff --git a/drivers/net/bonding/bond_alb.c
>> >>>> b/drivers/net/bonding/bond_alb.c
>> >>>> index 028496205f39..dbac0ceb17f6 100644
>> >>>> --- a/drivers/net/bonding/bond_alb.c
>> >>>> +++ b/drivers/net/bonding/bond_alb.c
>> >>>> @@ -200,7 +200,6 @@ static int tlb_initialize(struct bonding *bond)
>> >>>> static void tlb_deinitialize(struct bonding *bond)
>> >>>> {
>> >>>> struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>> >>>> - struct tlb_up_slave *arr;
>> >>>>
>> >>>> _lock_tx_hashtbl_bh(bond);
>> >>>>
>> >>>> @@ -208,10 +207,6 @@ static void tlb_deinitialize(struct bonding
>> >>>> *bond)
>> >>>> bond_info->tx_hashtbl = NULL;
>> >>>>
>> >>>> _unlock_tx_hashtbl_bh(bond);
>> >>>> -
>> >>>> - arr = rtnl_dereference(bond_info->slave_arr);
>> >>>> - if (arr)
>> >>>> - kfree_rcu(arr, rcu);
>> >>>> }
>> >>>>
>> >>>> static long long compute_gap(struct slave *slave)
>> >>>> @@ -1409,39 +1404,9 @@ out:
>> >>>> return NETDEV_TX_OK;
>> >>>> }
>> >>>>
>> >>>> -static int bond_tlb_update_slave_arr(struct bonding *bond,
>> >>>> - struct slave *skipslave)
>> >>>> -{
>> >>>> - struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>> >>>> - struct slave *tx_slave;
>> >>>> - struct list_head *iter;
>> >>>> - struct tlb_up_slave *new_arr, *old_arr;
>> >>>> -
>> >>>> - new_arr = kzalloc(offsetof(struct tlb_up_slave,
>> >>>> arr[bond->slave_cnt]),
>> >>>> - GFP_ATOMIC);
>> >>>> - if (!new_arr)
>> >>>> - return -ENOMEM;
>> >>>> -
>> >>>> - bond_for_each_slave(bond, tx_slave, iter) {
>> >>>> - if (!bond_slave_can_tx(tx_slave))
>> >>>> - continue;
>> >>>> - if (skipslave == tx_slave)
>> >>>> - continue;
>> >>>> - new_arr->arr[new_arr->count++] = tx_slave;
>> >>>> - }
>> >>>> -
>> >>>> - old_arr = rtnl_dereference(bond_info->slave_arr);
>> >>>> - rcu_assign_pointer(bond_info->slave_arr, new_arr);
>> >>>> - if (old_arr)
>> >>>> - kfree_rcu(old_arr, rcu);
>> >>>> -
>> >>>> - return 0;
>> >>>> -}
>> >>>> -
>> >>>> int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>> >>>> {
>> >>>> struct bonding *bond = netdev_priv(bond_dev);
>> >>>> - struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>> >>>> struct ethhdr *eth_data;
>> >>>> struct slave *tx_slave = NULL;
>> >>>> u32 hash_index;
>> >>>> @@ -1462,12 +1427,14 @@ int bond_tlb_xmit(struct sk_buff *skb, struct
>> >>>> net_device *bond_dev)
>> >>>>
>> >>>> hash_index &
>> >>>> 0xFF,
>> >>>>
>> >>>> skb->len);
>> >>>> } else {
>> >>>> - struct tlb_up_slave *slaves;
>> >>>> + struct bond_up_slave *slaves;
>> >>>> + unsigned int count;
>> >>>>
>> >>>> - slaves =
>> >>>> rcu_dereference(bond_info->slave_arr);
>> >>>> - if (slaves && slaves->count)
>> >>>> + slaves =
>> >>>> rcu_dereference(bond->slave_arr);
>> >>>> + count = slaves ? slaves->count : 0;
>> >>>> + if (count)
>> >>>
>> >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >>> In both of these cases (slaves & slaves->count) you could use likely(),
>> >>> as in the case we have slaves to transmit, it'd be advantageous and will
>> >>> be hit every time. The cases that we get here and don't have a
>> >>> slave_arr/count are mostly 2:
>> >>> 1. The slaves got released between the start of xmit and this part
>> >>> 2. They were not eligible so the array got emptied
>> >>> In both of these cases we don't care about the fallout path since
>> >>> transmission isn't possible anyway.
>> >>> Also another minor thing is that usually in the cases where we want to
>> >>> fetch a variable only once ACCESS_ONCE() is used as a weak compiler
>> >>> barrier to make sure the compiler doesn't optimize out something. The
>> >>> others can correct me if I'm wrong, but I think in this case it's a good
>> >>> precaution for slaves->count.
>> >>> These are merely suggestions, I might be wrong.
>> >>>
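To make the two suggestions above concrete, here is a rough userspace sketch, assuming GCC or Clang (for __builtin_expect and __typeof__). The demo_likely() and demo_access_once() macros are only approximations of the kernel's likely() and ACCESS_ONCE(), and struct demo_arr is a hypothetical stand-in for the slave array.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace approximations; the kernel provides its own definitions. */
#define demo_likely(x)      __builtin_expect(!!(x), 1)
#define demo_access_once(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the usable-slave array. */
struct demo_arr {
	unsigned int count;
	int ids[4];
};

/* Returns the chosen id, or -1 when there is nothing to transmit on. */
static int demo_pick(struct demo_arr *a, unsigned int hash)
{
	unsigned int count;

	/* Read count exactly once, so the emptiness test and the modulo
	 * below are guaranteed to use the same value.
	 */
	count = a ? demo_access_once(a->count) : 0;
	assert(count <= sizeof(a->ids) / sizeof(a->ids[0]));

	/* Hint that the non-empty case is the hot path. */
	if (demo_likely(count))
		return a->ids[hash % count];
	return -1;
}
```

The branch hint only affects code layout. The single volatile read is what keeps the compiler from reloading count between the check and the division.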
>> > Will do.
>> >>>
>> >>>> tx_slave =
>> >>>> slaves->arr[hash_index
>> >>>> %
>> >>>> -
>> >>>> slaves->count];
>> >>>> +
>> >>>> count];
>> >>>> }
>> >>>> break;
>> >>>> }
>> >>>> @@ -1733,10 +1700,6 @@ void bond_alb_deinit_slave(struct bonding
>> >>>> *bond,
>> >>>> struct slave *slave)
>> >>>> rlb_clear_slave(bond, slave);
>> >>>> }
>> >>>>
>> >>>> - if (bond_is_nondyn_tlb(bond))
>> >>>> - if (bond_tlb_update_slave_arr(bond, slave))
>> >>>> - pr_err("Failed to build slave-array for TLB
>> >>>> mode.\n");
>> >>>> -
>> >>>> }
>> >>>>
>> >>>> /* Caller must hold bond lock for read */
>> >>>> @@ -1762,7 +1725,7 @@ void bond_alb_handle_link_change(struct bonding
>> >>>> *bond, struct slave *slave, char
>> >>>> }
>> >>>>
>> >>>> if (bond_is_nondyn_tlb(bond)) {
>> >>>> - if (bond_tlb_update_slave_arr(bond, NULL))
>> >>>> + if (bond_update_slave_arr(bond, NULL))
>> >>>> pr_err("Failed to build slave-array for TLB
>> >>>> mode.\n");
>> >>>> }
>> >>>> }
>> >>>> diff --git a/drivers/net/bonding/bond_alb.h
>> >>>> b/drivers/net/bonding/bond_alb.h
>> >>>> index aaeac61d03cf..5fc76c01636c 100644
>> >>>> --- a/drivers/net/bonding/bond_alb.h
>> >>>> +++ b/drivers/net/bonding/bond_alb.h
>> >>>> @@ -139,20 +139,12 @@ struct tlb_slave_info {
>> >>>> */
>> >>>> };
>> >>>>
>> >>>> -struct tlb_up_slave {
>> >>>> - unsigned int count;
>> >>>> - struct rcu_head rcu;
>> >>>> - struct slave *arr[0];
>> >>>> -};
>> >>>> -
>> >>>> struct alb_bond_info {
>> >>>> struct tlb_client_info *tx_hashtbl; /* Dynamically allocated
>> >>>> */
>> >>>> spinlock_t tx_hashtbl_lock;
>> >>>> u32 unbalanced_load;
>> >>>> int tx_rebalance_counter;
>> >>>> int lp_counter;
>> >>>> - /* -------- non-dynamic tlb mode only ---------*/
>> >>>> - struct tlb_up_slave __rcu *slave_arr; /* Up slaves */
>> >>>> /* -------- rlb parameters -------- */
>> >>>> int rlb_enabled;
>> >>>> struct rlb_client_info *rx_hashtbl; /* Receive hash table
>> >>>> */
>> >>>> diff --git a/drivers/net/bonding/bond_main.c
>> >>>> b/drivers/net/bonding/bond_main.c
>> >>>> index b43b2df9e5d1..4412c458939d 100644
>> >>>> --- a/drivers/net/bonding/bond_main.c
>> >>>> +++ b/drivers/net/bonding/bond_main.c
>> >>>> @@ -1700,6 +1700,10 @@ static int __bond_release_one(struct
>> >>>> net_device
>> >>>> *bond_dev,
>> >>>> write_unlock_bh(&bond->curr_slave_lock);
>> >>>> }
>> >>>>
>> >>>> + if (bond_mode_uses_xmit_hash(bond) &&
>> >>>> + bond_update_slave_arr(bond, slave))
>> >>>> + pr_err("Failed to build slave-array.\n");
>> >>>> +
>> >>>> netdev_info(bond_dev, "Releasing %s interface %s\n",
>> >>>> bond_is_active_slave(slave) ? "active" :
>> >>>> "backup",
>> >>>> slave_dev->name);
>> >>>> @@ -2015,6 +2019,10 @@ static void bond_miimon_commit(struct bonding
>> >>>> *bond)
>> >>>> bond_alb_handle_link_change(bond,
>> >>>> slave,
>> >>>>
>> >>>> BOND_LINK_UP);
>> >>>>
>> >>>> + if (BOND_MODE(bond) == BOND_MODE_XOR &&
>> >>>> + bond_update_slave_arr(bond, NULL))
>> >>>> + pr_err("Failed to build slave-array
>> >>>> for
>> >>>> XOR mode.\n");
>> >>>> +
>> >>>> if (!bond->curr_active_slave || slave ==
>> >>>> primary)
>> >>>> goto do_failover;
>> >>>>
>> >>>> @@ -2042,6 +2050,10 @@ static void bond_miimon_commit(struct bonding
>> >>>> *bond)
>> >>>> bond_alb_handle_link_change(bond,
>> >>>> slave,
>> >>>>
>> >>>> BOND_LINK_DOWN);
>> >>>>
>> >>>> + if (BOND_MODE(bond) == BOND_MODE_XOR &&
>> >>>> + bond_update_slave_arr(bond, NULL))
>> >>>> + pr_err("Failed to build slave-array
>> >>>> for
>> >>>> XOR mode.\n");
>> >>>> +
>> >>>> if (slave ==
>> >>>> rcu_access_pointer(bond->curr_active_slave))
>> >>>> goto do_failover;
>> >>>>
>> >>>> @@ -2505,6 +2517,9 @@ static void bond_loadbalance_arp_mon(struct
>> >>>> work_struct *work)
>> >>>>
>> >>>> if (slave_state_changed) {
>> >>>> bond_slave_state_change(bond);
>> >>>> + if (BOND_MODE(bond) == BOND_MODE_XOR &&
>> >>>> + bond_update_slave_arr(bond, NULL))
>> >>>> + pr_err("Failed to build slave-array
>> >>>> for
>> >>>> XOR mode.\n");
>> >>>> } else if (do_failover) {
>> >>>> /* the bond_select_active_slave must hold
>> >>>> RTNL
>> >>>> * and curr_slave_lock for write.
>> >>>> @@ -2899,11 +2914,23 @@ static int bond_slave_netdev_event(unsigned
>> >>>> long
>> >>>> event,
>> >>>> if (old_duplex != slave->duplex)
>> >>>>
>> >>>> bond_3ad_adapter_duplex_changed(slave);
>> >>>> }
>> >>>> + /* Refresh slave-array if applicable!
>> >>>> + * If the setup does not use miimon or arpmon (mode-specific!),
>> >>>> + * then these events will not cause the slave-array to be
>> >>>> + * refreshed. This will cause xmit to use a slave that is not
>> >>>> + * usable. Avoid such situation by refreshing the array at these
>> >>>> + * events. If these (miimon/arpmon) parameters are configured
>> >>>> + * then array gets refreshed twice and that should be fine!
>> >>>> + */
>> >>>> + if (bond_mode_uses_xmit_hash(bond) &&
>> >>>> + bond_update_slave_arr(bond, NULL))
>> >>>> + pr_err("Failed to build slave-array for XOR
>> >>>> mode.\n");
>> >>>> break;
>> >>>> case NETDEV_DOWN:
>> >>>> - /*
>> >>>> - * ... Or is it this?
>> >>>> - */
>> >>>> + /* Refresh slave-array if applicable! */
>> >>>> + if (bond_mode_uses_xmit_hash(bond) &&
>> >>>> + bond_update_slave_arr(bond, NULL))
>> >>>> + pr_err("Failed to build slave-array for XOR
>> >>>> mode.\n");
>> >>>> break;
>> >>>> case NETDEV_CHANGEMTU:
>> >>>> /*
>> >>>> @@ -3147,6 +3174,10 @@ static int bond_open(struct net_device
>> >>>> *bond_dev)
>> >>>> bond_3ad_initiate_agg_selection(bond, 1);
>> >>>> }
>> >>>>
>> >>>> + if (bond_mode_uses_xmit_hash(bond) &&
>> >>>> + bond_update_slave_arr(bond, NULL))
>> >>>> + pr_err("Failed to build slave-array for XOR
>> >>>> mode.\n");
>> >>>> +
>> >>>> return 0;
>> >>>> }
>> >>>>
>> >>>> @@ -3654,15 +3685,106 @@ static int bond_xmit_activebackup(struct
>> >>>> sk_buff
>> >>>> *skb, struct net_device *bond_d
>> >>>> return NETDEV_TX_OK;
>> >>>> }
>> >>>>
>> >>>> -/* In bond_xmit_xor() , we determine the output device by using a
>> >>>> pre-
>> >>>> - * determined xmit_hash_policy(), If the selected device is not
>> >>>> enabled,
>> >>>> - * find the next active slave.
>> >>>> +/* Build the usable slaves array in control path for modes that use
>> >>>> xmit-hash
>> >>>> + * to determine the slave interface -
>> >>>> + * (a) BOND_MODE_8023AD
>> >>>> + * (b) BOND_MODE_XOR
>> >>>> + * (c) BOND_MODE_TLB && tlb_dynamic_lb == 0
>> >>>> */
>> >>>> -static int bond_xmit_xor(struct sk_buff *skb, struct net_device
>> >>>> *bond_dev)
>> >>>> +int bond_update_slave_arr(struct bonding *bond, struct slave
>> >>>> *skipslave)
>> >>>> {
>> >>>> - struct bonding *bond = netdev_priv(bond_dev);
>> >>>> + struct slave *slave;
>> >>>> + struct list_head *iter;
>> >>>> + struct bond_up_slave *new_arr, *old_arr;
>> >>>> + int slaves_in_agg;
>> >>>> + int agg_id = 0;
>> >>>> + int ret = 0;
>> >>>> +
>> >>>> + new_arr = kzalloc(offsetof(struct bond_up_slave,
>> >>>> arr[bond->slave_cnt]),
>> >>>> + GFP_ATOMIC);
>> >>>> + if (!new_arr) {
>> >>>> + ret = -ENOMEM;
>> >>>> + goto out;
>> >>>> + }
>> >>>> + if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>> >>>> + struct ad_info ad_info;
>> >>>> +
>> >>>> + if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
>> >>>> + pr_debug("bond_3ad_get_active_agg_info
>> >>>> failed\n");
>> >>>> + kfree_rcu(new_arr, rcu);
>> >>>> + ret = -EINVAL;
>> >>>> + goto out;
>> >>>> + }
>> >>>> + slaves_in_agg = ad_info.ports;
>> >>>> + agg_id = ad_info.aggregator_id;
>> >>>> + }
>> >>>> + bond_for_each_slave(bond, slave, iter) {
>> >>>> + if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>> >>>> + struct aggregator *agg;
>> >>>>
>> >>>> - bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb) %
>> >>>> bond->slave_cnt);
>> >>>> + agg = SLAVE_AD_INFO(slave)->port.aggregator;
>> >>>> + if (!agg || agg->aggregator_identifier !=
>> >>>> agg_id)
>> >>>> + continue;
>> >>>> + }
>> >>>> + if (!bond_slave_can_tx(slave))
>> >>>> + continue;
>> >>>> + if (skipslave == slave)
>> >>>> + continue;
>> >>>> + new_arr->arr[new_arr->count++] = slave;
>> >>>> + }
>> >>>> +
>> >>>> + old_arr = rcu_dereference_protected(bond->slave_arr,
>> >>>> + lockdep_rtnl_is_held() ||
>> >>>> +
>> >>>> lockdep_is_held(&bond->curr_slave_lock));
>> >>>> + rcu_assign_pointer(bond->slave_arr, new_arr);
>> >>>> + if (old_arr)
>> >>>> + kfree_rcu(old_arr, rcu);
>> >>>> +
>> >>>> +out:
>> >>>> + if (ret != 0 && skipslave) {
>> >>>> + int idx;
>> >>>> +
>> >>>> + /* Rare situation where caller has asked to skip a specific
>> >>>> + * slave but allocation failed (most likely!). BTW this is
>> >>>> + * only possible when the call is initiated from
>> >>>> + * __bond_release_one(). In this situation, overwrite the
>> >>>> + * skipslave entry in the array with the last entry from the
>> >>>> + * array to avoid a situation where the xmit path may choose
>> >>>> + * this to-be-skipped slave to send a packet out.
>> >>>> + */
>> >>>> + old_arr = rtnl_dereference(bond->slave_arr);
>> >>>> + for (idx = 0; idx < old_arr->count; idx++) {
>> >>>> + if (skipslave == old_arr->arr[idx]) {
>> >>>> + old_arr->arr[idx] =
>> >>>> + old_arr->arr[old_arr->count-1];
>> >>>> + old_arr->count--;
>> >>>> + break;
>> >>>> + }
>> >>>> + }
>> >>>> + }
>> >>>> + return ret;
>> >>>> +}
>> >>>> +
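The fallback described in the comment above (overwrite the to-be-skipped entry with the last entry, then shrink the count) can be sketched in isolation as follows. This is a single-threaded userspace illustration only: struct demo_list and demo_swap_remove() are hypothetical names, the NULL check is an addition of this sketch, and concurrent readers of the array are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX 8

/* Hypothetical fixed-size stand-in for the usable-slave array. */
struct demo_list {
	unsigned int count;
	const void *arr[DEMO_MAX];
};

/* Remove 'victim' by moving the last entry into its slot.
 * Order is not preserved. Returns true if an entry was removed.
 */
static bool demo_swap_remove(struct demo_list *l, const void *victim)
{
	unsigned int i;

	if (!l)		/* no array was ever built */
		return false;
	assert(l->count <= DEMO_MAX);

	for (i = 0; i < l->count; i++) {
		if (l->arr[i] == victim) {
			l->arr[i] = l->arr[l->count - 1];
			l->count--;
			return true;
		}
	}
	return false;
}
```

Because hash-based selection does not depend on the order of entries, a swap-remove is enough here and avoids shifting the rest of the array.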
>> >>>> +/* Use this Xmit function for 3AD as well as XOR modes. The current
>> >>>> + * usable slave array is formed in the control path. The xmit
>> >>>> function
>> >>>> + * just calculates hash and sends the packet out.
>> >>>> + */
>> >>>> +int bond_3ad_xor_xmit(struct sk_buff *skb, struct net_device *dev)
>> >>>> +{
>> >>>> + struct bonding *bond = netdev_priv(dev);
>> >>>> + struct slave *slave;
>> >>>> + struct bond_up_slave *slaves;
>> >>>> + unsigned int count;
>> >>>> +
>> >>>> + slaves = rcu_dereference(bond->slave_arr);
>> >>>> + count = slaves ? slaves->count : 0;
>> >>>> + if (count) {
>> >>>
>> >>> ^^^^^^^^^^^^^^
>> >>> The same comment as above applies here, too.
>> >>>
>> >>>
>> >>>> + slave = slaves->arr[bond_xmit_hash(bond, skb) %
>> >>>> count];
>> >>>> + bond_dev_queue_xmit(bond, skb, slave->dev);
>> >>>> + } else {
>> >>>> + dev_kfree_skb_any(skb);
>> >>>> + atomic_long_inc(&dev->tx_dropped);
>> >>>> + }
>> >>>>
>> >>>> return NETDEV_TX_OK;
>> >>>> }
>> >>>> @@ -3764,12 +3886,11 @@ static netdev_tx_t __bond_start_xmit(struct
>> >>>> sk_buff *skb, struct net_device *dev
>> >>>> return bond_xmit_roundrobin(skb, dev);
>> >>>> case BOND_MODE_ACTIVEBACKUP:
>> >>>> return bond_xmit_activebackup(skb, dev);
>> >>>> + case BOND_MODE_8023AD:
>> >>>> case BOND_MODE_XOR:
>> >>>> - return bond_xmit_xor(skb, dev);
>> >>>> + return bond_3ad_xor_xmit(skb, dev);
>> >>>> case BOND_MODE_BROADCAST:
>> >>>> return bond_xmit_broadcast(skb, dev);
>> >>>> - case BOND_MODE_8023AD:
>> >>>> - return bond_3ad_xmit_xor(skb, dev);
>> >>>> case BOND_MODE_ALB:
>> >>>> return bond_alb_xmit(skb, dev);
>> >>>> case BOND_MODE_TLB:
>> >>>> @@ -3947,6 +4068,7 @@ static void bond_uninit(struct net_device
>> >>>> *bond_dev)
>> >>>> struct bonding *bond = netdev_priv(bond_dev);
>> >>>> struct list_head *iter;
>> >>>> struct slave *slave;
>> >>>> + struct bond_up_slave *arr;
>> >>>>
>> >>>> bond_netpoll_cleanup(bond_dev);
>> >>>>
>> >>>> @@ -3955,6 +4077,12 @@ static void bond_uninit(struct net_device
>> >>>> *bond_dev)
>> >>>> __bond_release_one(bond_dev, slave->dev, true);
>> >>>> netdev_info(bond_dev, "Released all slaves\n");
>> >>>>
>> >>>> + arr = rtnl_dereference(bond->slave_arr);
>> >>>> + if (arr) {
>> >>>> + kfree_rcu(arr, rcu);
>> >>>> + RCU_INIT_POINTER(bond->slave_arr, NULL);
>> >>>> + }
>> >>>> +
>> >>>> list_del(&bond->bond_list);
>> >>>>
>> >>>> bond_debug_unregister(bond);
>> >>>> diff --git a/drivers/net/bonding/bonding.h
>> >>>> b/drivers/net/bonding/bonding.h
>> >>>> index 8375133dd347..78d6d3b7a780 100644
>> >>>> --- a/drivers/net/bonding/bonding.h
>> >>>> +++ b/drivers/net/bonding/bonding.h
>> >>>> @@ -177,6 +177,12 @@ struct slave {
>> >>>> struct kobject kobj;
>> >>>> };
>> >>>>
>> >>>> +struct bond_up_slave {
>> >>>> + unsigned int count;
>> >>>> + struct rcu_head rcu;
>> >>>> + struct slave *arr[0];
>> >>>> +};
>> >>>> +
>> >>>> /*
>> >>>> * Link pseudo-state only used internally by monitors
>> >>>> */
>> >>>> @@ -193,6 +199,7 @@ struct bonding {
>> >>>> struct slave __rcu *curr_active_slave;
>> >>>> struct slave __rcu *current_arp_slave;
>> >>>> struct slave __rcu *primary_slave;
>> >>>> + struct bond_up_slave __rcu *slave_arr; /* Array of usable
>> >>>> slaves
>> >>>> */
>> >>>> bool force_primary;
>> >>>> s32 slave_cnt; /* never change this value outside the
>> >>>> attach/detach wrappers */
>> >>>> int (*recv_probe)(const struct sk_buff *, struct bonding
>> >>>> *,
>> >>>> @@ -530,6 +537,7 @@ const char *bond_slave_link_status(s8 link);
>> >>>> struct bond_vlan_tag *bond_verify_device_path(struct net_device
>> >>>> *start_dev,
>> >>>> struct net_device
>> >>>> *end_dev,
>> >>>> int level);
>> >>>> +int bond_update_slave_arr(struct bonding *bond, struct slave
>> >>>> *skipslave);
>> >>>>
>> >>>> #ifdef CONFIG_PROC_FS
>> >>>> void bond_create_proc_entry(struct bonding *bond);
>> >>>>
>> >>>
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe netdev" in
>> > the body of a message to majordomo@...r.kernel.org
>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
>>
>