Message-ID: <540ADF95.3070606@redhat.com>
Date: Sat, 06 Sep 2014 12:19:01 +0200
From: Nikolay Aleksandrov <nikolay@...hat.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
CC: netdev@...r.kernel.org, vfalico@...il.com, andy@...yhouse.net,
davem@...emloft.net
Subject: Re: [RFC net-next 1/5] bonding: 3ad: use curr_slave_lock instead
of bond->lock
On 09/06/2014 03:08 AM, Jay Vosburgh wrote:
> Nikolay Aleksandrov <nikolay@...hat.com> wrote:
>
>> On 09/05/2014 10:37 PM, Jay Vosburgh wrote:
>>> Nikolay Aleksandrov <nikolay@...hat.com> wrote:
>>>
>>>> In 3ad mode the only syncing needed by bond->lock is for the wq
>>>> and the recv handler, so change them to use curr_slave_lock.
>>>> There're no locking dependencies here as 3ad doesn't use
>>>> curr_slave_lock at all.
>>>
>>> One subtle aspect of the 3ad locking is that it's not really
>>> using the "read" property of the read lock with regard to the state
>>> machine; it's largely using it as a spin lock, because there is at most
>>> one reader and at most one writer in the full state machine code
>>> (although there are multiple reader possibilities elsewhere). The code
>>> would break if there actually were multiple read-lock holders in the
>>> full state machine or aggregator selection logic simultaneously.
>>>
>>> Because the state machine and incoming LACPDU cases both acquire
>>> the read lock for read, there is a separate per-port "state machine"
>>> spin lock to protect only the per-port fields that LACPDU and periodic
>>> state machine both touch. The incoming LACPDU case doesn't call into
>>> the full state machine, only the RX processing which can't go into agg
>>> selection, so this works.
>>>
>>> The agg selection can be entered via the unbind path or the
>>> periodic state machine (and only these two paths), and relies on the
>>> "one reader max" usage of the read lock to mutex the code paths that may
>>> enter agg selection.
>>>
>>> I suspect that what 3ad may need is a spin lock, not a read
>>> lock, because the multiple reader property isn't really being utilized;
>>> the incoming LACPDU and periodic state machine both acquire the read
>>> lock for read, but then acquire a second per-port spin lock. If the
>>> "big" (bond->lock currently) lock is a spin lock, then the per-port
>>> state machine lock is unnecessary, as the only purpose of the per-port
>>> lock is to mutex the one case that does have multiple readers.
>>>
>>> In actual practice I doubt there are multiple simultaneous
>>> readers very often; the periodic machine runs every 100 ms, but LACPDUs
>>> arrive for each port either every second or every 30 seconds (depending
>>> on admin configuration).
>>>
>>> Since contention on these locks is generally low, we're probably
>>> better off in the long run with something simpler to understand.
>>>
>>> So, what I'm kind of saying here is that this patch isn't a bad
>>> first step, but at least for the 3ad case, removal of the bond->lock
>>> itself doesn't really simplify the locking as much as could be done.
>>>
>>> Thoughts?
>>>
>>> -J
>>>
>>>
>>
>> Hi Jay,
>> That is a very good point. My main idea was to protect __bond_release_one
>> and the state machine handler, otherwise I'd have removed it altogether. I
>> know this doesn't improve the 3ad situation; I did it mostly to get rid of
>> bond->lock first. Going with a spinlock certainly makes sense there: we
>> don't spend much time inside, contention is low as you said, and it would
>> simplify the 3ad code, so I like it :-)
>> I will add it to my bond-locking todo list and post a follow-up once I've
>> worked out the details (I'm speaking off the top of my head right now),
>> but first I'd like to clean up the current lock use, especially with
>> regard to curr_slave_lock, and bring it down to the necessary minimum. In
>> the long run I think we might be able to either remove curr_slave_lock
>> completely or at least reduce it to ~3 places, and with the spinlock you
>> suggested here we'll definitely be able to remove it from the 3ad code.
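
To make the spinlock idea a bit more concrete, this is roughly what I have
in mind for 3ad (completely untested sketch; "ad_lock" is just a placeholder
name for the proposed per-bond spinlock, and the function names/bodies are
stand-ins for the existing code):

static void bond_3ad_state_machine_handler_sketch(struct bonding *bond)
{
	spin_lock_bh(&bond->ad_lock);
	/* run the rx/periodic/selection/mux/tx machines for each port,
	 * agg selection included -- only one holder at a time now */
	spin_unlock_bh(&bond->ad_lock);
}

static void bond_3ad_lacpdu_recv_sketch(struct bonding *bond,
					struct slave *slave)
{
	spin_lock_bh(&bond->ad_lock);
	/* existing LACPDU rx processing goes here; with both entry
	 * points serialized on one spinlock, the per-port
	 * state_machine_lock is no longer needed */
	spin_unlock_bh(&bond->ad_lock);
}
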
>
> I looked into curr_slave_lock some time ago, and as I recall
> there's not much it really protects that's not also covered by RTNL,
> since changes to the active slave are all happening under RTNL these
> days. And that was before the RCU conversion; with the current code,
> I'm not sure the curr_slave_lock is providing much mutexing beyond what
> we get from RTNL.
>
Right, that was one of the main things that drove me to start this. After
reviewing Mahesh's patches I realized that most of the callers already held
RTNL, so curr_slave_lock looked mostly redundant, and bond->lock completely
redundant.
> There are a couple of special cases, like the TLB rebalance in
> bond_alb_monitor, but that happens once every 10 seconds, and could just
> grab RTNL for this bit:
>
> if (slave == rcu_access_pointer(bond->curr_active_slave)) {
> SLAVE_TLB_INFO(slave).load =
> bond_info->unbalanced_load /
> BOND_TLB_REBALANCE_INTERVAL;
> bond_info->unbalanced_load = 0;
>
> the "unbalanced_load," now that I'm looking at it, might already
> have some race problems since it's now updated outside of bonding locks
> in bond_do_alb_xmit. It'll probably race with multiple bond_do_alb_xmit
> functions running simultaneously as well as the tx rebalance in
> bond_alb_monitor. I think the worst that will happen is that the tx
> traffic load is distributed suboptimally for 10 seconds.
>
Yes, I don't think this needs fixing, but we might revisit it once the rest
of the pieces are in place.
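If we do revisit it, something like this would probably be enough (sketch
only, built on the snippet you quoted; rtnl_trylock() so the 10 second
rebalance never has to block waiting for RTNL):

	/* sketch: take RTNL instead of curr_slave_lock for the rebalance;
	 * if RTNL is busy just skip this round and retry next interval */
	if (rtnl_trylock()) {
		if (slave == rcu_access_pointer(bond->curr_active_slave)) {
			SLAVE_TLB_INFO(slave).load =
				bond_info->unbalanced_load /
				BOND_TLB_REBALANCE_INTERVAL;
			bond_info->unbalanced_load = 0;
		}
		rtnl_unlock();
	}
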
> The rlb_clear_slave case that acquires curr_slave_lock already
> also has RTNL, so I'm not sure that removing the curr_slave_lock will
> have any impact there, either. Many of the other curr_slave_lock
> holders bounce the curr_slave_lock to call into net/core functions (set
> promisc, change MAC, etc), so there's already reliance on RTNL.
>
Yes, rlb_clear_slave itself is not the problem. The problem with removing
curr_slave_lock is that rlb_clear_slave and bond_alb_monitor could then be
transmitting packets at the same time, or a MAC swap could trigger packet
transmission while bond_alb_monitor() is also transmitting. In short,
pretty much everything on the ALB side that currently takes curr_slave_lock
for writing and then transmits could end up transmitting concurrently with
bond_alb_monitor(); that's what I was trying to avoid.
> Separately, it also might be possible to combine the various
> per-mode special locks internally into a generic "mode lock" so the alb
> and rlb hashtbl lock and the 802.3ad state machine lock could be a
> single "bond->mode_lock" that mutexes whatever special sauce the active
> mode needs protected. Not sure if that's worth the trouble or not, but
> it seems plausible at first glance.
>
Yes, I had something similar in mind /I actually called it sync_lock :)/,
but I only went as far as using it in the few places that still needed
syncing beyond RTNL once curr_slave_lock was removed. You've gone one step
further, and this certainly looks doable. I'll keep it in mind.
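For reference, this is roughly how I'd picture it (sketch only; "mode_lock"
is just a working name for the unified per-mode lock):

/* one per-bond lock for whatever the active mode needs to serialize
 * beyond RTNL: tlb/rlb hashtbls, 3ad state machines, etc. */
struct bonding {
	/* ... existing fields ... */
	spinlock_t mode_lock;
};

/* e.g. the rlb hashtbl users would then simply do: */
	spin_lock_bh(&bond->mode_lock);	/* instead of _lock_rx_hashtbl_bh() */
	/* ... touch rx_hashtbl ... */
	spin_unlock_bh(&bond->mode_lock);
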
> -J
>
> ---
> -Jay Vosburgh, jay.vosburgh@...onical.com
>