Message-ID: <836819fc-1a70-1b30-d90c-8c00953b8b6c@ziu.info>
Date: Fri, 13 Jul 2018 01:14:50 +0200
From: Michal Soltys <soltys@....info>
To: Jay Vosburgh <jay.vosburgh@...onical.com>,
Mahesh Bandewar (महेश बंडेवार) <maheshb@...gle.com>
Cc: Chonggang Li <chonggangli@...gle.com>,
linux-netdev <netdev@...r.kernel.org>
Subject: Re: [BUG] bonded interfaces drop bpdu (stp) frames
On 2018-07-13 00:03, Jay Vosburgh wrote:
> Mahesh Bandewar (महेश बंडेवार) wrote:
>
>>On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
>><jay.vosburgh@...onical.com> wrote:
>>> Michal Soltys <soltys@....info> wrote:
>>>
>>>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>>>>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>>>>
>>>>>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys <soltys@....info> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> As weird as that sounds, this is what I observed today after bumping
>>>>>>> the kernel version. I have a setup where 2 bonds are attached to a
>>>>>>> linux bridge and are physically connected to two switches doing MSTP
>>>>>>> (and the linux bridge is just passing the frames through).
>>>>>>>
>>>>>>> Initially I suspected some changes related to the bridge code - but a
>>>>>>> quick peek at the code showed nothing suspicious - and the part of it
>>>>>>> that explicitly passes stp frames if stp is not enabled has seen few
>>>>>>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>>>>>>> regular non-bonded interfaces are attached, everything works fine.
>>>>>>>
>>>>>>> Just to be sure, I detached the bond (802.3ad mode) and checked it with
>>>>>>> a simple tcpdump (ether proto \stp) - and indeed no hello packets were
>>>>>>> there (while they were present just fine on the active enslaved
>>>>>>> interface, or on the bond device in earlier kernels).
>>>>>>>
>>>>>>> If time permits I'll bisect tomorrow to pinpoint the commit, but from
>>>>>>> today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
>>>>>>> Debian) and 4.17.3 (tested on Arch Linux) are failing.
>>>>>>>
>>>>>>> Unless this is already a known issue (or you have any suggestions as to
>>>>>>> what could be responsible).
>>>>>>>
>>>>>> I believe these are link-local multicast messages, and some time back a
>>>>>> change went in to not pass those frames to the bonding master. This
>>>>>> could be a side effect of that.
>>>>>
>>>>> Mahesh, I suspect you're thinking of:
>>>>>
>>>>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>>>>> Author: Chonggang Li <chonggangli@...gle.com>
>>>>> Date: Sun Apr 16 12:02:18 2017 -0700
>>>>>
>>>>> bonding: deliver link-local packets with skb->dev set to link that packets arrived on
>>>>>
>>>>> Michal, are you able to revert this patch and test?
>>>>>
>>>>> -J
>>>>>
>>>>> ---
>>>>> -Jay Vosburgh, jay.vosburgh@...onical.com
>>>>>
>>>>
>>>>
>>>>Just tested - yes, reverting that patch solves the issues.
>>>
>>> Chonggang,
>>>
>>> Reading the changelog in your commit referenced above, I'm not
>>> entirely sure what actual problem it is fixing. Could you elaborate?
>>>
>>> As the patch appears to cause a regression, it needs to be
>>> either fixed or reverted.
>>>
>>> Mahesh, you signed off on it as well, perhaps you also have some
>>> context?
>>>
>>
>>I think the original idea behind it was to pass the LLDPDUs to the
>>stack on the interface that they came in on, since this is considered to
>>be link-local traffic and passing it to the bond-master would lose its
>>"link-local-ness". This is true for LLDP, and if you change the skb->dev
>>of the packet, then you don't know which slave link it came in on
>>(from the LLDP consumer's perspective).
>>
>>I don't know much about STP, but trunking two links and aggregating
>>this link info through the bond-master seems wrong. Just like with LLDP,
>>you are losing info specific to a link, and a decision derived from that
>>info could be wrong.
>>
>>Having said that, we determine "link-local-ness" by looking at L2, and
>>the bond-master shares this with its slaves. So it does seem fair to pass
>>those frames to the bonding master, but at the same time link-local
>>traffic is supposed to be limited to the physical link (LLDP/STP/LACP
>>etc.). Your thoughts?
>
> I agree the whole thing sounds kind of weird, but I'm curious as
> to what Michal's actual use case is; he presumably has some practical
> use for this, since he noticed that the behavior changed.
>
The whole "link-local" term is a bit fuzzy - at this point it feels like
too many things were thrown into a single bag and it got somewhat
confusing (BPDU, LLDP, pause frames, LACP, PAE, the QinQ multicast that
AFAIK has its own address) - I added some examples in another reply I
wrote at the same time as you were typing this one =)
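
To make that bag a bit more concrete, here is a minimal sketch of what
"link-local" means at L2 - the reserved 01-80-C2-00-00-0X group range that
all of the above fall into. The helper and its name are just my own
illustration (the kernel's is_link_local_ether_addr() does an equivalent
check), not code from the tree:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The reserved 802.1 "link-local" group range: 01-80-C2-00-00-00..0F. */
static const uint8_t LL_BASE[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };

static bool is_8021_link_local(const uint8_t dst[6])
{
        /* first five octets are fixed, the low nibble of the last octet
         * selects the protocol */
        return memcmp(dst, LL_BASE, 5) == 0 && (dst[5] & 0xf0) == 0x00;
}

/* A few of the well-known assignments in that range:
 *   ...:00  STP/RSTP/MSTP BPDUs          ...:01  MAC PAUSE frames
 *   ...:02  Slow Protocols (LACP, OAM)   ...:03  802.1X PAE
 *   ...:08  Provider Bridge group address (the QinQ one)
 *   ...:0E  LLDP (nearest bridge)
 */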
> Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
> does that combination work rationally given that the bond might send and
> receive traffic across multiple slaves? Or does the switch side bundle
> the ports together into a single logical interface for MSTP purposes?
> On the TX side, I think the bond will likely balance all STP frames to
> just one slave.
>
The basic concept: two "main" switches, with the "important" machines
connected to both of them. One switch dies and everything keeps working.
With no unused ports and so on.
In more detail:
Originally I was trying the MSTP daemon (on the "important" machines),
which seems quite complete and well coded, but it cannot really work
correctly - as AFAIK you can't put a port (in linux bridge context) into a
different forwarding/blocking/etc. state per region - ITOW per group of
VLANs (either mstpd didn't know how to do that, or it wasn't implemented -
I didn't look too deep back then, though my interest resurfaced in recent
days). So that option was out of the question. But any switch, real or
not, /must/ pass BPDU frames if it doesn't interpret them. So instead of
having an active MSTP participant, we have a passive linux bridge that
passes the frames, and the two real switches around it take care of MSTP,
treating the linux box as a shared segment. The costs/priorities/etc. on
the real switches are set so that one bond handles two regions, and the
other bond handles the other two regions. If either of the real switches
dies or is taken down for e.g. a firmware update, the bond going to the
other switch handles all four regions (failover is of course not as fast
as with active RSTP/MSTP participation, but it works quite well
nonetheless - around 10s after some tuning).
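
In case it helps, my mental model of why an STP-disabled bridge floods
BPDUs (and where the per-port group_fwd_mask comes in) is roughly the
sketch below. It is a simplified illustration with made-up names
(bridge_state, handle_link_local, ...), not the actual br_handle_frame()
code:

#include <stdbool.h>
#include <stdint.h>

enum ll_action { LL_FORWARD, LL_DELIVER_LOCALLY, LL_DROP };

/* Made-up state for the sketch: whether the bridge itself runs STP,
 * plus the per-port group_fwd_mask value from sysfs. */
struct bridge_state {
        bool     stp_enabled;
        uint16_t port_group_fwd_mask;
};

/* Decide what to do with a frame addressed to 01-80-C2-00-00-0X,
 * where last_octet is that final X. */
static enum ll_action handle_link_local(const struct bridge_state *br,
                                        uint8_t last_octet)
{
        uint16_t fwd_mask = br->port_group_fwd_mask;

        switch (last_octet) {
        case 0x00:      /* STP/RSTP/MSTP BPDU */
                /* A bridge that is not an STP participant must let BPDUs
                 * through - which is exactly what the real switches rely
                 * on when they treat the linux bridge as a shared segment. */
                if (!br->stp_enabled || (fwd_mask & (1u << last_octet)))
                        return LL_FORWARD;
                return LL_DELIVER_LOCALLY;
        case 0x01:      /* MAC PAUSE - never forwarded */
                return LL_DROP;
        default:        /* LLDP, PAE, ... - forwarded only if the admin
                         * enabled the corresponding group_fwd_mask bit */
                if (fwd_mask & (1u << last_octet))
                        return LL_FORWARD;
                return LL_DELIVER_LOCALLY;
        }
}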
We could have used RSTP for that purpose as well - but since that is all
or nothing in the context of per-port blocking/forwarding, it would leave
half of the ports unused - and we wanted to avoid that (that's why MSTP
was created, after all).
Instead of using 2 bonds (2 interfaces each) we could have just used 4
interfaces directly, one per region. But two of those regions see very
little traffic, so we paired the more and less active regions.
> As for a resolution, presuming that Michal has some reasonable
> use case, I'm thinking along the lines of reverting the new (leave frame
> attached to slave) behavior for the general case and adding a special
> case for LLDP and friends to get the new behavior. I'd like to avoid
> adding any new options to bonding.
>
My use case aside, this will cause issues for anyone attaching a bond
(instead of a direct interface or veth) to a bridge and doing something
more complex with it - whether related to STP, or to selectively passing
e.g. LLDP using the group_fwd_mask sysfs. Or running an LLDP daemon (e.g.
systemd-networkd, to not look far away) told to do LLDP on the bond device
(even in the most basic active-backup case) and remaining blind. Or
anything else that expects to see/pass those multicasts on/via the bonded
device (which is, after all, just a convenient way to create a virtual
interface out of real interfaces - ITOW it probably shouldn't make any
calls in this regard).
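
For completeness, my understanding of what changed with b89f04c61efe for
link-local frames - and of what reverting it restores - boils down to the
sketch below. The types and names are invented purely for illustration;
this is not the actual bond_handle_frame() code:

#include <stdbool.h>

/* Invented types/names purely for illustration. */
struct frame { bool link_local; };      /* dst in 01-80-C2-00-00-0X */

enum rx_verdict {
        RX_SEEN_ON_SLAVE_ONLY,  /* skb->dev left pointing at the slave  */
        RX_SEEN_ON_BOND,        /* skb->dev switched to the bond master */
};

static enum rx_verdict bond_rx_sketch(const struct frame *f, bool patched)
{
        /* With b89f04c61efe: link-local frames are handed to the stack
         * before skb->dev is rewritten, so a bridge (or LLDP daemon)
         * sitting on top of the bond never sees them. */
        if (patched && f->link_local)
                return RX_SEEN_ON_SLAVE_ONLY;

        /* Old behaviour, and what the revert restores: the frame is
         * delivered on the bond device, so anything attached to the bond
         * (e.g. a bridge passing BPDUs) works as before. */
        return RX_SEEN_ON_BOND;
}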