lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 13 Jul 2018 00:14:40 +0200
From:   Michal Soltys <soltys@....info>
To:     Mahesh Bandewar (महेश बंडेवार) <maheshb@...gle.com>,
        Jay Vosburgh <jay.vosburgh@...onical.com>
Cc:     Chonggang Li <chonggangli@...gle.com>,
        linux-netdev <netdev@...r.kernel.org>
Subject: Re: [BUG] bonded interfaces drop bpdu (stp) frames

On 2018-07-12 23:26, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
> <jay.vosburgh@...onical.com> wrote:
>> Michal Soltys <soltys@....info> wrote:
>>
>>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>>>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>>>
>>>>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys <soltys@....info> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> As weird as that sounds, this is what I observed today after bumping
>>>>>> kernel version. I have a setup where 2 bonds are attached to linux
>>>>>> bridge and physically are connected to two switches doing MSTP (and
>>>>>> linux bridge is just passing them).
>>>>>>
>>>>>> Initially I suspected some changes related to bridge code - but quick
>>>>>> peek at the code showed nothing suspicious - and the part of it that
>>>>>> explicitly passes stp frames if stp is not enabled has seen little
>>>>>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>>>>>> regular non-bonded interfaces are attached everything works fine.
>>>>>>
>>>>>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>>>>>> simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>>>>>> there (with them being present just fine on active enslaved interface,
>>>>>> or on the bond device in earlier kernels).
>>>>>>
>>>>>> If time permits I'll bisect tommorow to pinpoint the commit, but from
>>>>>> quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
>>>>>> debian) and 4.17.3 (tested on archlinux) are failing.
>>>>>>
>>>>>> Unless this is already a known issue (or you have any suggestions what
>>>>>> could be responsible).
>>>>>>
>>>>> I believe these are link-local-multicast messages and sometime back a
>>>>> change went into to not pass those frames to the bonding master. This
>>>>> could be the side effect of that.
>>>>
>>>>      Mahesh, I suspect you're thinking of:
>>>>
>>>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>>>> Author: Chonggang Li <chonggangli@...gle.com>
>>>> Date:   Sun Apr 16 12:02:18 2017 -0700
>>>>
>>>>      bonding: deliver link-local packets with skb->dev set to link that packets arrived on
>>>>
>>>>      Michal, are you able to revert this patch and test?
>>>>
>>>>      -J
>>>>
>>>> ---
>>>>      -Jay Vosburgh, jay.vosburgh@...onical.com
>>>>
>>>
>>>
>>>Just tested - yes, reverting that patch solves the issues.
>>
>>         Chonggang,
>>
>>         Reading the changelog in your commit referenced above, I'm not
>> entirely sure what actual problem it is fixing.  Could you elaborate?
>>
>>         As the patch appears to cause a regression, it needs to be
>> either fixed or reverted.
>>
>>         Mahesh, you signed-off on it as well, perhaps you also have some
>> context?
>>
> 
> I think the original idea behind it was to pass the LLDPDUs to the
> stack on the interface that they came on since this is considered to
> be link-local traffic and passing to bond-master would loose it's
> "linklocal-ness". This is true for LLDP and if you change the skb->dev
> of the packet, then you don't know which slave link it came on in
> (from LLDP consumer's perspective).
> 
> I don't know much about STP but trunking two links and aggregating
> this link info through bond-master seems wrong. Just like LLDP, you
> are losing info specific to a link and the decision derived from that
> info could be wrong.
> 
> Having said that, we determine "linklocal-ness" by looking at L2 and
> bondmaster shares this with lts slaves. So it does seem fair to pass
> those frames to the bonding-master but at the same time link-local
> traffic is supposed to be limited to the physical link (LLDP/STP/LACP
> etc). Your thoughts?
> 

But, isn't bond de-facto considered the "physical link" ? Not directly
of course, but say an LLDP daemon would likely be more interested in
getting LLDP data from a bond device (or a bridge device, if the bond is
attached to one), than from its enslaved interfaces (and enslaved
interfaces can be changed, not mentioning potentially complex setup
itself, even if usually it's just lacp&go ).

ITOW, blocking link-local multicasts on bond level (among those - bpdu,
pae, lldp) is a bit like if the interface itself hid LACP before bond code.

A few other examples:

- putting bonds in a bridge is pretty normal thing - and whether the
bridge interpretes the spanning tree data itself (via in-kernel classic
stp or userspace daemon for e.g. rstp) or passes the trafic, it must see
the BPDU frames. Otherwise it becomes blind to the whole spanning tree
protocol - and implicitly other switches around - real or virtual ones.
It's literally instant loop disaster. br_input.c specifically takes care
to pass those frames if the bridge has stp turned off

- "group_fwd_mask" (again in bridge context) has been added to bridge
code - and recently as a per-port knob as well - to specifically allow
the control of what kind of "link-local" stuff is passed or not. LLDP
and 802.1X PAE were, afaik, the main reasons for that sysfs variable.
The per-port setting is even more relaxed (iirc, only pause frames are
not passable)

- LLDP daemon example - as above

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ