netdev - Re: [PATCH] net: bridge: Allow bridge to joing multicast groups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190731080149.oyqcrw42utxjizsx@lx-anielsen.microsemi.net>
Date:   Wed, 31 Jul 2019 10:01:50 +0200
From:   "Allan W. Nielsen" <allan.nielsen@...rochip.com>
To:     Andrew Lunn <andrew@...n.ch>
CC:     Ido Schimmel <idosch@...sch.org>,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>,
        Horatiu Vultur <horatiu.vultur@...rochip.com>,
        <roopa@...ulusnetworks.com>, <davem@...emloft.net>,
        <bridge@...ts.linux-foundation.org>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] net: bridge: Allow bridge to joing multicast groups

The 07/31/2019 05:31, Andrew Lunn wrote:
> The PDF you linked to suggests this as well. But i think you will need
> to make some core changes to the bridge. At the moment, STP is a
> bridge level property. But you are going to need it to be a per-port
> option. You can then use DLR on the ring ports, and optionally STP on
> the other ports.
I think you are right. I have not looked into the details of this. Our plan was
to not implement this to begin with. This will mean that it cannot act as a
gateway until this is done. But it still has good value to be able to implement
a DLR ring node.

> > But what we are looking at here, is to offload a
> > non-aware-(DLR|MRP)-switch which happens to be placed in a network
> > with these protocols running.
> 
> So we need to think about why we are passing traffic to the CPU port,
> and under what conditions can it be blocked.
> 
> 1) The interface is not part of a bridge. In this case, we only need
> the switch to pass to the CPU port MC addresses which have been set
> via set_rx_mode().
Yes. In our HW we are using the MAC table to do this, but this is an
implementation detail.

> I think this case does not apply for what you want. You have two ports
> bridges together as part of the ring.
Correct.

> 2) The interface is part of a bridge. There are a few sub-cases
> 
> a) IGMP snooping is being performed. We can block multicast where
> there is no interest in the group. But this is limited to IP
> multicast.
Agree. And this is done today by installing an explicit offload rule to limit
the flooding of a specific group.

> b) IGMP snooping is not being used and all interfaces in the bridge
> are ports of the switch. IP Multicast can be blocked to the CPU.
Does it matter if IGMP snooping is used or not? A more general statement could
be:

- "All interfaces in the bridge are ports of the switch. IP Multicast can be
  blocked to the CPU."

This assumes that if br0 joins a IP multicast group, then the needed MAC
addresses is added via the set_rx_mode (which I'm pretty sure is the case.

Or much more aggressive (which is where we started this entire discussion):

- "All interfaces in the bridge are ports of the switch. All Multicast (except
  ff:ff:ff:ff:ff:ff, 33:33:xx:xx:xx:xx, 01:80:C2:xx:xx:xx) can be blocked to the
  CPU."

But then we are back to having the need of requesting additional L2-multicast
mac addresses to the CPU. This has been discussed, and I do not want to re-start
the discussion, just wanted to mention it to have the complete picture.

> c) IGMP snooping is not being used and there is a non-switch interface
> in the bridge. Multicast needed is needed, so it can be flooded out
> this port.
Yes. And L2-multicast also always needs to go to the CPU regardless of IGMP.

> d) set_rx_mode() has been called on the br0 interface, indicating
> there is interest in the packets on the host. They must be sent to the
> CPU so they can be delivered locally.
Yes - but today set_rx_mode is not doing anything on br0. This was one problem
we had a few days ago, when we did not forward all traffic to the CPU by
default.

When looking at these cases, I'm not sure it matters if IGMP snooping is running
or not.

Here is how I understand what you say:

              || Foreign interfaces   |  No Foreign interfaces
==============||======================|=========================
IGMP Snoop on || IP-Multicast must    |  IP-Multicast is not needed
              || go to CPU by default |  in the CPU (by default*)
--------------||----------------------|-------------------------
IGMP Snoop off|| IP-Multicast must    |  IP-Multicast is not needed 
              || go to CPU by default |  in the CPU (by default*)

* This assumes that set_rx_mode starts to do something for the br0 interface
  (which does not seem to be the case right now - see br_dev_set_multicast_list).


> e) ????
e) Another case to consider is similar to d), but with VLANs. Assume that
br0.100 joins a IP-Multicast group, which will cause set_rx_mode() to be called.
As I read the code (I have not tried it, so I might have gotten this wrong),
then we loose the VLAN information, because the mc list is not VLAN aware.

This means that if br0.100 joins a group, then we will get that group for all
VLANs.

Maybe it is better (for now) to hold on to the exiting behavioral and let all
multicast go to the CPU by default, leave br_dev_set_multicast_list empty, use
IGMP snooping to optimize the flooding, and hopefully we will have a way to
limit the L2-multicast flooding as an optimization which can be applied to
"noisy" l2 multicast streams.

> Does the Multicast MAC address being used by DLR also map to an IP
> mmulticast address?
No.

> 01:21:6C:00:00:0[123] appear to be the MAC addresses used by DLR.
Yes.

> IPv4 multicast MAC addresses are 01:00:5E:XX:XX:XX. IPv6 multicast MAC
> addresses are 33:33:XX:XX:XX:XX.
Yes, and there are no overlap.

> So one possibility here is to teach the SW bridge about non-IP
> multicast addresses. Initially the switch should forward all MAC
> multicast frames to the CPU. If the frame is not an IPv4 or IPv6
> frame, and there has not been a call to set_rx_mode() for the MAC
> address on the br0 interface, and the bridge only contains switch
> ports, switchdev could be used to block the multicast to the CPU
> frame, but forward it out all other ports of the bridge.
Close, but not exactly (due to the arguments above).

Here is how I see it:

Teach the SW bridge about non-IP multicast addresses. Initially the switch
should forward all MAC multicast frames to the CPU. Today MDB rules can be
installed (either static or dynamic by IGMP), which limit the flooding of IPv4/6
multicast streams. In the same way, we should have a way to install a rule
(FDM/ or MDB) to limit the flooding of L2 multicast frames.

If foreign interfaces (or br0 it self) is part of the destination list, then
traffic also needs to go to the CPU.

By doing this, we can for explicitly configured dst mac address:
- limit the flooding on the on the SW bridge interfaces
- limit the flooding on the on the HW bridge interfaces
- prevent them to go to the CPU if they are not needed


-- 
/Allan