netdev - Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200723223551.d23ol2oriv474bn4@skbuf>
Date:   Fri, 24 Jul 2020 01:35:51 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Ido Schimmel <idosch@...sch.org>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jiri Pirko <jiri@...nulli.us>,
        Jakub Kicinski <kuba@...nel.org>,
        Ivan Vecera <ivecera@...hat.com>,
        netdev <netdev@...r.kernel.org>,
        Horatiu Vultur <horatiu.vultur@...rochip.com>,
        "Allan W. Nielsen" <allan.nielsen@...rochip.com>,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>,
        Roopa Prabhu <roopa@...ulusnetworks.com>
Subject: Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host
 flooding

On Mon, May 25, 2020 at 11:11:11PM +0300, Ido Schimmel wrote:
> On Sun, May 24, 2020 at 07:13:46PM +0300, Vladimir Oltean wrote:
> > Hi Ido,
> > 
> > On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@...sch.org> wrote:
> > >
> > > On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@....com>
> > > >
> > > > In cases where the bridge is offloaded by a switchdev, there are
> > > > situations where we can optimize RX filtering towards the host. To be
> > > > precise, the host only needs to do termination, which it can do by
> > > > responding at the MAC addresses of the slave ports and of the bridge
> > > > interface itself. But most notably, it doesn't need to do forwarding,
> > > > so there is no need to see packets with unknown destination address.
> > > >
> > > > But there are, however, cases when a switchdev does need to flood to the
> > > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > > interface, and since there is no offloaded datapath, packets need to
> > > > pass through the CPU. Currently this is the only identified case, but it
> > > > can be extended at any time.
> > > >
> > > > So far, switchdev implementers made driver-level assumptions, such as:
> > > > this chip is never integrated in SoCs where it can be bridged with a
> > > > foreign interface, so I'll just disable host flooding and save some CPU
> > > > cycles. Or: I can never know what else can be bridged with this
> > > > switchdev port, so I must leave host flooding enabled in any case.
> > > >
> > > > Let the bridge drive the host flooding decision, and pass it to
> > > > switchdev via the same mechanism as the external flooding flags.
> > > >
> > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> > > > ---
> > > >  include/linux/if_bridge.h |  3 +++
> > > >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> > > >  net/bridge/br_switchdev.c |  4 +++-
> > > >  3 files changed, 46 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > > index b3a8d3054af0..6891a432862d 100644
> > > > --- a/include/linux/if_bridge.h
> > > > +++ b/include/linux/if_bridge.h
> > > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > > >  #define BR_ISOLATED          BIT(16)
> > > >  #define BR_MRP_AWARE         BIT(17)
> > > >  #define BR_MRP_LOST_CONT     BIT(18)
> > > > +#define BR_HOST_FLOOD                BIT(19)
> > > > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > > > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> > > >
> > > >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> > > >
> > > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > > index a0e9a7937412..aae59d1e619b 100644
> > > > --- a/net/bridge/br_if.c
> > > > +++ b/net/bridge/br_if.c
> > > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > > >       }
> > > >  }
> > > >
> > > > +static int br_manage_host_flood(struct net_bridge *br)
> > > > +{
> > > > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > > +                                BR_HOST_BCAST_FLOOD;
> > > > +     struct net_bridge_port *p, *q;
> > > > +
> > > > +     list_for_each_entry(p, &br->port_list, list) {
> > > > +             unsigned long flags = p->flags;
> > > > +             bool sw_bridging = false;
> > > > +             int err;
> > > > +
> > > > +             list_for_each_entry(q, &br->port_list, list) {
> > > > +                     if (p == q)
> > > > +                             continue;
> > > > +
> > > > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > > +                             sw_bridging = true;
> > >
> > > It's not that simple. There are cases where not all bridge slaves have
> > > the same parent ID and still there is no reason to flood traffic to the
> > > CPU. VXLAN, for example.
> > >
> > > You could argue that the VXLAN device needs to have the same parent ID
> > > as the physical netdevs member in the bridge, but it will break your
> > > data path. For example, lets assume your hardware decided to flood a
> > > packet in L2. The packet will egress all the local ports, but will also
> > > perform VXLAN encapsulation. The packet continues with the IP of the
> > > remote VTEP(s) to the underlay router and then encounters a neighbour
> > > miss exception, which sends it to the CPU for resolution.
> > >
> > > Since this exception was encountered in the router the driver would mark
> > > the packet with 'offload_fwd_mark', as it already performed L2
> > > forwarding. If the VXLAN device has the same parent ID as the physical
> > > netdevs, then the Linux bridge will never let it egress, nothing will
> > > trigger neighbour resolution and the packet will be discarded.
> > >
> > 
> > I wasn't going to argue that.
> > Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
> > to multicast IPs should be flooded to the CPU.
> > Actually Allan's example was a bit simpler, he said that host flooding
> > can be made a per-VLAN flag. I'm glad that you raised this. So maybe
> > we should try to define some mechanism by which virtual interfaces can
> > specify to the bridge that they don't need to see all traffic? Do you
> > have any ideas?
> 
> Maybe, when a port joins a bridge, query member ports if they can
> forward traffic to it in hardware and based on the answer determine the
> flooding towards the CPU?
> 

Hi Ido, Allan,

I understand less and less of this. What I don't really understand is,
if you have a switchdev bridged with a vtep like this:

 +-------------------------+
 |           br0           |
 +-------------------------+
     |                |
     |           +--------+
     |           | vxlan0 |
     |           +--------+
     |                |
 +--------+      +--------+
 |  swp0  |      |  eth0  |
 +--------+      +--------+

why would the swp0 interface care about the remote_ip at all. To the
traffic seen by swp0, the VXLAN segment doesn't exist. Encapsulation and
decapsulation all happen outside of the switchdev interface. All that
switchdev sees is that, from the CPU side, it's talking to a bunch of
MAC addresses.

The same comment also applies for 8021q, in fact. I did try this
experiment, to bridge a switchdev with a VLAN sub-interface of another
port. I don't know why, I used to have the misconception that the desire
in doing that would be to somehow only extract one VLAN ID from the
switchdev, and the rest could be kept outside of the CPU's flooding
domain. But that isn't the case at all. When bridging, I'm bridging the
_entire_ traffic of swp0 with, say, eth0.100. And, as in the case of
vxlan, encap/decap all happens outside of switchdev. So, contrary to my
initial expectation, if I'm receiving on swp0 a packet tagged with VLAN
100, it would end up exiting the bridge, on eth0, with 2 VLAN tags with
ID 100.

Simply put, I think my change is fine the way it is. Either that, or I
just don't understand your comment about querying bridge members whether
they can forward in hardware. How are you dealing with this today?

Thanks,
-Vladimir