Message-ID: <CA+h21hpJQvMjdgysyrPDGMN1T-Vve8Bqide0Yw+oQjwnkaucnQ@mail.gmail.com>
Date:   Mon, 25 May 2020 23:32:49 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Ido Schimmel <idosch@...sch.org>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jiri Pirko <jiri@...nulli.us>,
        Jakub Kicinski <kuba@...nel.org>,
        Ivan Vecera <ivecera@...hat.com>,
        netdev <netdev@...r.kernel.org>,
        Horatiu Vultur <horatiu.vultur@...rochip.com>,
        "Allan W. Nielsen" <allan.nielsen@...rochip.com>,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>,
        Roopa Prabhu <roopa@...ulusnetworks.com>
Subject: Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding

On Mon, 25 May 2020 at 23:11, Ido Schimmel <idosch@...sch.org> wrote:
>
> On Sun, May 24, 2020 at 07:13:46PM +0300, Vladimir Oltean wrote:
> > Hi Ido,
> >
> > On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@...sch.org> wrote:
> > >
> > > On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@....com>
> > > >
> > > > In cases where the bridge is offloaded by a switchdev, there are
> > > > situations where we can optimize RX filtering towards the host. To be
> > > > precise, the host only needs to do termination, which it can do by
> > > > responding at the MAC addresses of the slave ports and of the bridge
> > > > interface itself. But most notably, it doesn't need to do forwarding,
> > > > so there is no need to see packets with unknown destination address.
> > > >
> > > > > There are, however, cases when a switchdev does need to flood to the
> > > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > > interface, and since there is no offloaded datapath, packets need to
> > > > pass through the CPU. Currently this is the only identified case, but it
> > > > can be extended at any time.
> > > >
> > > > So far, switchdev implementers made driver-level assumptions, such as:
> > > > this chip is never integrated in SoCs where it can be bridged with a
> > > > foreign interface, so I'll just disable host flooding and save some CPU
> > > > cycles. Or: I can never know what else can be bridged with this
> > > > switchdev port, so I must leave host flooding enabled in any case.
> > > >
> > > > Let the bridge drive the host flooding decision, and pass it to
> > > > switchdev via the same mechanism as the external flooding flags.
> > > >
> > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> > > > ---
> > > >  include/linux/if_bridge.h |  3 +++
> > > >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> > > >  net/bridge/br_switchdev.c |  4 +++-
> > > >  3 files changed, 46 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > > index b3a8d3054af0..6891a432862d 100644
> > > > --- a/include/linux/if_bridge.h
> > > > +++ b/include/linux/if_bridge.h
> > > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > > >  #define BR_ISOLATED          BIT(16)
> > > >  #define BR_MRP_AWARE         BIT(17)
> > > >  #define BR_MRP_LOST_CONT     BIT(18)
> > > > +#define BR_HOST_FLOOD                BIT(19)
> > > > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > > > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> > > >
> > > >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> > > >
> > > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > > index a0e9a7937412..aae59d1e619b 100644
> > > > --- a/net/bridge/br_if.c
> > > > +++ b/net/bridge/br_if.c
> > > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > > >       }
> > > >  }
> > > >
> > > > +static int br_manage_host_flood(struct net_bridge *br)
> > > > +{
> > > > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > > +                                BR_HOST_BCAST_FLOOD;
> > > > +     struct net_bridge_port *p, *q;
> > > > +
> > > > +     list_for_each_entry(p, &br->port_list, list) {
> > > > +             unsigned long flags = p->flags;
> > > > +             bool sw_bridging = false;
> > > > +             int err;
> > > > +
> > > > +             list_for_each_entry(q, &br->port_list, list) {
> > > > +                     if (p == q)
> > > > +                             continue;
> > > > +
> > > > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > > +                             sw_bridging = true;
> > >
> > > It's not that simple. There are cases where not all bridge slaves have
> > > the same parent ID and still there is no reason to flood traffic to the
> > > CPU. VXLAN, for example.
> > >
> > > > You could argue that the VXLAN device needs to have the same parent ID
> > > > as the physical netdevs that are members of the bridge, but it will
> > > > break your data path. For example, let's assume your hardware decided to flood a
> > > packet in L2. The packet will egress all the local ports, but will also
> > > perform VXLAN encapsulation. The packet continues with the IP of the
> > > remote VTEP(s) to the underlay router and then encounters a neighbour
> > > miss exception, which sends it to the CPU for resolution.
> > >
> > > Since this exception was encountered in the router the driver would mark
> > > the packet with 'offload_fwd_mark', as it already performed L2
> > > forwarding. If the VXLAN device has the same parent ID as the physical
> > > netdevs, then the Linux bridge will never let it egress, nothing will
> > > trigger neighbour resolution and the packet will be discarded.
> > >
> >
> > I wasn't going to argue that.
> > Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
> > to multicast IPs should be flooded to the CPU.
> > Actually Allan's example was a bit simpler, he said that host flooding
> > can be made a per-VLAN flag. I'm glad that you raised this. So maybe
> > we should try to define some mechanism by which virtual interfaces can
> > specify to the bridge that they don't need to see all traffic? Do you
> > have any ideas?
>
> Maybe, when a port joins a bridge, query member ports if they can
> forward traffic to it in hardware and based on the answer determine the
> flooding towards the CPU?
>

Ok, should this be a new ndo or some already existing mechanism? In
what level of detail does the bridge need to know which filters the
virtual interface is going to apply? Just a binary yes/no? In that case,
could it simply check the netdev ops?
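
To make the binary yes/no variant concrete, here is a rough userspace
sketch in C of how the decision could be derived. It is only a model
under stated assumptions: struct port, offloaded_by, can_forward_in_hw()
and needs_host_flood() are hypothetical names made up for illustration,
not existing kernel APIs, and the real query could just as well be an
ndo or a switchdev notifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct port {
	const char *name;
	int parent_id;    /* stand-in for the switchdev parent ID, 0 = none */
	int offloaded_by; /* parent ID that can reach this port in hardware, 0 = none */
};

/* The binary query: can 'from' forward to 'to' without involving the CPU? */
static bool can_forward_in_hw(const struct port *from, const struct port *to)
{
	return from->parent_id != 0 && to->offloaded_by == from->parent_id;
}

/* Port i needs host flooding if any other member is unreachable in hardware. */
static bool needs_host_flood(const struct port *ports, size_t n, size_t i)
{
	size_t j;

	assert(i < n);
	for (j = 0; j < n; j++) {
		if (j == i)
			continue;
		if (!can_forward_in_hw(&ports[i], &ports[j]))
			return true;
	}
	return false;
}
```

In this model, a bridge with swp0 and swp1 (parent ID 1) plus a vxlan0
whose offloaded_by is 1 would leave host flooding off for swp0, while
swapping vxlan0 for a foreign eth0 that nobody offloads would turn it on.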

> >
> > > > +                             break;
> > > > +                     }
> > > > +             }
> > > > +
> > > > +             if (sw_bridging)
> > > > +                     flags |= mask;
> > > > +             else
> > > > +                     flags &= ~mask;
> > > > +
> > > > +             if (flags == p->flags)
> > > > +                     continue;
> > > > +
> > > > +             err = br_switchdev_set_port_flag(p, flags, mask);
> > > > +             if (err)
> > > > +                     return err;
> > > > +
> > > > +             p->flags = flags;
> > > > +     }
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +
> > > >  int nbp_backup_change(struct net_bridge_port *p,
> > > >                     struct net_device *backup_dev)
> > > >  {
> > > > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
> > > >               br->auto_cnt = cnt;
> > > >               br_manage_promisc(br);
> > > >       }
> > > > +     br_manage_host_flood(br);
> > > >  }
> > > >
> > > >  static void nbp_delete_promisc(struct net_bridge_port *p)
> > > > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> > > > index 015209bf44aa..360806ac7463 100644
> > > > --- a/net/bridge/br_switchdev.c
> > > > +++ b/net/bridge/br_switchdev.c
> > > > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> > > >
> > > >  /* Flags that can be offloaded to hardware */
> > > >  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> > > > -                               BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> > > > +                               BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> > > > +                               BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> > > > +                               BR_HOST_BCAST_FLOOD)
> > > >
> > > >  int br_switchdev_set_port_flag(struct net_bridge_port *p,
> > > >                              unsigned long flags,
> > > > --
> > > > 2.25.1
> > > >
> >
> > Thanks,
> > -Vladimir

Thanks,
-Vladimir
