Date:   Sun, 22 Aug 2021 10:19:14 +0300
From:   Ido Schimmel <idosch@...sch.org>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Roopa Prabhu <roopa@...dia.com>,
        Nikolay Aleksandrov <nikolay@...dia.com>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Vadym Kochan <vkochan@...vell.com>,
        Taras Chornyi <tchornyi@...vell.com>,
        Jiri Pirko <jiri@...dia.com>, Ido Schimmel <idosch@...dia.com>,
        UNGLinuxDriver@...rochip.com,
        Grygorii Strashko <grygorii.strashko@...com>,
        Marek Behun <kabel@...ckhole.sk>,
        DENG Qingfang <dqfext@...il.com>,
        Kurt Kanzenbach <kurt@...utronix.de>,
        Hauke Mehrtens <hauke@...ke-m.de>,
        Woojung Huh <woojung.huh@...rochip.com>,
        Sean Wang <sean.wang@...iatek.com>,
        Landen Chao <Landen.Chao@...iatek.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        George McCollister <george.mccollister@...il.com>,
        Ioana Ciornei <ioana.ciornei@....com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Lars Povlsen <lars.povlsen@...rochip.com>,
        Steen Hegelund <Steen.Hegelund@...rochip.com>,
        Julian Wiedmann <jwi@...ux.ibm.com>,
        Karsten Graul <kgraul@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Ivan Vecera <ivecera@...hat.com>,
        Vlad Buslov <vladbu@...dia.com>,
        Jianbo Liu <jianbol@...dia.com>,
        Mark Bloch <mbloch@...dia.com>, Roi Dayan <roid@...dia.com>,
        Tobias Waldekranz <tobias@...dekranz.com>,
        Vignesh Raghavendra <vigneshr@...com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>
Subject: Re: [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE blocking

On Sat, Aug 21, 2021 at 10:09:14PM +0300, Vladimir Oltean wrote:
> On Fri, Aug 20, 2021 at 07:11:15PM +0300, Ido Schimmel wrote:
> > On Fri, Aug 20, 2021 at 01:49:48PM +0300, Vladimir Oltean wrote:
> > > On Fri, Aug 20, 2021 at 12:16:10PM +0300, Ido Schimmel wrote:
> > > > On Thu, Aug 19, 2021 at 07:07:18PM +0300, Vladimir Oltean wrote:
> > > > > Problem statement:
> > > > >
> > > > > Any time a driver needs to create a private association between a bridge
> > > > > upper interface and use that association within its
> > > > > SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handler, we have an issue with FDB
> > > > > entries deleted by the bridge when the port leaves. The issue is that
> > > > > all switchdev drivers schedule a work item to have sleepable context,
> > > > > and that work item can be actually scheduled after the port has left the
> > > > > bridge, which means the association might have already been broken by
> > > > > the time the scheduled FDB work item attempts to use it.
> > > >
> > > > This is handled in mlxsw by telling the device to flush the FDB entries
> > > > pointing to the {port, FID} when the VLAN is deleted (synchronously).
> > > 
> > > If you have FDB entries pointing to bridge ports that are foreign
> > > interfaces and you offload them, do you catch the VLAN deletion on the
> > > foreign port and flush your entries towards it at that time?
> > 
> > Yes, that's how VXLAN offload works. VLAN addition is used to determine
> > the mapping between VNI and VLAN.
> 
> I was only able to follow as far as:
> 
> mlxsw_sp_switchdev_blocking_event
> -> mlxsw_sp_switchdev_handle_vxlan_obj_del
>    -> mlxsw_sp_switchdev_vxlan_vlans_del
>       -> mlxsw_sp_switchdev_vxlan_vlan_del
>          -> ??? where are the FDB entries flushed?

 mlxsw_sp_switchdev_blocking_event
 -> mlxsw_sp_switchdev_handle_vxlan_obj_del
    -> mlxsw_sp_switchdev_vxlan_vlans_del
       -> mlxsw_sp_switchdev_vxlan_vlan_del
          -> mlxsw_sp_bridge_vxlan_leave
             -> mlxsw_sp_nve_fid_disable
                -> mlxsw_sp_nve_fdb_flush_by_fid
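
The idea at the end of that chain, dropping every FDB entry that points at a FID in the same (blocking) context that tears the FID down, can be sketched in plain userspace C. This is only an illustration: the toy_* names and the table layout are hypothetical and are not taken from mlxsw, which asks the device to do the flush.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of an FDB entry; not the mlxsw layout. */
struct toy_fdb_entry {
	uint8_t mac[6];
	uint16_t fid;
	bool in_use;
};

/*
 * Synchronously invalidate every entry that points at @fid.
 * Because this runs before the caller returns, no entry for the FID
 * can outlive the FID itself. Returns the number of entries removed.
 */
static size_t toy_fdb_flush_by_fid(struct toy_fdb_entry *tbl, size_t n,
				   uint16_t fid)
{
	size_t flushed = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].in_use && tbl[i].fid == fid) {
			tbl[i].in_use = false;
			flushed++;
		}
	}

	return flushed;
}
```

A second call with the same FID finds nothing left to remove, which is the property the synchronous flush is meant to give.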

> 
> I was expecting to see something along the lines of
> 
> mlxsw_sp_switchdev_blocking_event
> -> mlxsw_sp_port_vlans_del
>    -> mlxsw_sp_bridge_port_vlan_del
>       -> mlxsw_sp_port_vlan_bridge_leave
>          -> mlxsw_sp_bridge_port_fdb_flush
> 
> but that is exactly on the other branch of the "if (netif_is_vxlan(dev))"
> condition (and also, mlxsw_sp_bridge_port_fdb_flush flushes an externally-facing
> port, not really what I needed to know, see below).
> 
> Anyway, it also seems to me that we are referring to slightly different
> things by "foreign" interfaces. To me, a "foreign" interface is one
> towards which there is no hardware data path. Like for example if you
> have a mlxsw port in a plain L2 bridge with an Intel card. The data path
> is the CPU and that was my question: do you track FDB entries towards
> those interfaces (implicitly: towards the CPU)? You've answered about
> VXLAN, which is not quite "foreign" in the sense I am thinking about,
> because mlxsw does have a hardware data path towards a VXLAN interface
> (as you've mentioned, it associates a VID with each VNI).
> 
> I've been searching through the mlxsw driver and I don't see that this
> is being done, so I'm guessing you might wonder/ask why you would want
> to do that in the first place. If you bridge a mlxsw port with an Intel
> card, then (from another thread where you've said that mlxsw always
> injects control packets where hardware learning is not performed) my
> guess is that the MAC addresses learned on the Intel bridge port will
> never be learned on the mlxsw device. So every packet that ingresses the
> mlxsw and must egress the Intel card will reach the CPU through flooding
> (and will consequently be flooded in the entire broadcast domain of the
> mlxsw side of the bridge). Right?

I can see how this use case makes sense on systems where the difference
in performance between the ASIC and the CPU is not huge, but it doesn't
make much sense with Spectrum and I have yet to get requests to support
it (might change). Keep in mind that Spectrum is able to forward several
Bpps with a switching capacity of several Tbps. It is usually connected
to a weak CPU (e.g., low-end ARM, Intel Atom) through a PCI bus with a
bandwidth of several Gbps. There is usually one "Intel card" on such
systems which is connected to the management network that is separated
from the data plane network.

If we were to support it, FDB entries towards "foreign" interfaces would
be programmed to trap packets to the CPU. For now, for correctness /
rigor purposes, I would prefer simply returning an error / warning via
extack when such topologies are configured.
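
A rough userspace sketch of that kind of veto, under loudly stated assumptions: struct toy_netdev and struct toy_extack are hypothetical stand-ins for struct net_device and struct netlink_ext_ack, the ownership test is a simple pointer comparison, and offloadable uppers such as VXLAN are ignored. It is not how mlxsw is written; it only shows the shape of "return an error and a message when a foreign port shares the bridge".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct toy_netdev {
	const char *name;
	const void *driver_ops;	/* identifies the owning driver */
};

/* Hypothetical stand-in for struct netlink_ext_ack. */
struct toy_extack {
	const char *msg;
};

/* Only the address matters; it acts as the switch driver's identity. */
static const int toy_switch_ops = 0;

static bool toy_dev_is_ours(const struct toy_netdev *dev)
{
	return dev->driver_ops == &toy_switch_ops;
}

/*
 * Refuse a bridge topology in which any port has no hardware data path
 * to the switch. On failure, record a human-readable reason in @extack
 * and return a negative errno.
 */
static int toy_bridge_check_ports(const struct toy_netdev *ports, size_t n,
				  struct toy_extack *extack)
{
	for (size_t i = 0; i < n; i++) {
		if (!toy_dev_is_ours(&ports[i])) {
			extack->msg = "Bridging with foreign interfaces is not supported";
			return -EOPNOTSUPP;
		}
	}

	return 0;
}
```

Whether to fail the operation or only warn is a policy choice; the sketch fails it.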
