Message-ID: <20220302220520.z34hk3nsn5njw2z4@skbuf>
Date: Wed, 2 Mar 2022 22:05:21 +0000
From: Vladimir Oltean <vladimir.oltean@....com>
To: Florian Fainelli <f.fainelli@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Vladimir Oltean <olteanv@...il.com>,
Ido Schimmel <idosch@...dia.com>,
Tobias Waldekranz <tobias@...dekranz.com>,
Claudiu Manoil <claudiu.manoil@....com>,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
"UNGLinuxDriver@...rochip.com" <UNGLinuxDriver@...rochip.com>
Subject: Re: [PATCH net-next 00/10] DSA unicast filtering
Hi Florian,
On Wed, Mar 02, 2022 at 11:30:41AM -0800, Florian Fainelli wrote:
> Hi Vladimir,
>
> On 3/2/2022 11:14 AM, Vladimir Oltean wrote:
> > This series doesn't attempt anything extremely brave, it just changes
> > the way in which standalone ports which support FDB isolation work.
> >
> > Up until now, DSA has recommended that switch drivers configure
> > standalone ports in a separate VID/FID with learning disabled, and with
> > the CPU port as the only destination, reached trivially via flooding.
> > That works, except that standalone ports will deliver all packets to the
> > CPU. We can leverage the hardware FDB as a MAC DA filter, and disable
> > flooding towards the CPU port, to force the dropping of packets with
> > unknown MAC DA.
> >
> > We handle port promiscuity by re-enabling flooding towards the CPU port.
> > This is relevant because the bridge puts its automatic (learning +
> > flooding) ports in promiscuous mode, and this makes some things work
> > automagically, like for example bridging with a foreign interface.
> > We don't delve yet into the territory of managing CPU flooding more
> > aggressively while under a bridge.
> >
> > The only switch driver that benefits from this work right now is the
> > NXP LS1028A switch (felix). The others need to implement FDB isolation
> > first, before DSA is going to install entries to the port's standalone
> > database. Otherwise, these entries might collide with bridge FDB/MDB
> > entries.
> >
> > This work was done mainly to have all the required features in place
> > before somebody starts seriously architecting DSA support for multiple
> > CPU ports. Otherwise it is much more difficult to bolt these features on
> > top of multiple CPU ports.
>
> Thanks a lot for submitting this, really happy to see a solution being
> brought upstream. I will be reviewing this in more detail later on, but
> from where I left off a few years ago, the two challenges that I had are
> outlined below, and I believe we have not quite addressed them yet:
>
> - for switches that implement global VLAN filtering, upper VLAN interfaces
> on top of standalone ports would require programming FDB and MDB entries
> with the appropriate VLAN ID, however there is no such tracking today
> AFAICT, so we are not solving those use cases yet, right?
Yes, here we're entering "brave" territory. With this patch set we have:
static inline bool dsa_switch_supports_uc_filtering(struct dsa_switch *ds)
{
	return ds->ops->port_fdb_add && ds->ops->port_fdb_del &&
	       ds->fdb_isolation && !ds->vlan_filtering_is_global &&
	       !ds->needs_standalone_vlan_filtering;
}
so yes, the use cases you mention are disqualified. To remove that
restriction, every time our .ndo_vlan_rx_add_vid is called, we'd need to
walk over our address lists (kept by struct net_device :: uc and mc),
and replay them in the newly added VLAN. This is the easy part; the
harder part is that we'd need to keep one more list of standalone VLANs
added on a port, in order to do the reverse - every time dsa_slave_sync_uc()
is called, we program that address not only in VID 0, but also in all
VIDs from the list of standalone VLANs on that port. It might be
possible to use vlan_for_each() for this. In any case it is incremental
complexity which I've not yet added.
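To make that first direction a bit more concrete, here is a rough sketch
(not code from the patch set): dsa_port_host_uc_add_vid() is a made-up
helper standing in for whatever would install the {addr, vid} pair on the
CPU port in the port's standalone database, and the question of whether
the programming must be deferred (the address list lock is not sleepable)
is glossed over.

static int dsa_slave_vlan_rx_add_vid_sketch(struct net_device *dev,
					    __be16 proto, u16 vid)
{
	struct netdev_hw_addr *ha;
	int err = 0;

	/* Replay the secondary unicast addresses into the new VLAN; the
	 * multicast list (dev->mc) would get the same treatment.
	 */
	netif_addr_lock_bh(dev);

	netdev_for_each_uc_addr(ha, dev) {
		err = dsa_port_host_uc_add_vid(dev, ha->addr, vid);
		if (err)
			break;
	}

	netif_addr_unlock_bh(dev);

	return err;
}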
This solution would install more host FDB entries than needed - the
Cartesian product of RX VLANs and secondary MAC addresses. I'm ok with
that, but there's also Ivan Khoronzhuk's patch series for Independent
Virtual Device Filtering (IVDF), which basically annotates each MAC
address from struct net_device with a VID. That one really seems
excessive, at this stage, to me.
> - what if the switch does not support FDB/MDB isolation, what would be our
> options here? As you might remember from a few months ago, the Broadcom
> roboswitches do not have any isolation, but what they can do is internally
> tag Ethernet frames with two VLAN tags, and that may be used as a form of
> isolation
The central thing I had in mind is that DSA doesn't really enforce that
packets are sent as 'control traffic'. The users of tag_8021q are the
closest example to me - routing towards the correct egress port is done
through a VID added to the packet, later stripped. The essential
characteristic of a packet sent by a DSA tagger as a 'data packet' is
that the FDB lookup is unavoidable.
The reason why I added FDB isolation as a requirement is this:
+--------------------+
|                    |
| +------+  +------+ |
| | swp0 |  | swp1 | |
+-+------+--+------+-+
      |         |
      +---------+
swp0 has 192.168.100.1/24, swp1 has 192.168.100.2/24, they are in
different net namespaces or VRFs, and they want to ping each other.
Without FDB isolation, the mantra is that DSA will keep the FDB devoid
of swp0's and swp1's MAC addresses. Flooding is enabled on all ports and
the CPU port, and learning is disabled.
With FDB isolation and this patch series, DSA will install the MAC
address of swp0 on the CPU port in swp0's private database, and the
address of swp1 on the CPU port in swp1's private database. It is the
driver's responsibility to then distinguish between those 2 databases
the best it can. VID, FID, EFID, you name it.
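To illustrate what "distinguish between those 2 databases" might mean for
a driver, here is a purely hypothetical sketch, not taken from any real
driver: the EXAMPLE_* constants and example_fdb_write() are invented, only
struct dsa_db and the DSA_DB_* types come from the FDB isolation work.

static int example_port_fdb_add(struct dsa_switch *ds, int port,
				const unsigned char *addr, u16 vid,
				struct dsa_db db)
{
	u16 fid;

	switch (db.type) {
	case DSA_DB_PORT:
		/* Standalone database: a private FID per user port */
		fid = EXAMPLE_STANDALONE_FID_BASE + db.dp->index;
		break;
	case DSA_DB_BRIDGE:
		/* One FID per offloaded bridge */
		fid = EXAMPLE_BRIDGE_FID_BASE + db.bridge.num;
		break;
	default:
		return -EOPNOTSUPP;
	}

	return example_fdb_write(ds, fid, addr, vid, BIT(port));
}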
When swp0 sends a packet to swp1, it has swp0's MAC SA and swp1's MAC DA.
If DSA doesn't know, in the general sense, that there will be no FDB lookup,
then the best it can do is to ensure that swp0's and swp1's addresses are
at least not in the same database. Because otherwise, the packet injected
on the CPU port has swp1's MAC DA, but the FDB says the destination for
that address is the CPU port itself => drop.
Sadly I don't know if this scenario is hypothetical or not.
With sja1105 it wasn't hypothetical, until I converted it to isolate the
FDBs (packets are sent as 'data traffic', but every standalone port has
a different pvid, and packets sent from the CPU follow the egress port's
pvid). If there are more drivers like this which I haven't caught (there
are many DSA drivers which I don't know very well), this would break them.
So I wanted to be on the conservative side, but I suppose I could be
convinced otherwise.
Please note that there is hope. On ocelot, standalone ports share the
same pvid of 0, and yet the driver still declares FDB isolation. This is
because traffic sent over the NPI port (using the "ocelot" tagger) bypasses
the FDB lookup and will always reach its intended destination. And using
the "ocelot-8021q" tagger, packets hit a VCAP IS2 TCAM rule on the
ingress of the CPU port that redirects them to the desired egress port,
before the packet processing reaches the VLAN table.
As for the Roboswitch discussion:
https://lore.kernel.org/all/04d700b6-a4f7-b108-e711-d400ef01cc2e@bang-olufsen.dk/T/#mb2c185003d8a58c691918e056cf09fd0ef6d1bd0
that was mainly about TX forwarding offload, i.o.w. "how to use the FDB
lookup of data plane packets to your benefit, instead of against you".
In the context of FDB isolation, I think it would be more beneficial for
you to keep sending control packets, if those bypass the FDB.
It is the bridge TX forwarding offload that I don't see how you could do.
My takeaway from that discussion is that the IMP port processes the VID
from the packet if VLAN awareness is turned on globally, and probably
reverts to a single default VID otherwise.
So even though you could arrange for each VLAN-unaware bridge to have a
driver-private port pvid, and your tagger could wrap packets in an outer
VLAN tag with this value, the fundamental problem seems to be that VLAN
awareness is global in the first place. So either your user ports can
offload a VLAN-unaware bridge, in which case the IMP can't see the VID
from the packets, or the IMP can see the VID from the packets, but you
can't offload a VLAN-unaware bridge.
In this case, and if I have understood all the information correctly, I
think what you can do is deny multiple bridges (regardless of VLAN
awareness), configure standalone ports to use pvid 0 or 4095 (different
from the pvid of VLAN-unaware bridge ports), and just declare that you
support FDB isolation.
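Roughly, denying a second offloaded bridge could look like the sketch
below. This is hypothetical, not b53/bcm_sf2 code: struct example_priv,
EXAMPLE_BRIDGE_PVID and example_port_set_pvid() are made up, and the exact
.port_bridge_join prototype depends on the kernel version.

static int example_port_bridge_join(struct dsa_switch *ds, int port,
				    struct dsa_bridge bridge,
				    bool *tx_fwd_offload,
				    struct netlink_ext_ack *extack)
{
	struct example_priv *priv = ds->priv;

	if (priv->bridge && priv->bridge != bridge.dev) {
		NL_SET_ERR_MSG_MOD(extack,
				   "Only one bridge can be offloaded");
		return -EOPNOTSUPP;
	}

	priv->bridge = bridge.dev;

	/* Bridge ports move to the VLAN-unaware bridge pvid; standalone
	 * ports stay on the reserved pvid (0 or 4095).
	 */
	return example_port_set_pvid(ds, port, EXAMPLE_BRIDGE_PVID);
}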