netdev - Re: DSA: some questions regarding TX forwarding offload

Message-ID: <20211005101253.sqvsvk53k34atjwt@skbuf>
Date:   Tue, 5 Oct 2021 10:12:53 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Alvin Šipraga <ALSI@...g-olufsen.dk>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>
Subject: Re: DSA: some questions regarding TX forwarding offload

On Tue, Oct 05, 2021 at 08:54:34AM +0000, Alvin Šipraga wrote:
> Hi,
>
> I am trying to implement TX forwarding offload for my in-progress
> rtl8365mb DSA driver. I have some questions which I could use some
> clarification on. They might be specific to my hardware, which is also
> OK, but then some advice on how to proceed would be helpful.
>
> Q1. Can the tagging driver somehow retrieve a port mask from the DSA
> switch driver in order to assemble the CPU->switch tag on xmit? Is there
> some infrastructure in place to share such data between the two drivers?

Nope. DSA does not maintain a cache of FDB entries retrieved from
hardware, so it cannot deduce the destination port mask from the {MAC DA, VLAN ID}
of the skb. The software FDB maintained by the bridge driver is all there is.
Based on the software FDB, which the bridge still looks up, an skb->dev
is selected. The TX forwarding offload feature is merely a way to
remove software packet replication (skb_clone) for the case where the
packet would otherwise have been flooded, or multicast, by the software bridge
towards multiple skb->dev entities belonging to the same hardware domain.

To achieve the desired replication in hardware with DSA, the idea is to
look up the FDB once more, but this time let the switch do it in hardware.

I see it as similar to the quote "life is like a box of chocolates, you
never know what you're going to get". Meaning that, okay, from software
you don't know exactly on which egress ports your packet is going to
land, but the result shouldn't be too far off from the expectation in
any case:

(a) hardware FDB and software FDB are in sync for the given {MAC DA, VLAN ID}:
    packet will be forwarded in hardware towards the same port as it
    would have without the TX forwarding offload feature

(b) FDB entry exists in software, but not in hardware: packet will be
    sent once by the bridge, and will be flooded by the hardware towards
    all bridge ports belonging to the switch's hardware domain

(c) FDB entry exists in hardware, but not in software: packet will be
    "flooded" by the software bridge, but the switch will deliver it
    precisely. Flooding is therefore avoided.

(d) FDB entry exists neither in hardware nor in software: same as case (a),
    since both the software bridge and the switch flood the packet

> Q2. Is it expected by DSA that two isolated ports (e.g. two ports
> belonging to two separate bridges) can be members of the same VLAN
> without issue?

It depends.

If you mean to ask: "given the way in which the DSA core is structured,
what do you expect to happen?", the answer is that it won't work without leaks.

If you mean to ask: "what is the intention going forward?", the answer
is that it should be made to work, and you should employ hardware specific
mechanisms to avoid those leaks between VLAN N of br0 and VLAN N of br1,
or deny the simultaneous existence of a VLAN-aware br0 and a VLAN-aware br1.

At the very least, you should impose the latter restriction right now;
see sja1105_prechangeupper() for an example.

In the long term, you should get acquainted with your hardware's FDB
isolation mechanism, because there will exist an API through which DSA
will tell you "this switchdev object (FDB, MDB, VLAN) came from this
bridge, which I've associated for you with a unique integer, just so you
know when you program it to hardware, I might come back with an
identical switchdev object later but on a different port, and belonging
to a different bridge":
https://patchwork.kernel.org/project/netdevbpf/cover/20210818120150.892647-1-vladimir.oltean@nxp.com/

The most flexible FDB isolation mechanism I've seen so far is in
mv88e6xxx: you can freely associate a VID with a FID (of which there are
4K entries), and FDB lookup is performed by {FID, MAC DA}. This patch has
the details of where mv88e6xxx is right now and what can be done further:
https://patchwork.kernel.org/project/netdevbpf/patch/20211005001414.1234318-5-vladimir.oltean@nxp.com/

So with that hardware, you can have 2 VLAN-aware bridges, and both
bridges can use the full 4K VID space numerically, but in total you
cannot have more than 4K FIDs in the system, so 1000 VLANs on one bridge
and 3000 on the other, or distributions like that. Numerically, the VIDs
of one bridge can be identical to the VIDs of another as long as FIDs
are unique.

> Background: The RTL8365MB's CPU tag includes an ALLOW field followed by
> a "port mask" field. If ALLOW=1 then - based on the VLAN tag in the
> frame and the port mask - the switch will automatically replicate the
> frame and egress it on all suitable ports, but only ports which are in
> the port mask.
>
> If ALLOW=1, and if the port mask is all zeroes or all ones, then the
> switch will make its forwarding decision based only on the VLAN tag in
> the frame (if any). Now consider a configuration as follows:

When you say "based _only_ on the VLAN tag" do you mean that the MAC DA
is not taken into consideration? Are packets flooded towards the entire
set of ports in the allowance port mask that are members of VLAN N?
Do you have address learning properly set up, and can you confirm with
an FDB dump that the FDB is not in fact empty in the FID you are
injecting in (see below)?

>          br0            br1
>           +              +
>           |              |
>       +---+---+      +---+---+
>       |       |      |       |
>      swp0    swp1   swp2    swp3
>
> ... with both bridges containing switch port(s) belonging to the same
> VLAN n. How should I prevent - with TX forwarding offload - a packet
> with VID=n from being egressed on a port on the opposite bridge which
> belongs to the same VLAN n?
>
> In the above scenario, either I must refine the CPU tag "port mask"
> (hence Q1), or I must restrict the hardware configuration in some way
> (hence Q2), or I must conclude that TX forwarding offload is not
> possible with these constraints, or there is some alternative solution
> or nuance that I have not thought of.

I don't want to answer any of these questions until I understand how
your hardware intends the FID and FID_EN bits from the DSA header to
be used. The FID field only has 2 bits, so if the Realtek switch has up
to 4 FIDs while Marvell has up to 4K, it is clear to me that the term
doesn't mean the same thing for both.

You should ask yourself not only how to prevent leakage, but also the
flip side: how should I pass the packet to the switch in such a way that
it will learn its MAC SA in the right FID, assuming that you go with FDB
isolation first and figure that out. Once that question is answered, you
can in principle specify an allowance port mask which is larger than
needed (the entire mask of user ports), and the switch should only
forward the packet towards the ports belonging to the same FID, which are
roughly equivalent to the ports under a specific bridge. You can
create a mapping between a FID and dp->bridge_num. Does that make sense,
or am I completely off?