netdev - Re: net: dsa: lantiq_gswip: getting the first selftests to pass

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFBinCD5yQodsv0PoqnqVA6F7uMkoD-_mUxi54uAHc9Am7V-VQ@mail.gmail.com>
Date:   Fri, 29 Jul 2022 00:09:50 +0200
From:   Martin Blumenstingl <martin.blumenstingl@...glemail.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     netdev@...r.kernel.org, andrew@...n.ch, vivien.didelot@...il.com,
        Hauke Mehrtens <hauke@...ke-m.de>, f.fainelli@...il.com,
        Aleksander Jan Bajkowski <olek2@...pl>
Subject: Re: net: dsa: lantiq_gswip: getting the first selftests to pass

Hi Vladimir,

On Thu, Jul 28, 2022 at 12:44 AM Vladimir Oltean <olteanv@...il.com> wrote:
[...]
> > I am starting with
> > tools/testing/selftests/drivers/net/dsa/local_termination.sh as my
> > understanding is that this contains the most basic tests and should be
> > the first step.
>
> I don't think I've said local_termination.sh contains the most basic
> tests. Some tests are basic, but not all are.
noted, thanks for clarifying this

[...]
> So the absolute first step would be to control the bridge port flags
> (BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD) and start
> with good defaults for standalone mode (also set skb->offload_fwd_mark
> when appropriate in the tagging protocol driver). I think you can use
> bridge_vlan_aware.sh and bridge_vlan_unaware.sh as starting points to
> check that these flags still work fine after you've offloaded them to
> hardware.
My understanding of "good defaults" is:
- disable learning on all ports
- disable unicast flooding on all ports
- enable broadcast flooding on all ports
- (GSWIP can only enable broadcast and multicast at the same time, so
that's enabled too)

I think skb->offload_fwd_mark needs to be set unless we know that the
hardware wasn't able to forward the frame/packet.
In the vendor sources I was able to find the whole RX tag structure: [0]
I am not sure about the "mirror" bit (I assume this is: packet was
received on this port because this port is configured as a mirroring
target). All other bits seem irrelevant for skb->offload_fwd_mark -
meaning we always have to set skb->offload_fwd_mark.

I have lots of failures in bridge_vlan_aware.sh and
bridge_vlan_unaware.sh - even before any of my changes - which I'll
need to investigate.

> When flooding a packet to find its destination can be achieved without
> involving the CPU (*), the next thing will be to simply disable flooding
> packets of all kind to the CPU (except broadcast). That's when you'll
> enjoy watching how all the local_termination.sh selftests fail, and
> you'll be making them pass again, one by one.
it's called test driven development :-)

> >
> > Full local_termination.sh selftest output:
> > TEST: lan2: Unicast IPv4 to primary MAC address                 [ OK ]
>
> For this to pass, the driver must properly respond to a port_fdb_add()
> on the CPU port, with the MAC address of the $swp1 user port's net device,
> offloaded in the DSA_DB_PORT corresponding to $swp1.
>
> In turn, for DSA to even consider passing you FDB entries in DSA_DB_PORT,
> you must make dsa_switch_supports_uc_filtering() return true.
>
> (if you don't know what the words here mean, I've updated the documentation at
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/Documentation/networking/dsa/dsa.rst)
Three out of four cases for the FDB isolation code are clear to me:
- the DSA_DB_BRIDGE case is easy as this is basically what we had
implemented before and I "just" need to look up the FID based on
db.bridge.dev
- DSA_DB_PORT for a non-CPU/user port: we should use the same FID as
standalone ("single port bridge") ports use (that's port number + 1;
see gswip_add_single_port_br())
- GSWIP doesn't support DSA_DB_LAG currently, so I can handle this
with -EOPNOTSUPP

One case took me a while to figure out but I think I finally
understood it (and now it's clear why the FDB patch I sent earlier
cannot be upstreamed):
- DSA_DB_PORT for the CPU port: the port argument for port_fdb_add is
the CPU port - but we can't map this to a FID (those are always tied
to either a bridge or a user port). So instead I need to look at db.dp
and for example use it's index for getting the FID (for standalone
ports the FID is: port index + 1). That results in: we're requested to
install the CPU ports MAC address on the CPU port (6), but what we
actually do is install the FDB entry with the CPU port's MAC address
on a user port (let's say 4, which we get from db.dp). Now if a
packet/frame should target the CPU port we don't need flooding because
the switch knows the destination port based on the FDB entry we
installed.

Note to myself: I'll also need to look at
assisted_learning_on_cpu_port when I'm at the point where I can do
optimizations (rather than fixing failing tests).

[...]
> Nobody will call port_fdb_add() for the address tested here.
>
> > TEST: lan2: Unicast IPv4 to unknown MAC address, promisc        [ OK ]
>
> Now this passes because the expectation of promiscuous ports is to
> receive all packets regardless of MAC DA, that's the definition of
> promiscuity. The driver currently already floods to the CPU, so why
> wouldn't this pass.
>
> Here, what we actually want to capture is that dsa_slave_manage_host_flood(),
> which responds to changes in the IFF_PROMISC flag on a user port, does
> actually notify the driver via a call to port_set_host_flood() for that
> user port. Through this method, the driver is responsible for turning
> flooding towards the CPU port(s) on or off, from the user port given as
> argument. If CPU flood control does not depend on user port, then you'll
> have to keep CPU flooding enabled as long as any user port wants it.
Thanks for this hint. Indeed, I need to implement it like you did in
felix_port_set_host_flood() because GSWIP can only enable/disable
flooding to the CPU port globally - it can't configure this based on
the source port.

[...]
> > TEST: lan2: Multicast IPv6 to unknown group                     [FAIL]
> >         reception succeeded, but should have failed
> > TEST: lan2: Multicast IPv6 to unknown group, promisc            [ OK ]
> > TEST: lan2: Multicast IPv6 to unknown group, allmulti           [ OK ]
> > TEST: br0: Unicast IPv4 to primary MAC address                  [ OK ]
>
> Here is where things get interesting. I'm going to take a pause and
> explain that the bridge related selftests fail in the same way for me
> too, and that the fixes should go to the bridge driver rather than to
> DSA.
oh, that's good to know - thanks for sharing this info!

[...]
> > TEST: br0: Unicast IPv4 to unknown MAC address                  [FAIL]
> >         reception succeeded, but should have failed
> > TEST: br0: Unicast IPv4 to unknown MAC address, promisc         [ OK ]
> > TEST: br0: Unicast IPv4 to unknown MAC address, allmulti        [FAIL]
> >         reception succeeded, but should have failed
> > TEST: br0: Multicast IPv4 to joined group                       [ OK ]
>
> Just one more small comment to make.
> The addresses in the "br0" tests are also notified through port_mdb_add(),
> but they use DSA_DB_BRIDGE since they come to DSA via switchdev rather
> than via dev_mc_add() - they came _to_the_bridge_ via dev_mc_add().
> DSA drivers are expect to offload these multicast entries to a different
> database than the port_mdb_add() I've described above. This is a
> database that is active only while $swp1 is part of a bridge, while
> DSA_DB_PORT is active only while $swp1 is standalone.
The vendor driver lists a bunch of possible tables: [1]
I'll leave any effort for multicast out for now... it's not the main
use-case I am looking at right now.

As always: thank you for your amazing explanations, hints and pointers!
Also I would like to point out that I am still doing all of this in my
spare time. Whenever you have other conversations to focus on: please
do so! For me it's not critical if you're getting back to me a few
days sooner or later.


Best regards,
Martin


[0] https://github.com/paldier/K3C/blob/ca7353eb397090c363632409d9ca568d3cca09c7/ugw/target/linux/lantiq/patches-3.10/7000-NET-lantiq-adds-eth-drv.patch#L2238-L2259
[1] https://github.com/paldier/K3C/blob/ca7353eb397090c363632409d9ca568d3cca09c7/ugw/target/linux/lantiq/files/drivers/net/ethernet/lantiq/switch-api/ltq_flow_core.h#L164-L192