netdev - Re: net: dsa: lantiq_gswip: getting the first selftests to pass

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220727224409.jhdw3hqfta4eg4pi@skbuf>
Date:   Thu, 28 Jul 2022 01:44:09 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Martin Blumenstingl <martin.blumenstingl@...glemail.com>
Cc:     netdev@...r.kernel.org, andrew@...n.ch, vivien.didelot@...il.com,
        Hauke Mehrtens <hauke@...ke-m.de>, f.fainelli@...il.com,
        Aleksander Jan Bajkowski <olek2@...pl>
Subject: Re: net: dsa: lantiq_gswip: getting the first selftests to pass

Hi Martin,

On Wed, Jul 27, 2022 at 10:36:55PM +0200, Martin Blumenstingl wrote:
> Hello,
> 
> there are some pending issues with the Lantiq GSWIP driver.
> Vladimir suggested to get the kernel selftests to pass in a first step.
> I am starting with
> tools/testing/selftests/drivers/net/dsa/local_termination.sh as my
> understanding is that this contains the most basic tests and should be
> the first step.

I don't think I've said local_termination.sh contains the most basic
tests. Some tests are basic, but not all are.

> The good news is that not all tests are broken!

Looking at the tests which fail, I think the gswip driver is in a pretty
good state functionally speaking.

> There are eight tests which are not passing. Those eight can be split
> into two groups of four, because it's the same four tests that are
> failing for "standalone" and "bridge" interfaces:
> - Unicast IPv4 to unknown MAC address
> - Unicast IPv4 to unknown MAC address, allmulti
> - Multicast IPv4 to unknown group
> - Multicast IPv6 to unknown group
> 
> What they all have in common is the fact that we're expecting that no
> packets are received. But in reality packets are received. I manually
> confirmed this by examining the tcpdump file which is generated by the
> selftests.
> 
> Vladimir suggested in [0]:
> > [...] we'll need to make smaller steps, like disable address
> > learning on standalone ports, isolate FDBs, maybe offload the bridge TX
> > forwarding process (in order to populate the "Force no learning" bit in
> > tag_gswip.c properly), and only then will the local_termination test
> > also pass [...]
> 
> Based on the failing tests I am wondering which step would be a good
> one to start with.
> Is this problem that the selftests are seeing a flooding issue? In
> that case I suspect that the "interesting behavior" (of the GSWIP's
> flooding behavior) that Vladimir described in [1] would be a starting
> point.

It has to do with that, yes. What I said there is that the switch
doesn't autonomously flood unknown packets from one bridged port to
another, but instead, sends them to the CPU and lets the CPU do it.

While that is perfectly respectable from a correctness point of view,
it is also not optimal if you consider performance. The selftests here
try to capture the fact that the switch doesn't send unknown packets to
the CPU. And in this case the driver sends them by construction.

So the absolute first step would be to control the bridge port flags
(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD) and start
with good defaults for standalone mode (also set skb->offload_fwd_mark
when appropriate in the tagging protocol driver). I think you can use
bridge_vlan_aware.sh and bridge_vlan_unaware.sh as starting points to
check that these flags still work fine after you've offloaded them to
hardware.

When flooding a packet to find its destination can be achieved without
involving the CPU (*), the next thing will be to simply disable flooding
packets of all kind to the CPU (except broadcast). That's when you'll
enjoy watching how all the local_termination.sh selftests fail, and
you'll be making them pass again, one by one.

> 
> Full local_termination.sh selftest output:
> TEST: lan2: Unicast IPv4 to primary MAC address                 [ OK ]

For this to pass, the driver must properly respond to a port_fdb_add()
on the CPU port, with the MAC address of the $swp1 user port's net device,
offloaded in the DSA_DB_PORT corresponding to $swp1.

In turn, for DSA to even consider passing you FDB entries in DSA_DB_PORT,
you must make dsa_switch_supports_uc_filtering() return true.

(if you don't know what the words here mean, I've updated the documentation at
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/Documentation/networking/dsa/dsa.rst)

> TEST: lan2: Unicast IPv4 to macvlan MAC address                 [ OK ]

If the above passes, this should pass too. There's nothing different
from the driver's perspective, it's just that port_fdb_add() will be
called on the CPU port with a MAC address that isn't the MAC address of
any user port, but rather one added by the macvlan driver using dev_uc_add().

> TEST: lan2: Unicast IPv4 to unknown MAC address                 [FAIL]
>         reception succeeded, but should have failed

So this test should now pass after you've reworked the entire driver,
essentially. The point is that when flooding is disabled on the CPU port,
packets will be sent there only if they match an entry from the FDB.
Nobody will call port_fdb_add() for the address tested here.

> TEST: lan2: Unicast IPv4 to unknown MAC address, promisc        [ OK ]

Now this passes because the expectation of promiscuous ports is to
receive all packets regardless of MAC DA, that's the definition of
promiscuity. The driver currently already floods to the CPU, so why
wouldn't this pass.

Here, what we actually want to capture is that dsa_slave_manage_host_flood(),
which responds to changes in the IFF_PROMISC flag on a user port, does
actually notify the driver via a call to port_set_host_flood() for that
user port. Through this method, the driver is responsible for turning
flooding towards the CPU port(s) on or off, from the user port given as
argument. If CPU flood control does not depend on user port, then you'll
have to keep CPU flooding enabled as long as any user port wants it.

> TEST: lan2: Unicast IPv4 to unknown MAC address, allmulti       [FAIL]
>         reception succeeded, but should have failed

So I won't repeat why reception of unnecessary packets succeeds, because
the reason is always the same - the driver doesn't dynamically adapt to
all the indications DSA is giving it.

Here, the test captures the fact that IFF_ALLMULTI should enable host
flooding just for unknown multicast, while IFF_PROMISC should allow both
unicast and multicast. The packet was unicast, so IFF_ALLMULTI shouldn't
allow it.

> TEST: lan2: Multicast IPv4 to joined group                      [ OK ]

Here, I used a trivial program I found online to emit a IP_ADD_MEMBERSHIP
setsockopt, to trigger the kernel code that calls dev_mc_add() on the
net device. It seems to not be possible by design to join an IP
multicast group using a dedicated command in a similar way to how you'd
add an FDB entry on a port; instead the kernel joins the multicast group
for as long as the user application persists, and leaves the group afterwards.

As you can probably guess, dev_mc_add() calls made by modules outside
DSA are translated by dsa_slave_set_rx_mode() into a port_mdb_add() on
the CPU port, with DSA_DB_PORT.

If the gswip driver doesn't implement port_mdb_add() but rather treats
multicast as broadcast (by sending it to the CPU), naturally this test
"passes" in the sense that it thinks the driver reacted properly to what
was asked.

> TEST: lan2: Multicast IPv4 to unknown group                     [FAIL]
>         reception succeeded, but should have failed

This tests packet delivery to a multicast address for which dev_mc_add()
wasn't called.

> TEST: lan2: Multicast IPv4 to unknown group, promisc            [ OK ]
> TEST: lan2: Multicast IPv4 to unknown group, allmulti           [ OK ]
> TEST: lan2: Multicast IPv6 to joined group                      [ OK ]

No driver change needed between IPv4 and IPv6 multicast, the addresses
offloaded via switchdev are the L2 translations anyway (01:00:5e:xx:xx:xx
for IPv4 and 33:33:xx:xx:xx:xx for IPv6), so multiple IP addresses will
map to the same MAC address. Side note, the reason why I had to fork
mtools was to add support for IPV6_ADD_MEMBERSHIP, otherwise the
principles are more or less the same.

> TEST: lan2: Multicast IPv6 to unknown group                     [FAIL]
>         reception succeeded, but should have failed
> TEST: lan2: Multicast IPv6 to unknown group, promisc            [ OK ]
> TEST: lan2: Multicast IPv6 to unknown group, allmulti           [ OK ]
> TEST: br0: Unicast IPv4 to primary MAC address                  [ OK ]

Here is where things get interesting. I'm going to take a pause and
explain that the bridge related selftests fail in the same way for me
too, and that the fixes should go to the bridge driver rather than to
DSA.

The problem is that DSA treats IFF_PROMISC as "enable host flooding for
this port", and the bridge puts its ports in IFF_PROMISC mode by reflex.
So in theory, the patches here don't really capture anything relevant as
long as that keeps being the case. Regardless of whether an address is
present in the hardware FDB or not, the IFF_PROMISC triggered by the
bridge keeps host flooding enabled, which makes us receive more than we
need.

I've published a patch set here explaining how I think this would need
to be solved.
https://patchwork.kernel.org/project/netdevbpf/cover/20220408200337.718067-1-vladimir.oltean@nxp.com/
I'd probably do things a bit differently when revisiting.

> TEST: br0: Unicast IPv4 to macvlan MAC address                  [ OK ]

There is even more bridge driver rework to do here. What this test
captures is a macvlan interface on top of a bridge on top of a DSA user
port. Earlier I said that macvlan uses dev_uc_add() towards its lower
interface - the bridge - to ensure its RX filter doesn't drop required
packets. But if the net device doesn't support IFF_UNICAST_FLT (and the
bridge doesn't), dev_uc_add() falls back to putting the interface in
promiscuous mode. And the bridge has this crazy interpretation of what
promiscuous mode means, which is hard to reason about. So eventually,
even though the higher and upper layers of the bridge do what's expected,
we still end up with DSA in promiscuous mode, and hence receiving
packets irrespective of MAC DA. This is why this test really passes
(the promiscuity masks the presence of the macvlan address in the FDB),
and will continue to mask it for as long as the bridge doesn't implement
IFF_UNICAST_FLT.

> TEST: br0: Unicast IPv4 to unknown MAC address                  [FAIL]
>         reception succeeded, but should have failed
> TEST: br0: Unicast IPv4 to unknown MAC address, promisc         [ OK ]
> TEST: br0: Unicast IPv4 to unknown MAC address, allmulti        [FAIL]
>         reception succeeded, but should have failed
> TEST: br0: Multicast IPv4 to joined group                       [ OK ]

Just one more small comment to make.
The addresses in the "br0" tests are also notified through port_mdb_add(),
but they use DSA_DB_BRIDGE since they come to DSA via switchdev rather
than via dev_mc_add() - they came _to_the_bridge_ via dev_mc_add().
DSA drivers are expect to offload these multicast entries to a different
database than the port_mdb_add() I've described above. This is a
database that is active only while $swp1 is part of a bridge, while
DSA_DB_PORT is active only while $swp1 is standalone.

> TEST: br0: Multicast IPv4 to unknown group                      [FAIL]
>         reception succeeded, but should have failed
> TEST: br0: Multicast IPv4 to unknown group, promisc             [ OK ]
> TEST: br0: Multicast IPv4 to unknown group, allmulti            [ OK ]
> TEST: br0: Multicast IPv6 to joined group                       [ OK ]
> TEST: br0: Multicast IPv6 to unknown group                      [FAIL]
>         reception succeeded, but should have failed
> TEST: br0: Multicast IPv6 to unknown group, promisc             [ OK ]
> TEST: br0: Multicast IPv6 to unknown group, allmulti            [ OK ]
> 
> 
> Thank you!
> Best regards,
> Martin
> 
> [0] https://lore.kernel.org/netdev/20220706210651.ozvjcwwp2hquzmhn@skbuf/
> [1] https://lore.kernel.org/netdev/20220702185652.dpzrxuitacqp6m3t@skbuf/