Message-ID: <20211231002823.de3ugpurq3fv343b@skbuf>
Date: Fri, 31 Dec 2021 02:28:23 +0200
From: Vladimir Oltean <olteanv@...il.com>
To: Colin Foster <colin.foster@...advantage.com>
Cc: netdev@...r.kernel.org,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
Horatiu Vultur <horatiu.vultur@...rochip.com>
Subject: Re: packets trickling out of STP-blocked ports
Hi Colin,
On Thu, Dec 30, 2021 at 03:07:40PM -0800, Colin Foster wrote:
> After running this, the STP blocks swp3, and swp1/2 are forwarding.
>
> Periodically I see messages saying that swp2 is receiving packets with
> own address as source address.
>
> I can confirm that via ethtool that TX packets are increasing on swp3. I
> believe I captured the event via tshark. A 4 minute capture showed three
> non-STP packets on swp2. All three of these packets are ICMPv6 Router
> Solicitation packets.
>
> I would expect no packets at all to egress swp3. Is this an issue that
> is unique to me and my in-development configuration? Or is this an issue
> with all Ocelot / Felix devices?
I don't remember noticing these (or maybe I did and I forgot), but
reasoning about it, it's a pretty logical consequence of some of the
design decisions that were made.
One would think that when a network interface is under a bridge, it is
unavailable for direct IP termination by itself - you do the IP
termination through the br0 interface. But that isn't really enforced
anywhere - it's just that the bridge breaks IP termination by default on
its individual member ports by stealing all their traffic with its RX handler.
That RX handler can be taught what to steal and what not to steal using
netfilter ebtables rules. With some carefully designed rules, you could
still have some IP termination through the individual bridge ports.
Hardware isn't carved out according to your expectation that no packets
should egress a blocked port, either. Switches in general, and Ocelot in
particular, have a way to send "control" packets that bypass the
analyzer block and STP state (the bridging service, basically) and are
sent towards a precise set of destination ports. This is done by setting
the BYPASS bit from the injection frame header. Currently, Linux sends
"control" packets to the switch all the time, and that is fine, because
although those packets have the ability to go where they don't belong,
the OS (the bridge driver) is supposed to know that, and just not send
packets there. As a side note, there was some work to allow switch
drivers to send "data" packets to the switch, and these correspond to
traffic that originates from a bridge device, but I am just mentioning
this to clarify that it is irrelevant for the purpose of the discussion here.
Even considering an Intel card with no bridging offload at all, if you
put it in the same situation (eth0 under br0, and eth0 is blocked), you
can still put an IP address on eth0 and ping away just fine (you won't
get back the reply as mentioned above, but that's separate really).
Nobody will prevent packets from eth0 from being sent, since the bridge
driver code path isn't invoked on TX unless the socket is bound to br0.
The key point is that the direct xmit data path through swp3, as well as
the data path br0 -> swp3, both exist, in hardware and in software. And
while in hardware they're a bit more clearly separated (in IEEE 802.1Q
there's even a block diagram to clarify that both exist), in software
they're entangled in a bit of a mess, and there are parts of the network
stack and of user space that aren't aware that swp3 is under a bridge,
so IPv6 Router Solicitation messages being sent through swp3 shouldn't
be much of a surprise.
With that out of the way, on to the actual problem.
Traditionally, DSA has made a design decision that all switch ports
inherit the single MAC address of the DSA master. IOW, if you have 1 DSA
master and 4 switch ports, you have 5 interfaces in the system with the
same MAC address. It was like this for a long time, and relatively
recently, Xiaofei Shen added the ability for individual DSA interfaces
to have their own MAC address stored in the device tree.
As an argument in favor of the status quo, Florian explained that:
| By default, DSA switch need to come up in a configuration where all
| ports (except CPU/management) must be strictly separate from every other
| port such that we can achieve what a standalone Ethernet NIC would do.
| This works because all ports are isolated from one another, so there is
| no cross talk and so having the same MAC address (the one from the CPU)
| on the DSA slave network devices just works, each port is a separate
| broadcast domain.
|
| Once you start bridging one or more ports, the bridge root port will have
| a MAC address, most likely the one the CPU/management Ethernet MAC, but
| similarly, this is not an issue and that's exactly how a software bridge
| would work as well.
https://patchwork.kernel.org/project/linux-arm-msm/patch/20190222125815.12866-1-vkoul@kernel.org/
Although yes, that does make some level of sense, it kind of omits the
fact that two DSA ports can be used for communication in loopback too
(either through a direct cable, or through an externally switched network).
In that case, having a MAC SA != MAC DA in the Ethernet packets is kind
of important (I found that out while trying to compose some selftests
for DSA).
If my intuition is correct, you are using the default configuration
where all DSA interfaces have the MAC address inherited from the DSA
master. As a corollary, swp2 and swp3 have the same MAC address.
swp3 is a bridged port, and a blocked port at that, but not all parts of
the network stack know that. So from time to time, you get these IPv6
Router Solicitation messages. They could be anything else, in fact.
swp2 is a bridged port, and in the forwarding state. So packets it
receives are eligible for learning.
When br0 receives a packet via swp2 that originated from swp3, it just
complains: "hey, learning the route for this packet's MAC SA to go
towards swp2 would mean that I would no longer terminate packets with
this MAC DA locally, which is kinda weird, since that MAC address is
also marked as non-forwarded." Which is fair.
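The complaint above can be sketched as a toy model. This is only an illustration of the reasoning: `ToyBridge`, `add_port` and `receive` are made-up names, not the kernel's bridge code, and the MAC addresses are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBridge:
    """Toy model of source-address learning; NOT the Linux bridge code."""

    # MAC addresses the bridge terminates locally (one entry per address,
    # no matter how many ports share it).
    local_macs: dict = field(default_factory=dict)   # mac -> first owning port
    # Addresses learned from received traffic.
    learned: dict = field(default_factory=dict)      # mac -> ingress port
    warnings: list = field(default_factory=list)

    def add_port(self, port, mac):
        # Every port's own MAC is treated as a local, non-forwarded address.
        self.local_macs.setdefault(mac, port)

    def receive(self, port, src_mac):
        """Return True if src_mac was learned on port, False if refused."""
        if src_mac in self.local_macs:
            # Learning this would redirect locally terminated traffic
            # towards `port`, so refuse and complain instead.
            self.warnings.append(
                f"received packet on {port} with own address "
                f"as source address ({src_mac})"
            )
            return False
        self.learned[src_mac] = port
        return True


if __name__ == "__main__":
    shared = "02:00:00:00:00:01"  # hypothetical MAC inherited by all ports
    br0 = ToyBridge()
    for name in ("swp1", "swp2", "swp3"):
        br0.add_port(name, shared)
    # A frame sent directly through swp3 loops back and arrives on swp2.
    br0.receive("swp2", shared)
    print(br0.warnings)
```

The model only captures the check on the source address; it says nothing about how the frame got out of the blocked port in the first place.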
So IMHO, this behavior is neither good nor bad, it is just the way it is,
nothing to worry about if that's what concerns you. To prove or disprove
what I said you could try to configure individual MAC addresses and see
whether that fixes the problem.
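For that experiment, here is a minimal sketch that derives one distinct address per port and builds the corresponding `ip link` commands. The base address, the port names and the helper names (`port_macs`, `ip_link_commands`) are assumptions for illustration only; the addresses are made locally administered so that they don't collide with vendor-assigned ones.

```python
def port_macs(base_mac, ports):
    """Derive one distinct, locally administered unicast MAC per port."""
    octets = [int(part, 16) for part in base_mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a MAC address: {base_mac!r}")
    if len(ports) > 255:
        raise ValueError("this sketch only varies the last octet")
    # Set the locally-administered bit, clear the multicast bit.
    octets[0] = (octets[0] | 0x02) & 0xFE
    macs = {}
    for index, port in enumerate(ports, start=1):
        # Vary only the last octet so each port gets its own address.
        per_port = octets[:5] + [(octets[5] + index) & 0xFF]
        macs[port] = ":".join(f"{o:02x}" for o in per_port)
    return macs


def ip_link_commands(macs):
    """Build the iproute2 commands; they are returned, not executed."""
    return [f"ip link set dev {port} address {mac}"
            for port, mac in macs.items()]


if __name__ == "__main__":
    # Hypothetical base address and port names.
    for cmd in ip_link_commands(
            port_macs("00:11:22:33:44:55", ["swp1", "swp2", "swp3"])):
        print(cmd)
```

The commands are only printed, so they can be reviewed before being run as root on the target.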
> (side note - if there's a place where a parser for Ocelot NPI traffic is
> hidden, that might eventually save me a lot of debugging in Lua)
Nope, there isn't, although it would certainly be great if you could
teach tcpdump about it, similar to what Vivien has done for Marvell:
https://github.com/the-tcpdump-group/tcpdump/blob/master/print-dsa.c
I've wanted to do that for a long time, but I've had lots of other
priorities, and it's tricky for various reasons (there isn't exactly a
single on-the-wire format, but it depends on whether you configure the
NPI port to have no prefix, a short prefix or a long prefix; this
configuration is independent for the RX and TX directions; currently we
use short prefix on RX and TX, but in older kernels we used to use no
prefix on TX, and long prefix on RX on some older kernels, all while the
tagging protocol was still "ocelot"; I'm not sure whether the presence
or absence of a prefix, and what kind, can be deduced by looking at the
packet alone).