netdev - Re: [RFC PATCH v2 net-next 06/17] net: dsa: add addresses obtained from RX filtering to host addresses

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210226132834.dx25pq4x757acujk@skbuf>
Date:   Fri, 26 Feb 2021 15:28:34 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     Tobias Waldekranz <tobias@...dekranz.com>
Cc:     netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Ido Schimmel <idosch@...sch.org>,
        DENG Qingfang <dqfext@...il.com>,
        George McCollister <george.mccollister@...il.com>,
        Vlad Yasevich <vyasevich@...il.com>,
        Roopa Prabhu <roopa@...dia.com>,
        Nikolay Aleksandrov <nikolay@...dia.com>
Subject: Re: [RFC PATCH v2 net-next 06/17] net: dsa: add addresses obtained
 from RX filtering to host addresses

On Fri, Feb 26, 2021 at 11:59:36AM +0100, Tobias Waldekranz wrote:
> On Wed, Feb 24, 2021 at 13:43, Vladimir Oltean <olteanv@...il.com> wrote:
> > From: Vladimir Oltean <vladimir.oltean@....com>
> >
> > In case we have ptp4l running on a bridged DSA switch interface, the PTP
> > traffic is classified as link-local (in the default profile, the MAC
> > addresses are 01:1b:19:00:00:00 and 01:80:c2:00:00:0e), which means it
> > isn't the responsibility of the bridge to make sure it gets trapped to
> > the CPU.
> >
> > The solution is to implement the standard callbacks for dev_uc_add and
> > dev_mc_add, and behave just like any other network interface: ensure
> > that the user space program can see those packets.
>
> So presumably the application would use PACKET_ADD_MEMBERSHIP to set
> this up?
>
> This is a really elegant way of solving this problem I think!

Yes, using the unmodified *_ADD_MEMBERSHIP UAPI was the intention.
If that is not possible, the whole idea kinda loses its appeal and we'd
be better off starting from scratch and figuring out how we'd prefer for
user space to request (exclusive) address membership.

> One problem I see is that this will not result in packets getting
> trapped to the CPU, rather they will simply be forwarded.  I.e. with
> this patch applied, once ptp4l adds the groups it is interested in, my
> HW FDB will look like this:
>
> ADDR                VID  DST   TYPE
> 01:1b:19:00:00:00     0  cpu0  static
> 01:80:c2:00:00:0e     0  cpu0  static
>
> But this will not allow these groups to ingress on (STP) blocked
> ports. AFAIK, PTP (certainly LLDP which also uses the latter group)
> should be able to do that.
>
> For mv88e6xxx (but I think this applies to most switches), there are
> roughly three ways a given multicast group can reach the CPU:
>
> 1. Trap: Packet is unconditionally redirected to the CPU, independent
>    of things like 802.1X or STP state on the ingressing port.
> 2. Mirror: Send a copy of packets that pass all other ingress policy to
>    the CPU.
> 3. Forward: Forward packets that pass all other ingress policy to the
>    CPU.
>
> Entries are now added as "Forward", which means that the group will no
> longer reach the other local ports. But the command from the application
> is "I want to see these packets", it says nothing about preventing the
> group from being forwarded. So I think the default ought to be
> "Mirror". Additionally, we probably need some way of specifying "Trap"
> to those applications that need it. E.g. ptp4l could specify
> PACKET_MR_MULTICAST_TRAP in mr_action or something if it does not want
> the bridge (or the switch) to forward it.
>
> If "Forward" is desired, the existing "bridge mdb" interface seems like
> the proper one, since it also affects other ports.

I'm not sure I understand your exact requirement.

Let me try to quote from IEEE 802.1Q-2018 clause 8.13.9 "Points of
attachment and connectivity for Higher Layer Entities". For context,
this talks about about Higher Layer Entities, aka applications such as
STP, MRP, ISIS-SPB, whose world view is as though they are connected
directly to the LAN that the bridge port is connected to, i.e. they
bypass the MAC relay (forwarding) function.

The spec says:

  Controls placed in the forwarding path have no effect on the ability
  of a Higher Layer Entity to transmit and receive frames to or from a
  given LAN using a direct attachment to that LAN (e.g., from entity A
  to LAN A); they only affect the path taken by any indirect
  transmission or reception (e.g., from entity A to or from LAN B).

Then there's this drawing:

 +-----------------------+                         +-----------------------+
 | Higher Layer Entity A |                         | Higher Layer Entity B |
 +-----------------------+                         +-----------------------+
            |                                                   |
            |     +---------------------------------------+     |
            |     |             MAC Relay Entity          |     |
            |     |                                       |     |
            |     |   Port   Filtering Database   Port    |     |
            |     |  State       Information     State    |     |
            |     |                                       |     |
            |     |     /             /             /     |     |
            +-----|---x/  x---------x/  x---------x/  x---+-----+
            |     |                                       |     |
            |     +---------------------------------------+     |
            |                                                   |
 +----------------------------+                      +----------------------------+
 |          |                 |                      |          |                 |
 |          x                 |                      |          x                 |
 |    MAC    /                |                      |    MAC    /                |
 |  Entity  / MAC_Operational |                      |  Entity  / MAC_Operational |
 |          x                 |                      |          x                 |
 |          |                 |                      |          |                 |
 +----------------------------+                      +----------------------------+
            |                                                   |
            |                                                   |
            |                                                   |
   ----------------------------                        ----------------------------
  /       LAN A               /                       /       LAN B               /
 /                           /                       /                           /
/---------------------------/                       /---------------------------/

Figure 8-20: Effect of control information on the forwarding path

A one phrase conclusion from the above is that a Higher Layer Entity
should be able to accept packets from the bridge port regardless of its
STP state, as long as the MAC is operational.

In my world view, anything sent and received through the swpN DSA
interfaces, and therefore by injection/extraction through the CPU port,
represents a Higher Layer Entity.

So the fact that your Marvell switch treats the CPU port as just another
destination for forwarding should be hidden away from the externally
observable behavior. DSA does not really support the switched endpoint
use case, we should deny attaching IP interfaces to it.

But to your point: we are currently using the .port_fdb_add and
.port_mdb_add API in DSA to install the host-filtered addresses. This is
true, and potentially incorrect, since indeed it assumes that the
underlying mechanism to trap these addresses to the CPU is through the
FDB/MDB, which will violate the expectation of Higher Layer Entities
since it goes through the forwarding process. I think we could add a new
set of APIs in DSA, something like .host_uc_add and .host_mc_add. Then,
drivers that can't do any better can go ahead and internally call their
.port_fdb_add and .port_mdb_add. Question: If we add .host_uc_add and
.host_mc_add, should these APIs be per front port? And if they should,
should we leave the switch driver the task of reference counting them?


As to why does IEEE 1588 use 01-80-C2-00-00-0E for peer delay in its
default profile for the L2 transport, I could only find this (quoting
clause "F.3 Multicast MAC Addresses"):

  To ensure peer delay measurements on ports blocked by (Rapid/Multiple)
  Spanning Tree Protocols, a reserved address, 01-80-C2-00-00-0E, shall be
  used as a Destination MAC Address for PTP peer delay mechanism messages.

Basically, unlike end-to-end delay measurements, peer delay is by
definition point-to-point, which in an L2 network mean link-local.
So if your LAN uses the peer delay measurement protocol, then all your
switches must have some sort of PTP awareness. There are 2 cases really:
- You are a Peer-to-peer Transparent Clock, so you must speak the Peer
  Delay protocol yourself. Therefore you must have a Higher Layer Entity
  that consumes the PTP PDUs. So for this case, it doesn't really make
  any difference that the reserved bridge MAC address range is used.
- You are an End-to-end Transparent Clock. These are an oddity of the
  1588 standard which do not speak the Peer Delay protocol, but are
  allowed to forward Peer Delay messages, and update the correctionField
  of those Peer Delay messages that are Events (contain timestamps).
  I would think that this is where having a reserved address for the
  Peer Delay messages would make a difference.


Then, there seems to be the second part of your request, that of address
exclusivity. I think that for addresses that should explicitly be
removed from the hardware data path, the tc-trap action was created for
that exact purpose. However, the question really becomes: does PTP care
enough to know it's running on top of a switchdev interface, so that it
should claim exclusive ownership of that multicast group, or should it
just care enough to say "hey, I'm here, I'm interested in it too"?

For one thing, the 01-80-c2-00-00-0e multicast group is shared, even
IEEE 1588 says so:

  NOTE 2: At its July 17−20, 2006 meeting the IEEE 802.1 Working Group
  approved a motion that included the following text: "re-designate the
  reserved address currently identified for use by 802.1AB as an address
  that can be used by protocols that require the scope of the address to
  be limited to an individual LAN" and "The reserved multicast address
  that IEEE 1588 should use is 01-80-C2-00-00-0E." This address is not
  reserved exclusively for PTP, but rather it is a shared address.

and the bridge driver also supports this odd attribute:

  group_fwd_mask MASK - set the group forward mask. This is the bitmask
  that is applied to decide whether to forward incoming frames destined
  to link-local addresses, ie addresses of the form 01:80:C2:00:00:0X
  (defaults to 0, ie the bridge does not forward any link-local frames).

which by the way seems to be in blatant contradiction of clause
8.13.4 Reserved MAC addresses: "Any frame with a destination address
that is a reserved MAC address shall not be forwarded by a Bridge".

So if ptp4l was to install a trap action by itself, it would potentially
interact in unexpected ways with other uses of that address on the same
system.

Thoughts?