netdev - [RFC PATCH net-next 05/13] net: dsa: Optional VLAN-based port separation for switches without tagging

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190324032346.32394-6-olteanv@gmail.com>
Date:   Sun, 24 Mar 2019 05:23:38 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     davem@...emloft.net, netdev@...r.kernel.org
Cc:     f.fainelli@...il.com, andrew@...n.ch, vivien.didelot@...il.com,
        linus.walleij@...aro.org, Vladimir Oltean <olteanv@...il.com>
Subject: [RFC PATCH net-next 05/13] net: dsa: Optional VLAN-based port separation for switches without tagging

This patch provides generic DSA code for using VLAN (802.1Q) tags for
the same purpose as a dedicated switch tag for injection/extraction.
It is based on the discussions and interest that has been so far
expressed in https://www.spinics.net/lists/netdev/msg556125.html.

Unlike all other DSA-supported tagging protocols, CONFIG_NET_DSA_TAG_8021Q
does not offer a complete solution for drivers (nor can it). Instead, it
provides generic code that driver can opt into calling:
- dsa_8021q_xmit: Inserts a VLAN header with the specified contents.
  Currently a few driver are inserting headers that are simply 802.1Q
  with custom fields. Can be called from another tagging protocol's xmit
  function.
- dsa_8021q_rcv: Retrieves the TPID and TCI from a VLAN-tagged skb.
  Removing the VLAN header is left as a decision for the caller to make.
- dsa_port_setup_8021q_tagging: For each user port, installs an Rx VID
  and a Tx VID, for proper untagged traffic identification on ingress
  and steering on egress. Also sets up the VLAN trunk on the upstream
  (CPU or DSA) port. Drivers are intentionally left to call this
  function explicitly, depending on the context and hardware support.
  The expected switch behavior and VLAN semantics should not be violated
  under any conditions. That is, after calling
  dsa_port_setup_8021q_tagging, the hardware should still pass all
  ingress traffic, be it tagged or untagged.

This only works when switch ports are standalone, or when they are added
to a VLAN-unaware bridge. It will probably remain this way for the
reasons below.

When added to a bridge that has vlan_filtering 1, the bridge core will
install its own VLANs and reset the pvids through switchdev. For the
bridge core, switchdev is a write-only pipe. All VLAN-related state is
kept in the bridge core and nothing is read from DSA/switchdev or from
the driver. So the bridge core will break this port separation because
it will install the vlan_default_pvid into all switchdev ports.

Even if we could teach the bridge driver about switchdev preference of a
certain vlan_default_pvid, there would still exist many other challenges.

Firstly, in the DSA rcv callback, a driver would have to perform an
iterative reverse lookup to find the correct switch port. That is
because the port is a bridge slave, so its Rx VID (port PVID) is subject
to user configuration. How would we ensure that the user doesn't reset
the pvid to a different value, or to a non-unique value within this DSA
switch tree?

Finally, not all switch ports are equal in DSA, and that makes it
difficult for the bridge to be completely aware of this anyway.
The CPU port needs to transmit tagged packets (VLAN trunk) in order for
the DSA rcv code to be able to decode source information.
But the bridge code has absolutely no idea which switch port is the CPU
port, if nothing else then just because there is no netdevice registered
by DSA for the CPU port.
Also DSA does not currently allow the user to specify that they want the
CPU port to do VLAN trunking anyway. VLANs are added to the CPU port
using the same flags as they were added on the user port.

So the VLANs installed by dsa_port_setup_8021q_tagging per driver
request should remain private from the bridge's and user's perspective,
and should not alter the hardware's behavior with VLAN-tagged traffic.
If the hardware cannot handle VLAN tag stacking, it should also disable
this port separation when added as slave to a vlan_filtering bridge.
If the hardware does support VLAN tag stacking, it should somehow back
up its private VLAN settings when the bridge tries to override them.

Signed-off-by: Vladimir Oltean <olteanv@...il.com>
---
 include/net/dsa.h   |   4 +
 net/dsa/Kconfig     |   9 +++
 net/dsa/Makefile    |   1 +
 net/dsa/dsa_priv.h  |  10 +++
 net/dsa/tag_8021q.c | 185 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 209 insertions(+)
 create mode 100644 net/dsa/tag_8021q.c

diff --git a/include/net/dsa.h b/include/net/dsa.h
index a16fd577349b..b22c350c40f0 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -574,5 +574,9 @@ int dsa_port_get_phy_strings(struct dsa_port *dp, uint8_t *data);
 int dsa_port_get_ethtool_phy_stats(struct dsa_port *dp, uint64_t *data);
 int dsa_port_get_phy_sset_count(struct dsa_port *dp);
 void dsa_port_phylink_mac_change(struct dsa_switch *ds, int port, bool up);
+#ifdef CONFIG_NET_DSA_TAG_8021Q
+int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int index,
+				 bool enabled);
+#endif
 
 #endif
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index fab49132345f..2f3a103d7d1a 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -26,6 +26,15 @@ config NET_DSA_LEGACY
 	  This feature is scheduled for removal in 4.17.
 
 # tagging formats
+config NET_DSA_TAG_8021Q
+	bool
+	help
+	  Unlike the other tagging protocols, the 802.1Q config option simply
+	  provides helpers for other tagging implementations that might rely on
+	  VLAN in one way or another. It is not a complete solution.
+
+	  Drivers which use these helpers should select this as dependency.
+
 config NET_DSA_TAG_BRCM
 	bool
 
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 6e721f7a2947..d7fc3253d497 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -5,6 +5,7 @@ dsa_core-y += dsa.o dsa2.o master.o port.o slave.o switch.o
 dsa_core-$(CONFIG_NET_DSA_LEGACY) += legacy.o
 
 # tagging formats
+dsa_core-$(CONFIG_NET_DSA_TAG_8021Q) += tag_8021q.o
 dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o
 dsa_core-$(CONFIG_NET_DSA_TAG_BRCM_PREPEND) += tag_brcm.o
 dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 8048ced3708f..105058450621 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -203,6 +203,16 @@ dsa_slave_to_master(const struct net_device *dev)
 int dsa_switch_register_notifier(struct dsa_switch *ds);
 void dsa_switch_unregister_notifier(struct dsa_switch *ds);
 
+/* tag_8021q.c */
+struct sk_buff *dsa_8021q_xmit(struct sk_buff *skb, struct net_device *netdev,
+			       u16 tpid, u16 tci);
+struct sk_buff *dsa_8021q_rcv(struct sk_buff *skb, struct net_device *netdev,
+			      struct packet_type *pt, u16 *tpid, u16 *tci);
+u16 dsa_tagging_tx_vid(struct dsa_switch *ds, int port);
+u16 dsa_tagging_rx_vid(struct dsa_switch *ds, int port);
+int dsa_tagging_rx_switch_id(u16 vid);
+int dsa_tagging_rx_source_port(u16 vid);
+
 /* tag_brcm.c */
 extern const struct dsa_device_ops brcm_netdev_ops;
 extern const struct dsa_device_ops brcm_prepend_netdev_ops;
diff --git a/net/dsa/tag_8021q.c b/net/dsa/tag_8021q.c
new file mode 100644
index 000000000000..221299b264f5
--- /dev/null
+++ b/net/dsa/tag_8021q.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019, Vladimir Oltean <olteanv@...il.com>
+ */
+#include <linux/if_bridge.h>
+#include <linux/if_vlan.h>
+
+#include "dsa_priv.h"
+
+#define DSA_TAGGING_VID_RANGE    (DSA_MAX_SWITCHES * DSA_MAX_PORTS)
+#define DSA_TAGGING_VID_BASE     (VLAN_N_VID - 2 * DSA_TAGGING_VID_RANGE - 1)
+#define DSA_TAGGING_RX_VID_BASE  (DSA_TAGGING_VID_BASE)
+#define DSA_TAGGING_TX_VID_BASE  (DSA_TAGGING_VID_BASE + DSA_TAGGING_VID_RANGE)
+
+u16 dsa_tagging_tx_vid(struct dsa_switch *ds, int port)
+{
+	return DSA_TAGGING_TX_VID_BASE + (DSA_MAX_PORTS * ds->index) + port;
+}
+
+u16 dsa_tagging_rx_vid(struct dsa_switch *ds, int port)
+{
+	return DSA_TAGGING_RX_VID_BASE + (DSA_MAX_PORTS * ds->index) + port;
+}
+
+int dsa_tagging_rx_switch_id(u16 vid)
+{
+	return ((vid - DSA_TAGGING_RX_VID_BASE) / DSA_MAX_PORTS);
+}
+
+int dsa_tagging_rx_source_port(u16 vid)
+{
+	return ((vid - DSA_TAGGING_RX_VID_BASE) % DSA_MAX_PORTS);
+}
+
+/* Rx VLAN tagging (left) and Tx VLAN tagging (right) setup shown for a single
+ * front-panel switch port (here swp0).
+ *
+ * Port identification through VLAN (802.1Q) tags has different requirements
+ * for it to work effectively:
+ *  - On Rx (ingress from network): each front-panel port must have a pvid
+ *    that uniquely identifies it, and the egress of this pvid must be tagged
+ *    towards the CPU port, so that software can recover the source port based
+ *    on the VID in the frame. But this would only work for standalone ports;
+ *    if bridged, this VLAN setup would break autonomous forwarding and would
+ *    force all switched traffic to pass through the CPU. So we must also make
+ *    the other front-panel ports members of this VID we're adding, albeit
+ *    we're not making it their PVID (they'll still have their own).
+ *    By the way - just because we're installing the same VID in multiple
+ *    switch ports doesn't mean that they'll start to talk to one another, even
+ *    while not bridged: the final forwarding decision is still an AND between
+ *    the L2 forwarding information (which is limiting forwarding in this case)
+ *    and the VLAN-based restrictions (of which there are none in this case,
+ *    since all ports are members).
+ *  - On Tx (ingress from CPU and towards network) we are faced with a problem.
+ *    If we were to tag traffic (from within DSA) with the port's pvid, all
+ *    would be well, assuming the switch ports were standalone. Frames would
+ *    have no choice but to be directed towards the correct front-panel port.
+ *    But because we also want the Rx VLAN to not break bridging, then
+ *    inevitably that means that we have to give them a choice (of what
+ *    front-panel port to go out on), and therefore we cannot steer traffic
+ *    based on the Rx VID. So what we do is simply install one more VID on the
+ *    front-panel and CPU ports, and profit off of the fact that steering will
+ *    work just by virtue of the fact that there is only one other port that's
+ *    a member of the VID we're tagging the traffic with - the desired one.
+ *
+ * So at the end, each front-panel port will have one Rx VID (also the PVID),
+ * the Rx VID of all other front-panel ports, and one Tx VID. Whereas the CPU
+ * port will have the Rx and Tx VIDs of all front-panel ports, and on top of
+ * that, is also tagged-input and tagged-output (VLAN trunk).
+ *
+ *               CPU port                               CPU port
+ * +-------------+-----+-------------+    +-------------+-----+-------------+
+ * |  Rx VID     |     |             |    |  Tx VID     |     |             |
+ * |  of swp0    |     |             |    |  of swp0    |     |             |
+ * |             +-----+             |    |             +-----+             |
+ * |                ^ T              |    |                | Tagged         |
+ * |                |                |    |                | ingress        |
+ * |    +-------+---+---+-------+    |    |    +-----------+                |
+ * |    |       |       |       |    |    |    | Untagged                   |
+ * |    |     U v     U v     U v    |    |    v egress                     |
+ * | +-----+ +-----+ +-----+ +-----+ |    | +-----+ +-----+ +-----+ +-----+ |
+ * | |     | |     | |     | |     | |    | |     | |     | |     | |     | |
+ * | |PVID | |     | |     | |     | |    | |     | |     | |     | |     | |
+ * +-+-----+-+-----+-+-----+-+-----+-+    +-+-----+-+-----+-+-----+-+-----+-+
+ *   swp0    swp1    swp2    swp3           swp0    swp1    swp2    swp3
+ */
+int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int port, bool enabled)
+{
+	int upstream = dsa_upstream_port(ds, port);
+	struct dsa_port *dp = &ds->ports[port];
+	struct dsa_port *upstream_dp = &ds->ports[upstream];
+	u16 rx_vid = dsa_tagging_rx_vid(ds, port);
+	u16 tx_vid = dsa_tagging_tx_vid(ds, port);
+	int i, err;
+
+	/* The CPU port is implicitly configured by
+	 * configuring the front-panel ports
+	 */
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
+	/* Add this user port's Rx VID to the membership list of all others
+	 * (including itself). This is so that bridging will not be hindered.
+	 * L2 forwarding rules still take precedence when there are no VLAN
+	 * restrictions, so there are no concerns about leaking traffic.
+	 */
+	for (i = 0; i < ds->num_ports; i++) {
+		struct dsa_port *other_dp = &ds->ports[i];
+		u16 flags;
+
+		if (i == upstream)
+			/* CPU port needs to see this port's Rx VID
+			 * as tagged egress.
+			 */
+			flags = 0;
+		else if (i == port)
+			/* The Rx VID is pvid on this port */
+			flags = BRIDGE_VLAN_INFO_UNTAGGED |
+				BRIDGE_VLAN_INFO_PVID;
+		else
+			/* The Rx VID is a regular VLAN on all others */
+			flags = BRIDGE_VLAN_INFO_UNTAGGED;
+
+		err = dsa_port_trans_vlan_apply(other_dp, rx_vid, flags,
+						enabled);
+		if (err) {
+			dev_err(ds->dev, "Failed to apply Rx VID %d to port %d: %d\n",
+				rx_vid, port, err);
+			return err;
+		}
+	}
+	/* Finally apply the Tx VID on this port and on the CPU port */
+	err = dsa_port_trans_vlan_apply(dp, tx_vid, BRIDGE_VLAN_INFO_UNTAGGED,
+					enabled);
+	if (err) {
+		dev_err(ds->dev, "Failed to apply Tx VID %d on port %d: %d\n",
+			tx_vid, port, err);
+		return err;
+	}
+	err = dsa_port_trans_vlan_apply(upstream_dp, tx_vid, 0, enabled);
+	if (err) {
+		dev_err(ds->dev, "Failed to apply Tx VID %d on port %d: %d\n",
+			tx_vid, upstream, err);
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dsa_port_setup_8021q_tagging);
+
+struct sk_buff *dsa_8021q_xmit(struct sk_buff *skb, struct net_device *netdev,
+			       u16 tpid, u16 tci)
+{
+	/* skb->data points at skb_mac_header, which
+	 * is fine for vlan_insert_tag.
+	 */
+	return vlan_insert_tag(skb, tpid, tci);
+}
+EXPORT_SYMBOL_GPL(dsa_8021q_xmit);
+
+struct sk_buff *dsa_8021q_rcv(struct sk_buff *skb, struct net_device *netdev,
+			      struct packet_type *pt, u16 *tpid, u16 *tci)
+{
+	struct vlan_ethhdr *tag;
+
+	if (unlikely(!pskb_may_pull(skb, VLAN_HLEN)))
+		return NULL;
+
+	tag = vlan_eth_hdr(skb);
+	*tpid = ntohs(tag->h_vlan_proto);
+	*tci = ntohs(tag->h_vlan_TCI);
+
+	/* skb->data points in the middle of the VLAN tag,
+	 * after tpid and before tci. This is because so far,
+	 * ETH_HLEN (DMAC, SMAC, EtherType) bytes were pulled.
+	 * There are 2 bytes of VLAN tag left in skb->data, and upper
+	 * layers expect the 'real' EtherType to be consumed as well.
+	 * Coincidentally, a VLAN header is also of the same size as
+	 * the number of bytes that need to be pulled.
+	 */
+	skb_pull_rcsum(skb, VLAN_HLEN);
+
+	return skb;
+}
+EXPORT_SYMBOL_GPL(dsa_8021q_rcv);
+
-- 
2.17.1