[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210628220011.1910096-7-olteanv@gmail.com>
Date: Tue, 29 Jun 2021 01:00:03 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: netdev@...r.kernel.org
Cc: Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Vivien Didelot <vivien.didelot@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Ido Schimmel <idosch@...sch.org>,
Tobias Waldekranz <tobias@...dekranz.com>,
Roopa Prabhu <roopa@...dia.com>,
Nikolay Aleksandrov <nikolay@...dia.com>,
Vladimir Oltean <vladimir.oltean@....com>
Subject: [PATCH v4 net-next 06/14] net: dsa: reference count the MDB entries at the cross-chip notifier level
From: Vladimir Oltean <vladimir.oltean@....com>
Ever since the cross-chip notifiers were introduced, the design was
meant to be simplistic and just get the job done without worrying too
much about dangling resources left behind.
For example, somebody installs an MDB entry on sw0p0 in this daisy chain
topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
sw1p4 and sw2p4.
|
sw0p0 sw0p1 sw0p2 sw0p3 sw0p4
[ user ] [ user ] [ user ] [ dsa ] [ cpu ]
[ x ] [ ] [ ] [ ] [ ]
|
+---------+
|
sw1p0 sw1p1 sw1p2 sw1p3 sw1p4
[ user ] [ user ] [ user ] [ dsa ] [ dsa ]
[ ] [ ] [ ] [ ] [ x ]
|
+---------+
|
sw2p0 sw2p1 sw2p2 sw2p3 sw2p4
[ user ] [ user ] [ user ] [ user ] [ dsa ]
[ ] [ ] [ ] [ ] [ x ]
Then the same person deletes that MDB entry. The cross-chip notifier for
deletion only matches sw0p0:
|
sw0p0 sw0p1 sw0p2 sw0p3 sw0p4
[ user ] [ user ] [ user ] [ dsa ] [ cpu ]
[ x ] [ ] [ ] [ ] [ ]
|
+---------+
|
sw1p0 sw1p1 sw1p2 sw1p3 sw1p4
[ user ] [ user ] [ user ] [ dsa ] [ dsa ]
[ ] [ ] [ ] [ ] [ ]
|
+---------+
|
sw2p0 sw2p1 sw2p2 sw2p3 sw2p4
[ user ] [ user ] [ user ] [ user ] [ dsa ]
[ ] [ ] [ ] [ ] [ ]
Why?
Because the DSA links are 'trunk' ports, if we just go ahead and delete
the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
entries when they are still needed. Just consider the fact that somebody
does:
- add a multicast MAC address towards sw0p0 [ via the cross-chip
notifiers it gets installed on the DSA links too ]
- add the same multicast MAC address towards sw0p1 (another port of that
same switch)
- delete the same multicast MAC address from sw0p0.
At this point, if we deleted the MAC address from the DSA links, it
would be flooded, even though there is still an entry on switch 0 which
needs it not to.
So that is why deletions only match the targeted source port and nothing
on DSA links. Of course, dangling resources means that the hardware
tables will eventually run out given enough additions/removals, but hey,
at least it's simple.
But there is a bigger concern which needs to be addressed, and that is
our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
on the upstream port, and a similar thing happens on deletion:
dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
upstream port.
When there are 2 VLAN-unaware bridges spanning the same switch (which is
a use case DSA proudly supports), each bridge will install its own
SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
user ports enslaved to br0 and the user ports enslaved to br1. Not good.
The host-trapped multicast addresses installed by br1 will be deleted
when any state changes in br0 (IGMP timers expire, or ports leave, etc).
To avoid this, we could of course go the route of the zero-sum game and
delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
design is to just admit that on shared ports like DSA links and CPU
ports, we should be reference counting calls, even if this consumes some
dynamic memory which DSA has traditionally avoided. On the flip side,
the hardware tables of switches are limited in size, so it would be good
if the OS managed them properly instead of having them eventually
overflow.
To address the memory usage concern, we only apply the refcounting of
MDB entries on ports that are really shared (CPU ports and DSA links)
and not on user ports. In a typical single-switch setup, this means only
the CPU port (and the host MDB entries are not that many, really).
The name of the newly introduced data structures (dsa_mac_addr) is
chosen in such a way that will be reusable for host FDB entries (next
patch).
With this change, we can finally have the same matching logic for the
MDB additions and deletions, as well as for their host-trapped variants.
Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
---
include/net/dsa.h | 12 ++++++
net/dsa/dsa2.c | 8 ++++
net/dsa/switch.c | 104 ++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 115 insertions(+), 9 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 5f632cfd33c7..2c50546f9667 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -285,6 +285,11 @@ struct dsa_port {
*/
const struct dsa_netdevice_ops *netdev_ops;
+ /* List of MAC addresses that must be forwarded on this port.
+ * These are only valid on CPU ports and DSA links.
+ */
+ struct list_head mdbs;
+
bool setup;
};
@@ -299,6 +304,13 @@ struct dsa_link {
struct list_head list;
};
+struct dsa_mac_addr {
+ unsigned char addr[ETH_ALEN];
+ u16 vid;
+ refcount_t refcount;
+ struct list_head list;
+};
+
struct dsa_switch {
bool setup;
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 9000a8c84baf..2035d132682f 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -348,6 +348,8 @@ static int dsa_port_setup(struct dsa_port *dp)
if (dp->setup)
return 0;
+ INIT_LIST_HEAD(&dp->mdbs);
+
switch (dp->type) {
case DSA_PORT_TYPE_UNUSED:
dsa_port_disable(dp);
@@ -443,6 +445,7 @@ static int dsa_port_devlink_setup(struct dsa_port *dp)
static void dsa_port_teardown(struct dsa_port *dp)
{
struct devlink_port *dlp = &dp->devlink_port;
+ struct dsa_mac_addr *a, *tmp;
if (!dp->setup)
return;
@@ -468,6 +471,11 @@ static void dsa_port_teardown(struct dsa_port *dp)
break;
}
+ list_for_each_entry_safe(a, tmp, &dp->mdbs, list) {
+ list_del(&a->list);
+ kfree(a);
+ }
+
dp->setup = false;
}
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 7c5fe60a3763..10602a6da5e3 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -178,6 +178,84 @@ static bool dsa_switch_host_address_match(struct dsa_switch *ds, int port,
return false;
}
+static struct dsa_mac_addr *dsa_mac_addr_find(struct list_head *addr_list,
+ const unsigned char *addr,
+ u16 vid)
+{
+ struct dsa_mac_addr *a;
+
+ list_for_each_entry(a, addr_list, list)
+ if (ether_addr_equal(a->addr, addr) && a->vid == vid)
+ return a;
+
+ return NULL;
+}
+
+static int dsa_switch_do_mdb_add(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_port_mdb *mdb)
+{
+ struct dsa_port *dp = dsa_to_port(ds, port);
+ struct dsa_mac_addr *a;
+ int err;
+
+ /* No need to bother with refcounting for user ports */
+ if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
+ return ds->ops->port_mdb_add(ds, port, mdb);
+
+ a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid);
+ if (a) {
+ refcount_inc(&a->refcount);
+ return 0;
+ }
+
+ a = kzalloc(sizeof(*a), GFP_KERNEL);
+ if (!a)
+ return -ENOMEM;
+
+ err = ds->ops->port_mdb_add(ds, port, mdb);
+ if (err) {
+ kfree(a);
+ return err;
+ }
+
+ ether_addr_copy(a->addr, mdb->addr);
+ a->vid = mdb->vid;
+ refcount_set(&a->refcount, 1);
+ list_add_tail(&a->list, &dp->mdbs);
+
+ return 0;
+}
+
+static int dsa_switch_do_mdb_del(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_port_mdb *mdb)
+{
+ struct dsa_port *dp = dsa_to_port(ds, port);
+ struct dsa_mac_addr *a;
+ int err;
+
+ /* No need to bother with refcounting for user ports */
+ if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
+ return ds->ops->port_mdb_del(ds, port, mdb);
+
+ a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid);
+ if (!a)
+ return -ENOENT;
+
+ if (!refcount_dec_and_test(&a->refcount))
+ return 0;
+
+ err = ds->ops->port_mdb_del(ds, port, mdb);
+ if (err) {
+ refcount_inc(&a->refcount);
+ return err;
+ }
+
+ list_del(&a->list);
+ kfree(a);
+
+ return 0;
+}
+
static int dsa_switch_fdb_add(struct dsa_switch *ds,
struct dsa_notifier_fdb_info *info)
{
@@ -267,19 +345,18 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
if (!ds->ops->port_mdb_add)
return -EOPNOTSUPP;
- return ds->ops->port_mdb_add(ds, port, info->mdb);
+ return dsa_switch_do_mdb_add(ds, port, info->mdb);
}
static int dsa_switch_mdb_del(struct dsa_switch *ds,
struct dsa_notifier_mdb_info *info)
{
+ int port = dsa_towards_port(ds, info->sw_index, info->port);
+
if (!ds->ops->port_mdb_del)
return -EOPNOTSUPP;
- if (ds->index == info->sw_index)
- return ds->ops->port_mdb_del(ds, info->port, info->mdb);
-
- return 0;
+ return dsa_switch_do_mdb_del(ds, port, info->mdb);
}
static int dsa_switch_host_mdb_add(struct dsa_switch *ds,
@@ -294,7 +371,7 @@ static int dsa_switch_host_mdb_add(struct dsa_switch *ds,
for (port = 0; port < ds->num_ports; port++) {
if (dsa_switch_host_address_match(ds, port, info->sw_index,
info->port)) {
- err = ds->ops->port_mdb_add(ds, port, info->mdb);
+ err = dsa_switch_do_mdb_add(ds, port, info->mdb);
if (err)
break;
}
@@ -306,13 +383,22 @@ static int dsa_switch_host_mdb_add(struct dsa_switch *ds,
static int dsa_switch_host_mdb_del(struct dsa_switch *ds,
struct dsa_notifier_mdb_info *info)
{
+ int err = 0;
+ int port;
+
if (!ds->ops->port_mdb_del)
return -EOPNOTSUPP;
- if (ds->index == info->sw_index)
- return ds->ops->port_mdb_del(ds, info->port, info->mdb);
+ for (port = 0; port < ds->num_ports; port++) {
+ if (dsa_switch_host_address_match(ds, port, info->sw_index,
+ info->port)) {
+ err = dsa_switch_do_mdb_del(ds, port, info->mdb);
+ if (err)
+ break;
+ }
+ }
- return 0;
+ return err;
}
static bool dsa_switch_vlan_match(struct dsa_switch *ds, int port,
--
2.25.1
Powered by blists - more mailing lists