[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201201013706.6clgrx2tnapywgxf@skbuf>
Date: Tue, 1 Dec 2020 03:37:06 +0200
From: Vladimir Oltean <olteanv@...il.com>
To: Tobias Waldekranz <tobias@...dekranz.com>
Cc: davem@...emloft.net, kuba@...nel.org, andrew@...n.ch,
vivien.didelot@...il.com, f.fainelli@...il.com,
j.vosburgh@...il.com, vfalico@...il.com, andy@...yhouse.net,
netdev@...r.kernel.org
Subject: Re: [PATCH v2 net-next 2/4] net: dsa: Link aggregation support
Hi Tobias,
On Mon, Nov 30, 2020 at 03:06:08PM +0100, Tobias Waldekranz wrote:
> Monitor the following events and notify the driver when:
>
> - A DSA port joins/leaves a LAG.
> - A LAG, made up of DSA ports, joins/leaves a bridge.
> - A DSA port in a LAG is enabled/disabled (enabled meaning
> "distributing" in 802.3ad LACP terms).
>
> Each LAG interface to which a DSA port is attached is represented by a
> `struct dsa_lag` which is globally reachable from the switch tree and
> from each associated port.
>
> When a LAG joins a bridge, the DSA subsystem will treat that as each
> individual port joining the bridge. The driver may look at the port's
> LAG pointer to see if it is associated with any LAG, if that is
> required. This is analogue to how switchdev events are replicated out
> to all lower devices when reaching e.g. a LAG.
>
> Signed-off-by: Tobias Waldekranz <tobias@...dekranz.com>
> ---
> include/net/dsa.h | 97 +++++++++++++++++++++++++++++++
> net/dsa/dsa2.c | 51 ++++++++++++++++
> net/dsa/dsa_priv.h | 31 ++++++++++
> net/dsa/port.c | 141 +++++++++++++++++++++++++++++++++++++++++++++
> net/dsa/slave.c | 83 ++++++++++++++++++++++++--
> net/dsa/switch.c | 49 ++++++++++++++++
> 6 files changed, 446 insertions(+), 6 deletions(-)
>
> +static inline struct dsa_lag *dsa_lag_by_id(struct dsa_switch_tree *dst, int id)
> +{
> + if (!test_bit(id, dst->lags.busy))
> + return NULL;
> +
> + return &dst->lags.pool[id];
> +}
> +
> +static inline struct net_device *dsa_lag_dev_by_id(struct dsa_switch_tree *dst,
> + int id)
> +{
> + struct dsa_lag *lag = dsa_lag_by_id(dst, id);
> +
> + return lag ? rcu_dereference(lag->dev) : NULL;
> +}
> +
> +static inline struct dsa_lag *dsa_lag_by_dev(struct dsa_switch_tree *dst,
> + struct net_device *dev)
> +{
> + struct dsa_lag *lag;
> + int id;
> +
> + dsa_lag_foreach(id, dst) {
> + lag = dsa_lag_by_id(dst, id);
> +
> + if (rtnl_dereference(lag->dev) == dev)
> + return lag;
> + }
> +
> + return NULL;
> +}
>
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index 73569c9af3cc..c2332ee5f5c7 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -193,6 +193,147 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
> dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
> }
>
> +static struct dsa_lag *dsa_lag_get(struct dsa_switch_tree *dst,
> + struct net_device *dev)
> +{
> + struct dsa_lag *lag;
> + int id;
> +
> + lag = dsa_lag_by_dev(dst, dev);
> + if (lag) {
> + kref_get(&lag->refcount);
> + return lag;
> + }
> +
> + id = find_first_zero_bit(dst->lags.busy, dst->lags.num);
> + if (id >= dst->lags.num) {
> + WARN(1, "No LAGs available");
> + return NULL;
> + }
> +
> + set_bit(id, dst->lags.busy);
> +
> + lag = &dst->lags.pool[id];
> + kref_init(&lag->refcount);
> + lag->id = id;
> + INIT_LIST_HEAD(&lag->ports);
> + INIT_LIST_HEAD(&lag->tx_ports);
> +
> + rcu_assign_pointer(lag->dev, dev);
> + return lag;
> +}
> +
> +static void dsa_lag_release(struct kref *refcount)
> +{
> + struct dsa_lag *lag = container_of(refcount, struct dsa_lag, refcount);
> +
> + rcu_assign_pointer(lag->dev, NULL);
> + synchronize_rcu();
> + memset(lag, 0, sizeof(*lag));
> +}
What difference does it make if lag->dev is set to NULL right away or
after a grace period? Squeezing one last packet from that bonding interface?
Pointer updates are atomic operations on all architectures that the
kernel supports, and, as long as you use WRITE_ONCE and READ_ONCE memory
barriers, there should be no reason for RCU protection that I can see.
And unlike typical uses of RCU, you do not free lag->dev, because you do
not own lag->dev. Instead, the bonding interface pointed to by lag->dev
is going to be freed (in case of a deletion using ip link) after an RCU
grace period anyway. And the receive data path is under an RCU read-side
critical section anyway. So even if you set lag->dev to NULL using
WRITE_ONCE, the existing in-flight readers from the RX data path that
had called dsa_lag_dev_by_id() will still hold a reference to a valid
bonding interface.
Powered by blists - more mailing lists