Message-ID: <20111024130918.GB24473@synalogic.ca>
Date: Mon, 24 Oct 2011 09:09:19 -0400
From: Benjamin Poirier <benjamin.poirier@...il.com>
To: Jiri Pirko <jpirko@...hat.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net,
eric.dumazet@...il.com, bhutchings@...arflare.com,
shemminger@...tta.com, fubar@...ibm.com, andy@...yhouse.net,
tgraf@...radead.org, ebiederm@...ssion.com, mirqus@...il.com,
kaber@...sh.net, greearb@...delatech.com, jesse@...ira.com,
fbl@...hat.com, jzupka@...hat.com,
Dipankar Sarma <dipankar@...ibm.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [patch net-next V4] net: introduce ethernet teaming device
On 11/10/24 10:13, Jiri Pirko wrote:
> This patch introduces a new network device called team. It is supposed
> to be a very fast, simple, userspace-driven alternative to the existing
> bonding driver.
>
> A userspace library called libteam with a couple of demo apps is
> available here:
> https://github.com/jpirko/libteam
> Note it's still in its diapers atm.
>
> team<->libteam use generic netlink for communication. That and rtnl are
> supposed to be the only ways to configure a team device, no sysfs etc.
>
> A basic Python binding for libteam was recently introduced (some work
> still needs to be done on it though). A daemon providing arpmon/miimon
> active-backup functionality will be introduced shortly.
> Everything necessary is already implemented in the kernel team driver.
>
> Signed-off-by: Jiri Pirko <jpirko@...hat.com>
>
> v3->v4:
> - remove redundant synchronize_rcu from __team_change_mode()
> - revert "set and clear of mode_ops happens per pointer, not per
>   byte"
> - extend comment of function __team_change_mode()
>
> v2->v3:
> - team_change_mtu() uses the rcu version of list traversal to unwind
> - set and clear of mode_ops happens per pointer, not per byte
> - port hashlist changed to be embedded into team structure
> - error branch in team_port_enter() does cleanup now
> - fixed rtln->rtnl
>
> v1->v2:
> - modes are made as modules. This makes team more modular and
>   extendable.
> - several commenters' nitpicks found on v1 were fixed
> - several other bugs were fixed.
> - note I ignored Eric's comment about the roundrobin port selector,
>   as Eric's way may easily be implemented as another mode (mode
>   "random") in the future.
> ---
> Documentation/networking/team.txt         |    2 +
> MAINTAINERS                               |    7 +
> drivers/net/Kconfig                       |    2 +
> drivers/net/Makefile                      |    1 +
> drivers/net/team/Kconfig                  |   38 +
> drivers/net/team/Makefile                 |    7 +
> drivers/net/team/team.c                   | 1573 +++++++++++++++++++++++++++++
> drivers/net/team/team_mode_activebackup.c |  152 +++
> drivers/net/team/team_mode_roundrobin.c   |  107 ++
> include/linux/Kbuild                      |    1 +
> include/linux/if.h                        |    1 +
> include/linux/if_team.h                   |  231 +++
> include/linux/rculist.h                   |   14 +
I think you're missing some CC's for the modifications to this file.
I've taken the liberty of adding Dipankar and Paul to the discussion.
> 13 files changed, 2136 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/networking/team.txt
> create mode 100644 drivers/net/team/Kconfig
> create mode 100644 drivers/net/team/Makefile
> create mode 100644 drivers/net/team/team.c
> create mode 100644 drivers/net/team/team_mode_activebackup.c
> create mode 100644 drivers/net/team/team_mode_roundrobin.c
> create mode 100644 include/linux/if_team.h
>
[...]
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> new file mode 100644
> index 0000000..acfef4c
> --- /dev/null
> +++ b/drivers/net/team/team.c
> +
[...]
> +static int team_change_mtu(struct net_device *dev, int new_mtu)
> +{
> + struct team *team = netdev_priv(dev);
> + struct team_port *port;
> + int err;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(port, &team->port_list, list) {
> + err = dev_set_mtu(port->dev, new_mtu);
> + if (err) {
> + netdev_err(dev, "Device %s failed to change mtu",
> + port->dev->name);
> + goto unwind;
> + }
> + }
> + rcu_read_unlock();
> +
> + dev->mtu = new_mtu;
> +
> + return 0;
> +
> +unwind:
> + list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
> + dev_set_mtu(port->dev, dev->mtu);
> +
> + rcu_read_unlock();
> + return err;
> +}
> +
> +
[...]
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index d079290..7586b2c 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
> /**
> + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
> + * @pos: the type * to use as a loop cursor.
> + * @head: the head for your list.
> + * @member: the name of the list_struct within the struct.
> + *
> + * Start to iterate over list of given type backwards, continuing after
> + * the current position.
> + */
> +#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \
> + for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
> + &pos->member != (head); \
> + pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
> +
rcu lists can be modified while they are traversed with *_rcu()
primitives. This benefit comes with the constraint that they may only be
traversed forwards. This is implicit in the choice of *_rcu()
list-traversal primitives: they only go forwards.
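For reference, list_del_rcu() looks roughly like this (paraphrasing from
include/linux/rculist.h, so modulo exact details): it poisons ->prev but
deliberately leaves ->next usable, which is exactly why forward
traversal under rcu_read_lock() remains safe while backward traversal
does not.

static inline void list_del_rcu(struct list_head *entry)
{
	__list_del(entry->prev, entry->next);	/* unlink; ->next stays valid */
	entry->prev = LIST_POISON2;		/* ->prev is poisoned */
}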
You suggest adding a backwards rcu list-traversal primitive. But
consider what happens in this sequence:
CPU0                                    CPU1
list_for_each_entry_continue_reverse_rcu(...)
  pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
                                        list_del_rcu(&pos->member)
                                          { (&pos->member)->prev = LIST_POISON2 }
  pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
      = container_of(LIST_POISON2, typeof(*pos), member)
  do_something(*pos)
  BAM!
Going back to the problem you're trying to solve in team_change_mtu(),
I think you could either:
1) take team->lock instead of rcu_read_lock() throughout this particular
   function (see the sketch after this list)
2) save each deleted element in a separate list on the side, in case it
   becomes necessary to roll back
3) remove the rcu double locking, rely on rtnl and add some
   ASSERT_RTNL() if desired. You've said that you don't want to rely on
   rtnl and that you want to use separate locking, but I fail to see
   what advantage that brings to balance out the extra complexity in
   code and execution. Please clarify this.
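To illustrate option 1, here is a rough, untested sketch, assuming
team->lock is the mutex that already protects the port list in your
patch. With the mutex held, port_list cannot change under us, so the
plain (non-rcu) reverse iterator from list.h is safe and no new rculist
primitive is needed:

static int team_change_mtu(struct net_device *dev, int new_mtu)
{
	struct team *team = netdev_priv(dev);
	struct team_port *port;
	int err;

	/* Serialize against port add/remove instead of using RCU. */
	mutex_lock(&team->lock);
	list_for_each_entry(port, &team->port_list, list) {
		err = dev_set_mtu(port->dev, new_mtu);
		if (err) {
			netdev_err(dev, "Device %s failed to change mtu\n",
				   port->dev->name);
			goto unwind;
		}
	}
	dev->mtu = new_mtu;
	mutex_unlock(&team->lock);

	return 0;

unwind:
	/* The list is stable here, so plain reverse traversal is fine. */
	list_for_each_entry_continue_reverse(port, &team->port_list, list)
		dev_set_mtu(port->dev, dev->mtu);
	mutex_unlock(&team->lock);

	return err;
}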
Thanks,
-Ben
> +/**
> * hlist_del_rcu - deletes entry from hash list without re-initialization
> * @n: the element to delete from the hash list.
> *
> --
> 1.7.6
>