[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20111024135042.GA8113@minipsycho.brq.redhat.com>
Date: Mon, 24 Oct 2011 15:50:43 +0200
From: Jiri Pirko <jpirko@...hat.com>
To: Benjamin Poirier <benjamin.poirier@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net,
eric.dumazet@...il.com, bhutchings@...arflare.com,
shemminger@...tta.com, fubar@...ibm.com, andy@...yhouse.net,
tgraf@...radead.org, ebiederm@...ssion.com, mirqus@...il.com,
kaber@...sh.net, greearb@...delatech.com, jesse@...ira.com,
fbl@...hat.com, jzupka@...hat.com,
Dipankar Sarma <dipankar@...ibm.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [patch net-next V4] net: introduce ethernet teaming device
Mon, Oct 24, 2011 at 03:09:19PM CEST, benjamin.poirier@...il.com wrote:
>On 11/10/24 10:13, Jiri Pirko wrote:
>> This patch introduces new network device called team. It supposes to be
>> very fast, simple, userspace-driven alternative to existing bonding
>> driver.
>>
>> Userspace library called libteam with couple of demo apps is available
>> here:
>> https://github.com/jpirko/libteam
>> Note it's still in its dipers atm.
>>
>> team<->libteam use generic netlink for communication. That and rtnl
>> suppose to be the only way to configure team device, no sysfs etc.
>>
>> Python binding basis for libteam was recently introduced (some need
>> still need to be done on it though). Daemon providing arpmon/miimon
>> active-backup functionality will be introduced shortly.
>> All what's necessary is already implemented in kernel team driver.
>>
>> Signed-off-by: Jiri Pirko <jpirko@...hat.com>
>>
>> v3->v4:
>> - remove redundant synchronize_rcu from __team_change_mode()
>> - revert "set and clear of mode_ops happens per pointer, not per
>> byte"
>> - extend comment of function __team_change_mode()
>>
>> v2->v3:
>> - team_change_mtu() user rcu version of list traversal to unwind
>> - set and clear of mode_ops happens per pointer, not per byte
>> - port hashlist changed to be embedded into team structure
>> - error branch in team_port_enter() does cleanup now
>> - fixed rtln->rtnl
>>
>> v1->v2:
>> - modes are made as modules. Makes team more modular and
>> extendable.
>> - several commenters' nitpicks found on v1 were fixed
>> - several other bugs were fixed.
>> - note I ignored Eric's comment about roundrobin port selector
>> as Eric's way may be easily implemented as another mode (mode
>> "random") in future.
>> ---
>> Documentation/networking/team.txt | 2 +
>> MAINTAINERS | 7 +
>> drivers/net/Kconfig | 2 +
>> drivers/net/Makefile | 1 +
>> drivers/net/team/Kconfig | 38 +
>> drivers/net/team/Makefile | 7 +
>> drivers/net/team/team.c | 1573 +++++++++++++++++++++++++++++
>> drivers/net/team/team_mode_activebackup.c | 152 +++
>> drivers/net/team/team_mode_roundrobin.c | 107 ++
>> include/linux/Kbuild | 1 +
>> include/linux/if.h | 1 +
>> include/linux/if_team.h | 231 +++++
>> include/linux/rculist.h | 14 +
>
>I think you're missing some CC's for the modifications to this file.
>I've taken the liberty of adding Dipankar and Paul to the discussion.
>
>> 13 files changed, 2136 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/networking/team.txt
>> create mode 100644 drivers/net/team/Kconfig
>> create mode 100644 drivers/net/team/Makefile
>> create mode 100644 drivers/net/team/team.c
>> create mode 100644 drivers/net/team/team_mode_activebackup.c
>> create mode 100644 drivers/net/team/team_mode_roundrobin.c
>> create mode 100644 include/linux/if_team.h
>>
>
>[...]
>
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> new file mode 100644
>> index 0000000..acfef4c
>> --- /dev/null
>> +++ b/drivers/net/team/team.c
>> +
>[...]
>> +static int team_change_mtu(struct net_device *dev, int new_mtu)
>> +{
>> + struct team *team = netdev_priv(dev);
>> + struct team_port *port;
>> + int err;
>> +
>> + rcu_read_lock();
>> + list_for_each_entry_rcu(port, &team->port_list, list) {
>> + err = dev_set_mtu(port->dev, new_mtu);
>> + if (err) {
>> + netdev_err(dev, "Device %s failed to change mtu",
>> + port->dev->name);
>> + goto unwind;
>> + }
>> + }
>> + rcu_read_unlock();
>> +
>> + dev->mtu = new_mtu;
>> +
>> + return 0;
>> +
>> +unwind:
>> + list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
>> + dev_set_mtu(port->dev, dev->mtu);
>> +
>> + rcu_read_unlock();
>> + return err;
>> +}
>> +
>> +
>
>[...]
>
>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>> index d079290..7586b2c 100644
>> --- a/include/linux/rculist.h
>> +++ b/include/linux/rculist.h
>> @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
>> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>
>> /**
>> + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
>> + * @pos: the type * to use as a loop cursor.
>> + * @head: the head for your list.
>> + * @member: the name of the list_struct within the struct.
>> + *
>> + * Start to iterate over list of given type backwards, continuing after
>> + * the current position.
>> + */
>> +#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \
>> + for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
>> + &pos->member != (head); \
>> + pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
>> +
>
>rcu lists can be modified while they are traversed with *_rcu()
>primitives. This benefit comes with the constraint that they may only be
>traversed forwards. This is implicit in the choice of *_rcu()
>list-traversal primitives: they only go forwards.
>
>You suggest to add a backwards rcu list-traversal primitive. But
>consider what happens in this sequence:
>
>CPU0 CPU1
>list_for_each_entry_continue_reverse_rcu(...)
>pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
> list_del_rcu(&pos->member)
> { (&pos->member)->prev = LIST_POISON2 }
>pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
> = container_of(LIST_POISON2, typeof(*pos), member)
>do_something(*pos)
> BAM!
Doh, you are right. :(
>
>Going back to the problem you're trying to solve in team_change_mtu(),
>I think you could either:
>1) take team->lock instead of rcu_read_lock() throughout this particular
>function
>2) save each deleted element in a separate list on the side in case it's
>necessary to roll back
I would go with this one.
>3) remove the rcu double locking, rely on rtnl and add some
>ASSERT_RTNL() if desired. You've said that you don't want to rely on
>rtnl and you want to use separate locking but I fail to see what
>advantage that brings to balance out the extra complexity in code and
>execution? Please clarify this.
I do not want to use rtnl. For example in gennetlink code rtnl is not
held so I need to depend on own lock.
>
>Thanks,
>-Ben
>
>> +/**
>> * hlist_del_rcu - deletes entry from hash list without re-initialization
>> * @n: the element to delete from the hash list.
>> *
>> --
>> 1.7.6
>>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists