Message-ID: <20111024141113.GM2273@linux.vnet.ibm.com>
Date: Mon, 24 Oct 2011 07:11:13 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Benjamin Poirier <benjamin.poirier@...il.com>
Cc: Jiri Pirko <jpirko@...hat.com>, netdev@...r.kernel.org,
davem@...emloft.net, eric.dumazet@...il.com,
bhutchings@...arflare.com, shemminger@...tta.com, fubar@...ibm.com,
andy@...yhouse.net, tgraf@...radead.org, ebiederm@...ssion.com,
mirqus@...il.com, kaber@...sh.net, greearb@...delatech.com,
jesse@...ira.com, fbl@...hat.com, jzupka@...hat.com,
Dipankar Sarma <dipankar@...ibm.com>
Subject: Re: [patch net-next V4] net: introduce ethernet teaming device
On Mon, Oct 24, 2011 at 09:09:19AM -0400, Benjamin Poirier wrote:
> On 11/10/24 10:13, Jiri Pirko wrote:
> > This patch introduces a new network device called team. It is supposed
> > to be a very fast, simple, userspace-driven alternative to the existing
> > bonding driver.
> >
> > A userspace library called libteam with a couple of demo apps is
> > available here:
> > https://github.com/jpirko/libteam
> > Note it's still in its diapers atm.
> >
> > team<->libteam use generic netlink for communication. That and rtnl
> > are supposed to be the only way to configure a team device, no sysfs etc.
> >
> > A Python binding basis for libteam was recently introduced (some work
> > still needs to be done on it though). A daemon providing arpmon/miimon
> > active-backup functionality will be introduced shortly.
> > Everything necessary is already implemented in the kernel team driver.
> >
> > Signed-off-by: Jiri Pirko <jpirko@...hat.com>
> >
> > v3->v4:
> > - remove redundant synchronize_rcu from __team_change_mode()
> > - revert "set and clear of mode_ops happens per pointer, not per
> > byte"
> > - extend comment of function __team_change_mode()
> >
> > v2->v3:
> > - team_change_mtu() uses the rcu version of list traversal to unwind
> > - set and clear of mode_ops happens per pointer, not per byte
> > - port hashlist changed to be embedded into team structure
> > - error branch in team_port_enter() does cleanup now
> > - fixed rtln->rtnl
> >
> > v1->v2:
> > - modes are made as modules. This makes team more modular and
> > extendable.
> > - several commenters' nitpicks found on v1 were fixed
> > - several other bugs were fixed.
> > - note I ignored Eric's comment about the roundrobin port selector,
> > as Eric's way may easily be implemented as another mode (mode
> > "random") in the future.
> > ---
> > Documentation/networking/team.txt | 2 +
> > MAINTAINERS | 7 +
> > drivers/net/Kconfig | 2 +
> > drivers/net/Makefile | 1 +
> > drivers/net/team/Kconfig | 38 +
> > drivers/net/team/Makefile | 7 +
> > drivers/net/team/team.c | 1573 +++++++++++++++++++++++++++++
> > drivers/net/team/team_mode_activebackup.c | 152 +++
> > drivers/net/team/team_mode_roundrobin.c | 107 ++
> > include/linux/Kbuild | 1 +
> > include/linux/if.h | 1 +
> > include/linux/if_team.h | 231 +++++
> > include/linux/rculist.h | 14 +
>
> I think you're missing some CC's for the modifications to this file.
> I've taken the liberty of adding Dipankar and Paul to the discussion.
Thank you, and please see below.
> > 13 files changed, 2136 insertions(+), 0 deletions(-)
> > create mode 100644 Documentation/networking/team.txt
> > create mode 100644 drivers/net/team/Kconfig
> > create mode 100644 drivers/net/team/Makefile
> > create mode 100644 drivers/net/team/team.c
> > create mode 100644 drivers/net/team/team_mode_activebackup.c
> > create mode 100644 drivers/net/team/team_mode_roundrobin.c
> > create mode 100644 include/linux/if_team.h
> >
>
> [...]
>
> > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> > new file mode 100644
> > index 0000000..acfef4c
> > --- /dev/null
> > +++ b/drivers/net/team/team.c
> > +
> [...]
> > +static int team_change_mtu(struct net_device *dev, int new_mtu)
> > +{
> > + struct team *team = netdev_priv(dev);
> > + struct team_port *port;
> > + int err;
> > +
> > + rcu_read_lock();
> > + list_for_each_entry_rcu(port, &team->port_list, list) {
> > + err = dev_set_mtu(port->dev, new_mtu);
> > + if (err) {
> > + netdev_err(dev, "Device %s failed to change mtu",
> > + port->dev->name);
> > + goto unwind;
> > + }
> > + }
> > + rcu_read_unlock();
> > +
> > + dev->mtu = new_mtu;
> > +
> > + return 0;
> > +
> > +unwind:
> > + list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
> > + dev_set_mtu(port->dev, dev->mtu);
> > +
> > + rcu_read_unlock();
> > + return err;
> > +}
> > +
> > +
>
> [...]
>
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index d079290..7586b2c 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
> > pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> >
> > /**
> > + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
> > + * @pos: the type * to use as a loop cursor.
> > + * @head: the head for your list.
> > + * @member: the name of the list_struct within the struct.
> > + *
> > + * Start to iterate over list of given type backwards, continuing after
> > + * the current position.
> > + */
> > +#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \
> > + for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
> > + &pos->member != (head); \
> > + pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
> > +
>
> rcu lists can be modified while they are traversed with *_rcu()
> primitives. This benefit comes with the constraint that they may only be
> traversed forwards. This is implicit in the choice of *_rcu()
> list-traversal primitives: they only go forwards.
>
> You suggest adding a backwards RCU list-traversal primitive. But
> consider what happens in this sequence:
>
> CPU0                                                    CPU1
> list_for_each_entry_continue_reverse_rcu(...)
>   pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
>                                                         list_del_rcu(&pos->member)
>                                                         { (&pos->member)->prev = LIST_POISON2 }
>   pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
>     = container_of(LIST_POISON2, typeof(*pos), member)
>   do_something(*pos)
>   BAM!
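> 
> For reference, list_del_rcu() (paraphrased below from
> include/linux/rculist.h; the exact body may differ slightly between
> kernel versions) deliberately leaves ->next intact for concurrent
> forward readers, and poisons only ->prev, which is exactly the pointer
> a reverse traversal follows:
> 
> static inline void list_del_rcu(struct list_head *entry)
> {
> 	__list_del_entry(entry);
> 	entry->prev = LIST_POISON2;
> }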
>
> Going back to the problem you're trying to solve in team_change_mtu(),
> I think you could do one of the following:
> 1) take team->lock instead of rcu_read_lock() throughout this particular
> function (a rough sketch follows this list)
> 2) save each deleted element in a separate list on the side in case it's
> necessary to roll back
> 3) remove the rcu double locking, rely on rtnl and add some
> ASSERT_RTNL() if desired. You've said that you don't want to rely on
> rtnl and you want to use separate locking, but I fail to see what
> advantage that brings to balance out the extra complexity in code and
> execution. Please clarify this.
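> 
> Something along the lines of this untested sketch of option 1 (it
> assumes team->lock is a mutex in struct team, and uses the plain
> non-RCU reverse iterator from include/linux/list.h, which is safe once
> the list is locked):
> 
> static int team_change_mtu(struct net_device *dev, int new_mtu)
> {
> 	struct team *team = netdev_priv(dev);
> 	struct team_port *port;
> 	int err;
> 
> 	/* With team->lock held the port list cannot change under us,
> 	 * so no RCU primitives are needed here at all.
> 	 */
> 	mutex_lock(&team->lock);
> 	list_for_each_entry(port, &team->port_list, list) {
> 		err = dev_set_mtu(port->dev, new_mtu);
> 		if (err) {
> 			netdev_err(dev, "Device %s failed to change mtu\n",
> 				   port->dev->name);
> 			goto unwind;
> 		}
> 	}
> 	mutex_unlock(&team->lock);
> 
> 	dev->mtu = new_mtu;
> 
> 	return 0;
> 
> unwind:
> 	/* Walk back over the ports whose MTU we already changed and
> 	 * restore the old value; the plain reverse iterator is fine
> 	 * under the mutex.
> 	 */
> 	list_for_each_entry_continue_reverse(port, &team->port_list, list)
> 		dev_set_mtu(port->dev, dev->mtu);
> 
> 	mutex_unlock(&team->lock);
> 	return err;
> }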
Indeed -- the list_for_each_entry_continue_reverse_rcu() implementation
above would only work if elements were never deleted from the list.
But people would miss that fact, resulting in list-poison oopses.
Furthermore, even if you avoid the poisoning, there is no guarantee
that you will see the same objects in reverse that you saw going
forward because some might have been added or deleted in the meantime.
So please take some other approach. For example, if the list has a
fixed upper bound, perhaps just keeping track of what elements you
visited would be workable.
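
Something like the following untested sketch, for example, assuming a
hypothetical fixed upper bound TEAM_PORT_MAX on the number of ports
(the name and the bound are made up for illustration, not taken from
the patch):

static int team_change_mtu(struct net_device *dev, int new_mtu)
{
	struct team *team = netdev_priv(dev);
	struct team_port *visited[TEAM_PORT_MAX];
	struct team_port *port;
	int n = 0;
	int err;

	rcu_read_lock();
	list_for_each_entry_rcu(port, &team->port_list, list) {
		err = dev_set_mtu(port->dev, new_mtu);
		if (err)
			goto unwind;
		/* Remember which ports we changed; the pointers stay
		 * valid for the rest of this RCU read-side critical
		 * section even if a port is removed concurrently.
		 */
		if (n < TEAM_PORT_MAX)
			visited[n++] = port;
	}
	rcu_read_unlock();

	dev->mtu = new_mtu;
	return 0;

unwind:
	/* Roll back only the ports we actually touched, walking the
	 * saved array instead of traversing the list backwards.
	 */
	while (n--)
		dev_set_mtu(visited[n]->dev, dev->mtu);
	rcu_read_unlock();
	return err;
}
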
Thanx, Paul
> Thanks,
> -Ben
>
> > +/**
> > * hlist_del_rcu - deletes entry from hash list without re-initialization
> > * @n: the element to delete from the hash list.
> > *
> > --
> > 1.7.6
> >
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html