Message-Id: <1292540003-9465-1-git-send-email-fubar@us.ibm.com>
Date: Thu, 16 Dec 2010 14:53:21 -0800
From: Jay Vosburgh <fubar@...ibm.com>
To: netdev@...r.kernel.org
Cc: Andy Gospodarek <andy@...yhouse.net>
Subject: [RFC v2 PATCH 0/2] bonding: generic netlink, multi-link mode
[ New and Improved, includes all the files this time... -J ]
These patches add support to bonding for generic netlink and a new
multi-link mode. At the moment, I'm looking primarily for discussion
about the generic netlink interface and the implementation of multi-link.
First, in patch 1, is a generic netlink infrastructure for
bonding. This patch provides a "get mode" command and a "slave link state
change" asynchronous notification via a netlink multicast group. One
long-term goal is to have bonding be controlled via netlink, both for
administrative purposes (add / remove slaves, etc.) and policy (slave A is
better than slave B). I'd appreciate feedback from netlink-savvy folks as
to whether this is the appropriate starting point.
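
For readers less familiar with generic netlink, here is a rough user space
sketch of what a "get mode" request looks like on the wire. The nlmsghdr and
genlmsghdr layouts and NLM_F_REQUEST are the standard kernel definitions; the
family id, command number and version are made-up placeholders (a real family
id is assigned at runtime and resolved by name), and any attributes the
command might carry, such as which bond to query, are left out.

```python
import struct

NLM_F_REQUEST = 0x01  # standard flag from <linux/netlink.h>

# Placeholders for illustration only, not values taken from the patch.
BOND_GENL_FAMILY_ID = 0x20  # assumption: real ids are assigned at runtime
BOND_CMD_GET_MODE = 1       # assumption
BOND_GENL_VERSION = 1       # assumption

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid
GENLMSGHDR = struct.Struct("=BBH")  # cmd, version, reserved


def build_get_mode_request(seq, pid=0):
    """Return the bytes of a bare request: nlmsghdr followed by genlmsghdr."""
    genl = GENLMSGHDR.pack(BOND_CMD_GET_MODE, BOND_GENL_VERSION, 0)
    length = NLMSGHDR.size + len(genl)
    nl = NLMSGHDR.pack(length, BOND_GENL_FAMILY_ID, NLM_F_REQUEST, seq, pid)
    return nl + genl


def parse_headers(msg):
    """Decode the two fixed headers at the front of a message."""
    length, family, flags, seq, pid = NLMSGHDR.unpack_from(msg, 0)
    cmd, version, _reserved = GENLMSGHDR.unpack_from(msg, NLMSGHDR.size)
    return {"len": length, "family": family, "flags": flags,
            "seq": seq, "pid": pid, "cmd": cmd, "version": version}
```

A "slave link state change" notification would use the same two headers,
delivered unsolicited to members of the multicast group rather than in reply
to a request.
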
Second, in patch 2, is the multi-link kernel code itself, which is
at present a work in progress. Here, I'm primarily looking for comments
regarding the control interface for this mode.
As implemented, this is a new mode to bonding, controlled via
generic netlink commands from a user space daemon. Slave assignment for
outgoing traffic is handled directly by bonding (the mapping table used by
multi-link is within bonding itself, and the usual transmit hash policy is
applied to the set of slaves allowable for a given destination).
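
To make the transmit path concrete, here is a minimal user space model of
the idea: a table maps each destination to the slaves allowed for it, and a
hash over the flow picks one of them. The class and function names are
hypothetical, and the XOR hash is only a stand-in for whichever transmit hash
policy is configured; it is not the code from patch 2.

```python
import ipaddress


def xmit_hash(src, dst, sport, dport):
    """Simple XOR hash over addresses and ports (illustrative stand-in)."""
    return (sport ^ dport) ^ ((int(src) ^ int(dst)) & 0xFFFF)


class MultiLinkTable:
    """Hypothetical model of the per-destination slave mapping table."""

    def __init__(self):
        self._entries = {}  # destination address -> list of slave names

    def set_slaves(self, dest, slaves):
        """Install or replace the allowable slave set for a destination."""
        self._entries[ipaddress.ip_address(dest)] = list(slaves)

    def select_slave(self, src, dest, sport=0, dport=0):
        """Hash the flow onto the slaves allowed for dest, or return None."""
        dst = ipaddress.ip_address(dest)
        slaves = self._entries.get(dst)
        if not slaves:
            return None  # no multi-link mapping for this destination
        h = xmit_hash(ipaddress.ip_address(src), dst, sport, dport)
        return slaves[h % len(slaves)]
```

In this model the user space daemon would call set_slaves() whenever the
negotiated link set changes, and the transmit path only ever reads the table.
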
In some private discussion with Andy, he suggested that this would
be better if it utilized the recently added queue mapping facility within
bonding, with the queue (and thus slave) assignments performed
at the qdisc level (via a tc filter) instead of within bonding itself.
This, I believe, would require a new tc filter that implements the ability
to set an skb's queue_mapping in a hash (of protocol data in the packet) or
round robin fashion. In this case, the tc filter would also incorporate
all of the netlink functionality for communicating with the user space
daemon (to permit the mappings to be updated).
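
As a rough illustration of what such a filter would have to decide per
packet, the sketch below picks a queue id either by hashing a flow key or in
round robin order. Everything here is hypothetical: QueueMapper, the flow key
format, the use of CRC32 as the hash, and the queue-to-slave table are
assumptions for illustration, not an existing tc filter.

```python
import itertools
import zlib


class QueueMapper:
    """Picks a queue id per packet, by flow hash or round robin."""

    def __init__(self, queue_ids, mode="hash"):
        if not queue_ids:
            raise ValueError("need at least one queue id")
        if mode not in ("hash", "rr"):
            raise ValueError("mode must be 'hash' or 'rr'")
        self.queue_ids = list(queue_ids)
        self.mode = mode
        self._rr = itertools.cycle(self.queue_ids)

    def classify(self, flow_key):
        """Return the queue id for a packet identified by flow_key (bytes)."""
        if self.mode == "rr":
            return next(self._rr)  # ignores the flow, may reorder packets
        return self.queue_ids[zlib.crc32(flow_key) % len(self.queue_ids)]


# Assumed mapping from queue id to slave, as bonding would be configured.
QUEUE_TO_SLAVE = {1: "eth0", 2: "eth1", 3: "eth2"}
```

Hash mode keeps a given flow on one queue (and thus one slave), while round
robin spreads packets evenly at the cost of possible reordering within a
flow.
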
Thoughts?
Lastly, a description of the multi-link system itself. This is a
reimplementation of a load balancing scheme that has been available on AIX
for some time. It operates essentially as a load balancer by subnet, with
a UDP-based protocol to exchange multi-link topology information between
participating systems. Hosts participating in multi-link have IP
addresses in a separate subnet. Interfaces enslaved to multi-link do not
lose their assigned IP address information, and may also operate
separately from multi-link.
One notable feature is that multi-link provides load balancing
facilities for network devices that cannot change their MAC address, such
as InfiniBand.
For example, given two systems as follows:

host A:
	bond0		10.88.0.1/16
	slave eth0	10.0.0.1/16
	slave eth1	10.1.0.1/16
	slave eth2	10.2.0.1/16

host B:
	bond0		10.88.0.2/16
	slave eth0	10.0.0.2/16
	slave eth1	10.1.0.2/16
	slave eth2	10.2.0.2/16

In this case, host A's bond0 running multi-link would load balance
traffic from 10.88.0.1 to 10.88.0.2 across eth0, eth1 and eth2. The user
space daemon negotiates the link set to use with other participating
hosts, and communicates that to the multi-link implementation.
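
One plausible way a daemon could arrive at the link set for the example
above is to pair up local and remote slaves that share a subnet. This is
only a sketch under that assumption; the actual negotiation happens over the
UDP-based protocol and may apply other rules. The function and table names
are hypothetical.

```python
import ipaddress

# Slave addresses from the example above.
HOST_A = {"eth0": "10.0.0.1/16", "eth1": "10.1.0.1/16", "eth2": "10.2.0.1/16"}
HOST_B = {"eth0": "10.0.0.2/16", "eth1": "10.1.0.2/16", "eth2": "10.2.0.2/16"}


def common_links(local, remote):
    """Return (local slave, remote address) pairs that share a subnet."""
    remote_by_net = {}
    for cidr in remote.values():
        iface = ipaddress.ip_interface(cidr)
        remote_by_net[iface.network] = str(iface.ip)
    links = []
    for name, cidr in local.items():
        peer = remote_by_net.get(ipaddress.ip_interface(cidr).network)
        if peer is not None:
            links.append((name, peer))
    return links
```

For host A talking to host B, common_links(HOST_A, HOST_B) pairs eth0, eth1
and eth2 with 10.0.0.2, 10.1.0.2 and 10.2.0.2 respectively, which is the set
of slaves bond0 would balance across for traffic to 10.88.0.2.
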
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com