[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 04 May 2010 22:35:58 -0700
From: John Fastabend <john.r.fastabend@...el.com>
To: Jay Vosburgh <fubar@...ibm.com>
CC: "Leech, Christopher" <christopher.leech@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Andy Gospodarek <andy@...yhouse.net>,
Patrick McHardy <kaber@...sh.net>,
"bonding-devel@...ts.sourceforge.net"
<bonding-devel@...ts.sourceforge.net>
Subject: Re: Receive issues with bonding and vlans
Jay Vosburgh wrote:
> John Fastabend <john.r.fastabend@...el.com> wrote:
>
>> Jay Vosburgh wrote:
>>> John Fastabend <john.r.fastabend@...el.com> wrote:
>>>
> [...]
>>>> It should be OK to allow packets to be received on the VLAN if it is not
>>>> explicitly in the bond?
>>> Lemme see if I have this straight, because all of these special
>>> cases are making my brain hurt. This one is for a configuration like this:
>>>
>>> bond0 ----- eth0
>>> /
>>> vlan.xxx -/
>>>
>>> I.e., a VLAN configured directly atop an ethernet device, said
>>> ethernet also being a slave to bonding. Is that correct?
>>>
>> Yes, this is the correct scenario that we are considering.
>>
>>> Extrapolating from the ASCII art in a prior message in this
>>> discussion, would this configuration really be something like this:
>>>
>>> vlan.xxx -\
>>> \
>>> bond0 ----- eth1
>>> bond0 ----- eth0
>>> /
>>> vlan.xxx -/
>>>
>>> I.e., two slaves to bonding, with VLAN xxx configured atop both
>>> of the slaves? Or would the eth0 and eth1 use discrete VLAN ids? The
>>> reason I ask is in regards to duplicate suppression. The whole reason
>>> the "inactive" slave drops (most) incoming packets is to eliminate
>>> duplicates when the switch floods traffic to both slave ports.
>>>
>> These vlan ids could be the same or discrete I think both configurations
>> should be valid.
>>
>>> This is a bit tricky, because it's not really about broadcasts /
>>> multicasts so much, but about traffic that the switch sends to all ports
>>> because the switch's MAC address table isn't up to date with the
>>> destination MAC of the traffic (which is a transient condition, so this
>>> would only happen for perhaps one second or so). That would result in
>>> duplicate unicast packets being received by the bond (back in the day
>>> before bonding had the "drop inactive traffic" logic).
>>>
>>> So if the same VLAN is configured atop the two slaves, I wonder
>>> if that will open a window for the duplicate unicast packet problem.
>> OK, this does appear to open a window for duplicated unicast packets. By
>> only allowing handlers with exact matches at least this issue is less
>> obvious and we are assuming the packet handler can deal with this
>> duplication. This seems to be the current assumption made. The same issue
>> exists today for real device in the following setup,
>>
>> vlan --> bond0 --> eth
>
> I just tested this, and I'm not seeing duplicate packets using
> the test that used to show the problem before the "drop dups" logic went
> in (clear the switch's mac address-table, ping -c 25 -f [peer on VLAN],
> compare "packets transmitted" to "packets received").
>
> That doesn't mean there isn't a gap in the logic somewhere, just
> that the original problem hasn't resurfaced (as far as I can tell).
>
>> Specifically for FCoE we use the san mac address so it wouldn't be an
>> issue here. The expectation being that the switch will only ever use the
>> correct san mac on the port.
>
> The issue arises when the switch does not have the destination
> MAC in its address table, and as such is transitory, and only occurs
> after sufficiently long periods of no traffic (or a manual flush of the
> table). The packets are sent to all ports until the MAC table updates
> (which seems to take place asynchronously), which is usually about 1
> second or so (on the midrange Cisco gear I have).
>
> For example, with the switch's mac address table cleared, when
> starting a "ping -f" I can watch as first every port's light blinks,
> then all but two stop blinking. During the time that every port is
> blinking, the switch is sending all the packets to every port because
> the mac address table hasn't updated the switching logic (however that
> works under the covers).
>
>
>
>>> If the VLAN ids are different, then I'll assume this is some
>>> kind of device mapper magic, doing the load balancing elsewhere.
>> Correct device mapper handles load balancing and failover for both cases,
>> when the vlan ids are different and when they are the same.
>>
>>>> Or if we want to be more paranoid deliver packets only to handlers with
>>>> exact matches for the device. For non vlan devices we deliver skb's to
>>>> packet handlers that match exactly even on inactive slaves so doing this
>>>> on vlan devices as well makes sense and shouldn't cause any unexpected
>>>> problems.
>>> Yah, the whole concept of "inactive" slave is pretty mutated
>>> now; it's kind of become the "active-backup with semi-manual load
>>> balance for clever protocols, oh, and duplicate suppression" mode.
>>>
>>>> Also on a somewhat unrelated note I suspect null_or_orig and null_or_bond
>>>> are not working as expected in __netif_receive_skb(). At least the
>>>> comment 'deliver only exact match' could be inaccurate.
>>> I don't think this is unrelated at all. This code (the packet
>>> to device lookup stuff in __netif_receive_skb) has been modified pretty
>>> extensively lately for various bonding-related special cases, and I
>>> think it is getting hard to follow. Whatever comments are there need to
>>> be accurate, and, honestly, I think this code needs comments to explain
>>> what exactly is supposed to happen for these special cases.
>>>
>> Agreed. This should be cleaned up and some explanations added. The
>> current behavior in active-backup mode is receiving packets on the bonded
>> real device in active mode fails but putting that same real device in an
>> inactive state will cause it to receive packets. This is an
>> inconsistency, which should probably be fixed by initializing null_or_bond
>> to orig_dev. And also renaming it orig_or_bond at that point.
>>
>>>> Here's a patch to illustrate what I'm thinking compile tested only. If
>>>> this sounds reasonable I'll work up an official patch.
>>>>
>>>>
>>>> [PATCH] net: allow vlans on bonded real net_devices
>>>>
>>>> For converged I/O it is reasonable to use dm_multipathing to provice
>>>> failover and load balancing for storage traffic and then use bonding
>>>> for the LAN failover and load balancing.
>>>>
>>>> Currently this works if the multipathed devices are using the real
>>>> net_device and those devices are enslaved to a bonded device what
>>>> does not work is creating a vlan on the real device and trying to
>>>> use it outside the bond for multipathing.
>>>>
>>>> This patch adds logic so that if the skb is destined for a vlan
>>>> that is not in the bond the skb will not be dropped.
>>>>
>>>> Signed-off-by: John Fastabend <john.r.fastabend@...el.com>
>>>> ---
>>>>
>>>> net/8021q/vlan_core.c | 31 +++++++++++++++++++++----------
>>>> net/core/dev.c | 11 ++++++++---
>>>> 2 files changed, 29 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>>>> index c584a0a..3bce0c3 100644
>>>> --- a/net/8021q/vlan_core.c
>>>> +++ b/net/8021q/vlan_core.c
>>>> @@ -8,18 +8,24 @@
>>>> int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
>>>> u16 vlan_tci, int polling)
>>>> {
>>>> + struct net_device *vlan_dev;
>>>> +
>>>> if (netpoll_rx(skb))
>>>> return NET_RX_DROP;
>>>>
>>>> - if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
>>>> + vlan_dev = vlan_group_get_device(grp, vlan_tci & VLAN_VID_MASK);
>>>> +
>>>> + if (!vlan_dev)
>>>> + goto drop;
>>>> +
>>>> + if ((vlan_dev->priv_flags & IFF_BONDING ||
>>>> + vlan_dev_real_dev(vlan_dev)->flags & IFF_MASTER) &&
>>>> + skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
>>> I'm not sure this will do the right thing if the VLAN device
>>> itself is a slave to bonding, e.g., bond0 ---> vlan.xxx ---> eth0. In
>>> that case, eth0's dev->master is NULL, and the vlan_dev (vlan.xxx's dev)
>>> doesn't have IFF_MASTER (but does have IFF_SLAVE and IFF_BONDING, I
>>> believe).
>>>
>> correct, vlan_dev does have IFF_BONDING and IFF_SLAVE here and doesn't
>> have IFF_MASTER.
>>
>>
>>> I think this will result in all incoming traffic being accepted
>>> on such a configuration (leading to duplicates, as described above).
>>>
>>> I suspect, but have not tested, that something like this might
>>> do what you're looking for:
>>>
>>> if ((vlan_dev->priv_flags & IFF_BONDING ||
>>> vlan_dev_real_dev(vlan_dev)->flags & (IFF_MASTER | IFF_SLAVE)) &&
>>> skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
>>>
>>> I.e., if the VLAN device is either a MASTER (configured above
>>> the bond) or a slave (configured below the bond) do the duplicate
>>> suppresion.
>> Here are the three basic cases I see,
>>
>> #1. vlanx --> bond0 --> ethx
>>
>> In this case vlanx does not have IFF_BONDING set and real_dev is ethx with
>> IFF_SLAVE set. ethx has master dev->bond0 so this should work. And shows
>> why we need the IFF_SLAVE bit as you pointed out and I dropped.
>>
>> #2. bond --> vlanx --> ethx
>>
>> This case is broke, skb->dev->master is NULL so we would never drop this
>> pkt. As it exists today I suspect this is broken as well.
>
> In the VLAN pass, yes, but the VLAN input path will call into
> netif_receive_skb, and at that point the skb->dev is the vlan device,
> and it has a dev->master. I haven't tested this lately, but I'm fairly
> sure this works.
>
OK, these both seem to work as expected my test was invalid.
>> #3 bond0 --> ethx
>> vlanx --> -|
>>
>> Here is the case where adding the IFF_SLAVE bit doesn't work as I
>> hoped. We don't want to run skb_bond_should_drop here.
>
> Yes, this is tricky because the VLAN device will copy the
> dev->flags from the device it's placed atop, so the VLAN will inherit
> the ethx's IFF_SLAVE flag. This happens regardless of the setup order
> (enslave ethX, then add VLAN, or vice versa).
>
This doesn't appear to be true, adding a VLAN on ethx then enslave ethx
doesn't set the IFF_SLAVE flag on the VLAN. Unless I am missing something.
> I suspect this case may be testable because the VLAN device has
> IFF_SLAVE, but has no dev->master.
>
>> So I think there needs to be a bit of logic here to determine if we need
>> to check skb_bond_should_drop with the vlan device or with the
>> skb->dev->master. Something like might do:
>>
>> should_drop_dev = vlan_dev->master ? vlan_dev->master : skb->dev->master
>>
>> This should fix case #2 without breaking case #1. And the case I want to
>> allow is still not resolved. I'll think about this some more maybe this
>> logic can be fixed for all cases.
>
> As I said above, I don't think case #2 is really broken now.
>
Seems to be working sorry for the noise.
<<snip>>
>
> Hopefully this will be the last futzing around with this, and
> won't make it too complicated.
>
I currently believe the cleanest way to implement this is to add a
pkt_type flag PACKET_DROP to mark skbs that have been received on the
inactive slave. I sent out a functional RFC I would like to run a few
more tests on it, but otherwise I think it ok.
Thanks,
John.
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists