[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5303377A.4020405@alphalink.fr>
Date: Tue, 18 Feb 2014 11:35:38 +0100
From: François Cachereul <f.cachereul@...halink.fr>
To: Jay Vosburgh <fubar@...ibm.com>
CC: David Miller <davem@...emloft.net>, vfalico@...hat.com,
andy@...yhouse.net, netdev@...r.kernel.org
Subject: Re: [PATCH net] bonding: fix arp requests sends with isolated routes
Le 18/02/2014 02:07, Jay Vosburgh a écrit :
> David Miller <davem@...emloft.net> wrote:
>
>> From: François Cachereul <f.cachereul@...halink.fr>
>> Date: Fri, 14 Feb 2014 16:59:23 +0100
>>
>>> Make arp_send_all() try to send arp packets through slave devices event
>>> if no route to arp_ip_target is found. This is useful when the route
>>> is in an isolated routing table with routing rule parameters like oif or
>>> iif in which case ip_route_output() return an error.
>>> Thus, the arp packet is send without vlan and with the bond ip address
>>> as sender.
>>>
>>> Signed-off-by: François CACHEREUL <f.cachereul@...halink.fr>
>>> ---
>>> This previously worked, the problem was added in 2.6.35 with vlan 0
>>> added by default when the module 8021q is loaded. Before that no route
>>> lookup was done if the bond device did not have any vlan. The problem
>>> now exists event if the module 8021q is not loaded.
>>
>> I don't like this at all, you're trying to paper over the fact that we
>> can't set the flow key correctly at this point.
>>
>> Just assuming the route might be there and trying anyways is not really
>> acceptable in my opinion. There's a reason we do a route lookup at all.
>
> The reason for the route lookup is to get a VLAN ID for the
> outgoing ARP (if VLANs are configured above the bond), so it can be
> correctly tagged.
>
> As Francois says, older versions of the bond_arp_send_all
> function would skip the route lookup entirely if there were no VLANs
> configured above the bond. E.g., the original logic from a 2.6.32-era
> kernel looks like:
>
> for (i = 0; (i < BOND_MAX_ARP_TARGETS); i++) {
> [...]
> if (!bond->vlgrp) {
> pr_debug("basa: empty vlan: arp_send\n");
> bond_arp_send(slave->dev, ARPOP_REQUEST, targets[i],
> bond->master_ip, 0);
> continue;
> }
>
> /*
> * If VLANs are configured, we do a route lookup to
> * determine which VLAN interface would be used, so we
> * can tag the ARP with the proper VLAN tag.
> */
> memset(&fl, 0, sizeof(fl));
> fl.fl4_dst = targets[i];
> fl.fl4_tos = RTO_ONLINK;
>
> rv = ip_route_output_key(&init_net, &rt, &fl);
> [...]
>
> So, in the past, this particular case (oif / iif in route
> selection) would "work," in the sense that an ARP would go out with no
> VLAN ID, but only when there were known to be no VLANs configured above
> the bond. If any VLANs were configured above the bond, this case would
> fail as we're seeing here.
>
> Nowadays, there is no easy way to tell if there are VLANs above
> the bond, and there's generally a VID 0 configured anyway, so the route
> lookup is unconditional. In the case at issue here (the route lookup
> for the arp_ip_target IP address fails), it's not possible for bonding
> to determine what interface would be used, and therefore what VLAN tag
> to use.
>
> Francois's patch would make bonding essentially take a best
> guess of "no VLAN" and send an untagged ARP for any destination not
> found in the regular (no iif, oif, etc, rule) routing table, which is
> what used to happen for the "known no VLAN" case.
>
> With the patch, these ARPs may have an all-zero source IP
> address (since the bond_confirm_addr call may not find a suitable source
> address for something it can't find a route to). That is a legal ARP
> (used for duplicate address detection according to RFC 2131), but when
> last I tried it a couple of years ago, the replies won't pass
> arp_validate (as the target IP of 0.0.0.0 in the reply doesn't match any
> of the bond's IP address), and I suspect that hasn't changed.
This problem exists already when a route is found. Currently,
bond_confirm_addr() call at the end of bond_arp_send_all() may already
not find a suitable source address, if for example a route and its
source address are not in the same network.
>
> In the days of yore code above, bonding kept track of what it
> thought the bond's IP address was (bond->master_ip), and used that as
> the source IP in the ARPs. That wasn't always correct if the bond had
> multiple IP addresses.
>
> So, ultimately, Francois is correct that this is a regression of
> a behavior that used to work. On the other hand, this patch isn't
> really a complete restoration of the prior behavior. It's no longer
> possible to know that there aren't any VLANs above the bond, and so the
> "no VLAN" guess is much less reliable than it used to be, plus the ARPs
> that will be generated probably won't work with arp_validate.
>
> As much as I loathe adding more options to bonding, a manually
> selected "force VLAN ID" for the arp_ip_target(s) would resolve this for
> the minority of cases where the automatic VLAN ID selection does not
> function.
I had thought about adding an option but nothing I came with seemed good
enough. That's why I proposed this solution.
Maybe something like "ip_target:forced_vlan_id" per ip target for the
arp_ip_target option would do the trick. ':forced_vlan_id' would have to
be omitted for automatic VLAN ID selection or set to -1 for no vlan id
(0 doesn't seem a good idea as it's used for priority tagged frames).
This way we keep current behavior.
If you're ok with this, I'll submit another patch with the modified
option.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists