netdev - Re: [PATCH net] bonding: fix arp requests sends with isolated routes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4562.1392685671@death.nxdomain>
Date:	Mon, 17 Feb 2014 17:07:51 -0800
From:	Jay Vosburgh <fubar@...ibm.com>
To:	David Miller <davem@...emloft.net>
cc:	f.cachereul@...halink.fr, vfalico@...hat.com, andy@...yhouse.net,
	netdev@...r.kernel.org
Subject: Re: [PATCH net] bonding: fix arp requests sends with isolated routes

David Miller <davem@...emloft.net> wrote:

>From: François Cachereul <f.cachereul@...halink.fr>
>Date: Fri, 14 Feb 2014 16:59:23 +0100
>
>> Make arp_send_all() try to send arp packets through slave devices event
>> if no route to arp_ip_target is found. This is useful when the route
>> is in an isolated routing table with routing rule parameters like oif or
>> iif in which case ip_route_output() return an error.
>> Thus, the arp packet is send without vlan and with the bond ip address
>> as sender.
>> 
>> Signed-off-by: François CACHEREUL <f.cachereul@...halink.fr>
>> ---
>> This previously worked, the problem was added in 2.6.35 with vlan 0
>> added by default when the module 8021q is loaded. Before that no route
>> lookup was done if the bond device did not have any vlan. The problem
>> now exists event if the module 8021q is not loaded.
>
>I don't like this at all, you're trying to paper over the fact that we
>can't set the flow key correctly at this point.
>
>Just assuming the route might be there and trying anyways is not really
>acceptable in my opinion.  There's a reason we do a route lookup at all.

	The reason for the route lookup is to get a VLAN ID for the
outgoing ARP (if VLANs are configured above the bond), so it can be
correctly tagged.

	As Francois says, older versions of the bond_arp_send_all
function would skip the route lookup entirely if there were no VLANs
configured above the bond.  E.g., the original logic from a 2.6.32-era
kernel looks like:

	for (i = 0; (i < BOND_MAX_ARP_TARGETS); i++) {
[...]
		if (!bond->vlgrp) {
			pr_debug("basa: empty vlan: arp_send\n");
			bond_arp_send(slave->dev, ARPOP_REQUEST, targets[i],
				      bond->master_ip, 0);
			continue;
		}

		/*
		 * If VLANs are configured, we do a route lookup to
		 * determine which VLAN interface would be used, so we
		 * can tag the ARP with the proper VLAN tag.
		 */
		memset(&fl, 0, sizeof(fl));
		fl.fl4_dst = targets[i];
		fl.fl4_tos = RTO_ONLINK;

		rv = ip_route_output_key(&init_net, &rt, &fl);
[...]

	So, in the past, this particular case (oif / iif in route
selection) would "work," in the sense that an ARP would go out with no
VLAN ID, but only when there were known to be no VLANs configured above
the bond.  If any VLANs were configured above the bond, this case would
fail as we're seeing here.

	Nowadays, there is no easy way to tell if there are VLANs above
the bond, and there's generally a VID 0 configured anyway, so the route
lookup is unconditional.  In the case at issue here (the route lookup
for the arp_ip_target IP address fails), it's not possible for bonding
to determine what interface would be used, and therefore what VLAN tag
to use.

	Francois's patch would make bonding essentially take a best
guess of "no VLAN" and send an untagged ARP for any destination not
found in the regular (no iif, oif, etc, rule) routing table, which is
what used to happen for the "known no VLAN" case.

	With the patch, these ARPs may have an all-zero source IP
address (since the bond_confirm_addr call may not find a suitable source
address for something it can't find a route to).  That is a legal ARP
(used for duplicate address detection according to RFC 2131), but when
last I tried it a couple of years ago, the replies won't pass
arp_validate (as the target IP of 0.0.0.0 in the reply doesn't match any
of the bond's IP address), and I suspect that hasn't changed.

	In the days of yore code above, bonding kept track of what it
thought the bond's IP address was (bond->master_ip), and used that as
the source IP in the ARPs.  That wasn't always correct if the bond had
multiple IP addresses.

	So, ultimately, Francois is correct that this is a regression of
a behavior that used to work.  On the other hand, this patch isn't
really a complete restoration of the prior behavior.  It's no longer
possible to know that there aren't any VLANs above the bond, and so the
"no VLAN" guess is much less reliable than it used to be, plus the ARPs
that will be generated probably won't work with arp_validate.

	As much as I loathe adding more options to bonding, a manually
selected "force VLAN ID" for the arp_ip_target(s) would resolve this for
the minority of cases where the automatic VLAN ID selection does not
function.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html