Date:	Wed, 17 Oct 2012 12:29:48 +0200
From:	Jiri Pirko <jiri@...nulli.us>
To:	Jay Vosburgh <fubar@...ibm.com>
Cc:	Jon Stanley <jstanley@...f.net>, netdev@...r.kernel.org,
	Andy Gospodarek <andy@...yhouse.net>, davem@...emloft.net,
	kaber@...sh.net
Subject: Re: regression between v3.0 and v3.3 in bringing up IPoIB devices in
 a bond at boot

Wed, Oct 17, 2012 at 06:58:17AM CEST, fubar@...ibm.com wrote:
>Jon Stanley <jstanley@...f.net> wrote:
>
>>Firstly, I apologize that this report isn't as well-formed as it
>>should be (i.e. I can't point to a specific commit that broke it), but
>>I'll do my best failing that. The reason I can't be as specific as I'd
>>like is that starting at 3.1-rc1, I'd run into the original VLAN
>>problem that I was having, and the patchset required to fix it (the
>>patch just submitted, efc73f4b, and 9b361c1) won't apply cleanly to
>>older kernels and I'm not sure it's worth the (seemingly massive)
>>effort required to backport them to something I don't plan to use :)
>>
>>First, let me explain the configuration a little:
>>
>>bond0 -> eth0, eth1.100
>>bond1 -> ib0, ib1
>>
>>The first problem, which is now resolved, was that ib0 and ib1
>>wouldn't enslave at all because of the presence of VLAN0. I can now
>>get them to enslave manually (echo +ib0 > /sys/.../slaves) *if* the
>>bond is down. If the bond is up, I still get the "refused to change
>>device type", which is fairly expected (I think, but that could be the
>>problem here).
>>
>>However, if I leave the distro (stock RHEL except the kernel) to its
>>own devices to bring it up at boot, I get the same symptoms as though
>>the master (bond1 in this case) isn't down ("refused to change device
>>type"). If I go back to v3.0, everything works fine. I've also
>>verified that it doesn't work on current mainline, and that it works
>>fine without any VLAN devices configured on bond0.
>
>	I haven't debugged on a live kernel yet to verify this, but I
>think I see what may be happening.  Basically, when the device is up,
>VID 0 has been added, and the VLAN code refuses to change type on a
>device with a VLAN configured (even VID 0).
>
>	If the bond is down (really, has never been up) when the
>NETDEV_PRE_TYPE_CHANGE event happens, the vlan_device_event callback
>will exit, because dev->vlan_info is NULL (dev here is the bond).  It's
>NULL because vlan_device_event will respond to a NETDEV_UP event by
>adding VID 0 to the device, which is what allocates the dev->vlan_info,
>so when the bond has never been up, there is no dev->vlan_info.
>
>	Once the bond is up, when the NETDEV_PRE_TYPE_CHANGE event
>reaches vlan_device_event, it will not exit as before, but proceed into
>the switch, and the NETDEV_PRE_TYPE_CHANGE case always returns
>NOTIFY_BAD, which results in the "refused to change type" message.
>
>	It (in theory) works with no VLANs on bond0 because then, I'd
>guess, you have no VLANs at all, and therefore the 8021q module
>isn't loaded, and so the whole "VID 0" and vlan_device_event business
>doesn't take place.
>
>	I think that if vlan_device_event uses the new &
>improved vlan_uses_dev() test instead of its current test, that may
>resolve the problem; here's a patch against current net that I have not
>tested at all, but may fix things.  I'm not 100% sure this is the right
>thing to do, as it may result in IPoIB interfaces with VID 0 configured
>on them.
>
>diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>index 9096bcb..57170fa 100644
>--- a/net/8021q/vlan.c
>+++ b/net/8021q/vlan.c
>@@ -342,7 +342,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
> 	}
> 
> 	vlan_info = rtnl_dereference(dev->vlan_info);
>-	if (!vlan_info)
>+	if (!vlan_info || !vlan_info->grp.nr_vlan_devs)

This patch would prevent the implicit VID 0 from being removed. I'd rather
put a vlan_uses_dev() check into the NETDEV_PRE_TYPE_CHANGE case.

> 		goto out;
> 	grp = &vlan_info->grp;
> 
>
>	Comments?
>
>	-J
>
>---
>	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>