netdev - Re: [patch net-next-2.6] net: reinject arps into bonding slave instead of master

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4D77FCC1.5050904@gmail.com>
Date:	Wed, 09 Mar 2011 23:18:41 +0100
From:	Nicolas de Pesloüan 
	<nicolas.2p.debian@...il.com>
To:	Jiri Pirko <jpirko@...hat.com>
CC:	Andy Gospodarek <andy@...yhouse.net>, netdev@...r.kernel.org,
	davem@...emloft.net, shemminger@...ux-foundation.org,
	kaber@...sh.net, fubar@...ibm.com, eric.dumazet@...il.com
Subject: Re: [patch net-next-2.6] net: reinject arps into bonding slave	instead
 of master

Le 09/03/2011 18:11, Jiri Pirko a écrit :
> Wed, Mar 09, 2011 at 04:28:34PM CET, nicolas.2p.debian@...il.com wrote:
>> Le 09/03/2011 16:09, Jiri Pirko a écrit :
>>> Wed, Mar 09, 2011 at 03:49:49PM CET, nicolas.2p.debian@...il.com wrote:
>>>> Le 09/03/2011 08:45, Jiri Pirko a écrit :
>>>>> Tue, Mar 08, 2011 at 10:44:37PM CET, nicolas.2p.debian@...il.com wrote:
>>>>>> Le 08/03/2011 14:42, Andy Gospodarek a écrit :
>>>>>>> I'm pretty sure this patch will have the same catastrophic problem your
>>>>>>> last one did.  By cloning and setting skb2->dev = orig_dev you just
>>>>>>> inserted a frame identical to the one we received right back into the
>>>>>>> stack.  It only took a few minutes for my box to melt as one frame on
>>>>>>> the wire will cause an infinite number of frames to be received by the
>>>>>>> stack.
>>>>>>
>>>>>> I agree with Andy. We still keep one reinject (netif_rx), which is
>>>>>> probably better that two (__netif_receive_skb), but not enough.
>>>>>>
>>>>>> I really think we need a general framework for late delivery of final
>>>>>> packets to packet handler registered somewhere in the rx_handler
>>>>>> path.
>>>>>>
>>>>>> Jiri, is this patch the one you announced as "I have some kind nice
>>>>>> solution in mind and I'm going to submit that as a patch later (too
>>>>>> many patches are in the wind atm)" ?
>>>>>
>>>>>
>>>>> I did not had time to verify my thought yet but I think that the only
>>>>> think needed against my original patch (bonding: move processing of recv
>>>>> handlers into handle_frame()) is ro remove vlan_on_bond_hook, period.
>>>>>
>>>>> Because all incoming arps are seen by bond_handle_frame =>
>>>>> bond->recv_probe , even vlan ones - that would make eth0-bond0-bond0.5
>>>>> work and eth0-bond0-br0-bond0.5 as well. But again, need to verify this.
>>>>
>>>> All incoming ARPs are seen by bond_handle_frame, even vlan ones, definitely true.
>>>>
>>>> But as some of them *are* vlan ones, you will have to untag those "by hand" inside bond_handle_frame.
>>>
>>
>>> Hmm. For hw vlan accel this would not be needed.
>>
>> Agreed.
>>
>>> But for non-hw-vlan-accel the frame is wrapped when going thru handle_frame. And yes, in that
>>> case untagging would be necessary. Unless the vlan untagging happening now in ptype_base handler
>>> is moved in rx path somewhere before __netif_receive_skb.
>>
>> Can't it me moved not before, but inside __netif_receive_skb, as a rx_handler?
>
> I don't think so - rx_handler is not right place to untag. rx_handler is
> per device. untaging should happen idealy on realdev skb injection. That
> way the rx path for hw-vlan-accel and non-he-vlan-accel would be very
> similar.

I don't understand your argument about "rx_handler is per device". A vlan interface is by design 
built on top a parent (physical or logical) interface. Why can't we use the rx_handler of the parent 
device to untag?

Untagging the frame very early (before __netif_receive_skb() would work if the value of skb->dev, at 
the time the frame enter __netif_receive_skb, is the parent interface for the vlan interface: eth0 
-> eth0.100. This is exactly what happens for hw-accel vlan frame and it sounds logical at first to 
do it the same way for non-hw-accel device.

But for all others setups, where there exist some net_devices before the "untagging" one, you would 
face some troubles. For example, with eth0+eth1 -> br0 -> br0.100, you cannot untag before entering 
__netif_receive_skb. If you do so, the bridge would receive untagged frame and if the frame is not 
for the local host, the bridge would forward an untagged frame while it is expected to forward a 
tagged one. Even if the bridge is in a position to know the frame *was* tagged, we cannot expect the 
bridge to do special processing to handle this situation. Doing so would break layering.

On another setup, eth0 -> eth0.100 -> br0 -> eth1.200 -> eth1, on the opposite, we expect the bridge 
to receive and to forward an untagged frame, that has been untagged while crossing eth0.100 and will 
be tagged while crossing eth1.200.

If we want to avoid reinjection as much as possible (and I advocate for that, you know :-)), then 
the only place to untag is in the rx_header for the parent device of the vlan device. I still don't 
understand the problem you see with this proposal.

And the only remaining discrepancy would be in hw-accel vs non-hw-accel. A protocol handler 
registered on eth0 -> eth0.100 would receive:
- a tagged frame for non-hw-accel.
- an untagged frame for hw-accel.

I think it is already true today, so I don't consider this a problem. We might arrange to fix this 
discrepancy later, if someone consider this a problem.

	Nicolas.

>> At the time we setup eth0.100, we can arrange for a vlan_untag
>> rx_handler to be registered on eth0. This is exactly the way it works
>> now for macvlan. Should it be possible (and desirable) for vlan too?
>>
>> This might re-open the discussion about the need for several
>> rx_handlers per interface, but that is another story.
>>
>> Also, moving it before __netif_receive_skb would cause every protocol
>> handlers to receive the frame untagged. At least protocol sniffers
>> registered at ptype_all may want to receive the tagged frame, for
>> diagnostic purpose.
>
> Thats how its done for hw-vlan-accel. No problem here.
>
>>
>> That would result in two things:
>>>
>>> 1) Bond would be able to scope vlan packets
>>> 2) The rx path for hw-vlan-accel and non-hw-vlan-accel would be the same
>>>     (should be desirable + one reinjection would be avoided for
>>>      non-hw-vlan-accel)
>>
>> Agreed, but moving it inside __netif_receive_skb should have the same
>> effects. If we move non-hw-accel-vlan to a rx_handler, the skb would
>> be COW before being untagged only if shared. This sounds good to me.
>>
>> 	Nicolas.
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html