[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4D6AB7BA.6090609@gmail.com>
Date:	Sun, 27 Feb 2011 21:44:42 +0100
From:	Nicolas de Pesloüan 
	<nicolas.2p.debian@...il.com>
To:	Jiri Pirko <jpirko@...hat.com>, Jay Vosburgh <fubar@...ibm.com>
CC:	David Miller <davem@...emloft.net>, kaber@...sh.net,
	eric.dumazet@...il.com, netdev@...r.kernel.org,
	shemminger@...ux-foundation.org, andy@...yhouse.net,
	"Fischer, Anna" <anna.fischer@...com>
Subject: Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
Le 27/02/2011 13:58, Jiri Pirko a écrit :
> Sat, Feb 26, 2011 at 08:42:57PM CET, fubar@...ibm.com wrote:
>> Nicolas de Pesloüan 	<nicolas.2p.debian@...il.com>  wrote:
>>> Hi Jay,
>>>
>>> Still thinking about this orig_dev stuff, I wonder why the protocol
>>> handlers used in bonding (bond_3ad_lacpdu_recv() and bond_arp_rcv()) are
>>> registered at the master level instead of at the slave level ?
>>>
>>> If they were registered at the slave level, they would simply receive
>>> skb->dev as the ingress interface and use this value instead of needing
>>> the orig_dev value given to them when they are registered at the master
>>> level.
>>>
>>> As orig_dev is only used by bonding and by af_packet, but they disagree on
>>> the exact meaning of orig_dev, one way to fix this discrepancy would be to
>>> remove one of the usage. As the af_packet usage is exposed to user space,
>>> bonding seems the right place to stop using orig_dev, even if orig_dev was
>>> introduced for bonding :-)
>>>
>>> I understand that this would add one entry per slave device to the
>>> ptype_base list, but this seems to be the only bad effect of registering
>>> at the slave level. Can you confirm that this was the reason to register
>>> at the master level instead?
>>
>> 	My recollection is that it was done the way it is because there
>> was no "orig_dev" delivery logic at the time.  A handler registered to a
>> slave dev would receive no packets at all because assignment of skb->dev
>> to the master happened first, and the "orig_dev" knowledge was lost.
>>
>> 	When 802.3ad was added, a skb->real_dev field was created, but
>> it wasn't used for delivery.  802.3ad used real_dev to figure out which
>> slave a LACPDU arrived on.  The skb->real_dev was eventually replaced
>> with the orig_dev business that's there now.
>>
>> 	Later, I did the arp_validate stuff the same way as 802.3ad
>> because it worked and was easier than registering a handler per slave.
>>
>>> If you think registering at the slave level would cause too much impact on
>>> ptype_base, then we might have another way to stop using orig_dev for
>>> bonding:
>>>
>>> In __skb_bond_should_drop(), we already test for the two interesting protocols:
>>>
>>> if ((dev->priv_flags&  IFF_SLAVE_NEEDARP)&&  skb->protocol == __cpu_to_be16(ETH_P_ARP))
>>> 	return 0;
>>>
>>> if (master->priv_flags&  IFF_MASTER_8023AD&&  skb->protocol == __cpu_to_be16(ETH_P_SLOW))
>>> 	return 0;
>>>
>>> Would it be possible to call the right handlers directly from inside
>>> __skb_bond_should_drop() then let __skb_bond_should_drop() return 1
>>> ("should drop") after processing the frames that are only of interest for
>>> bonding?
>>
>> 	Isn't one purpose of switching to rx_handler that there won't
>> need to be any skb_bond_should_drop logic in __netif_receive_skb at all?
>
> Yes, that (hopefully most)  would be eventually removed.
The skb_bond_should_drop logic was simply moved from dev.c to 
bond_should_deliver_exact_match@...d_main.c by Jiri's patch.
But the logic remain and is necessary to decide whether we do normal delivery or only exact match 
delivery.
>> 	Still, if you're just trying to simplify __netif_receive_skb
>> first, I don't see any reason not to register the packet handlers at the
>> slave level.  Looking at the ptype_base hash, I don't think that the
>> protocols bonding is registering (ARP and SLOW) will hash collide with
>> IP or IPv6, so I suspect there won't be much impact.
>>
>> 	Once an rx_handler is used, then I suspect there's no need for
>> the packet handlers at all, since the rx_handler is within bonding and
>> can just deal with the ARP or LACPDU directly.
>
> That is very true. And given that af_packet uses orig_dev to obtain
> ifindex, it can be replaced by skb->skb_iif. That way we can get rid of
> orig_dev parameter for good.
Unfortunately, after doing some more research, I'm afraid we won't be able to suppress at least the 
ARP packet handler:
In commit 1f3c8804acba841b5573b953f5560d2683d2db0d (bonding: allow arp_ip_targets on separate vlans 
to use arp validation), Andy solved the problem of vlan on top of bonding, when the arp_ip_target is 
on one of the vlans:
eth0/eth1 -> bond0 -> bond0.100
At the time the frame is inspected by bonding, the frame is still tagged. This is true for the new 
rx_handler proposed by Jiri, and is also true for the former __skb_bond_should_drop() handling).
To receive the untagged frame, we would have to wait until the vlan code remove the tag. The current 
protocol handler seems to be the best way to catch the frame that late.
This is probably specific to ARP. I don't think SLOW frames can be tagged.
Anyway, Jay, thanks for you clarification.
> So I suggest to take V3 of my patch now and do multiple follow-on
> patches to get us where we want to get.
Agreed.
	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists
 
