netdev - Re: [patch net-next-2.6 V3] net: convert bonding to use rx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4D68E31E.7060807@gmail.com>
Date:	Sat, 26 Feb 2011 12:25:18 +0100
From:	Nicolas de Pesloüan 
	<nicolas.2p.debian@...il.com>
To:	Jiri Pirko <jpirko@...hat.com>
CC:	David Miller <davem@...emloft.net>, kaber@...sh.net,
	eric.dumazet@...il.com, netdev@...r.kernel.org,
	shemminger@...ux-foundation.org, fubar@...ibm.com,
	andy@...yhouse.net
Subject: Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler

Le 26/02/2011 08:14, Jiri Pirko a écrit :
> Sat, Feb 26, 2011 at 12:46:53AM CET, nicolas.2p.debian@...il.com wrote:
>> Le 23/02/2011 20:05, Jiri Pirko a écrit :
>>> This patch converts bonding to use rx_handler. Results in cleaner
>>> __netif_receive_skb() with much less exceptions needed. Also
>>> bond-specific work is moved into bond code.
>>>
>>> Did performance test using pktgen and counting incoming packets by
>>> iptables. No regression noted.
>>>
>>> Signed-off-by: Jiri Pirko<jpirko@...hat.com>
>>>
>>> v1->v2:
>>>          using skb_iif instead of new input_dev to remember original
>>> 	device
>>>
>>> v2->v3:
>>> 	do another loop in case skb->dev is changed. That way orig_dev
>>> 	core can be left untouched.
>>
>> Hi Jiri,
>>
>> Eventually taking enough time for a review.
>>
>> I think we should split this change :
>>
>> 1/ Change __netif_receive_skb() to call rx_handler for diverted net_device, until rx_handler is NULL.
>>
>> 2/ Convert currently existing rx_handlers (bridge and macvlan) to use
>> this new "loop" feature, removing the need to call netif_rx() inside
>> their respective rx_handler and also removing the associated
>> overhead.
>
> This might not be possible. Macvlan uses result of called netif_rx for
> counting, bridge calls netdev_receive_skb via NF_HOOK. Nevertheless,
> this can be eventually handled later, not as a part of this patch.

Yes, I agree. Step 2 and step 3 can be swapped.

Anyway, we need to describe the options given to a rx_handler:

- Return skb unchanged. This would cause normal delivery (ptype->dev == NULL or ptype->dev == skb->dev).
- Return skb->dev changed. __netif_receive_skb() will loop to the new device. This would cause 
extact match delivery only (ptype->dev != NULL and ptype->dev == one of the orig_dev).
- Manage the skb another way and return NULL. This would stop any protocol handlers to receive the 
skb, except if the rx_handler arrange to re-inject the skb somewhere.

>> 3/ Convert bonding to use rx_handlers.
>>
>> Also, on step 1, we definitely need to clarify what orig_dev should be.
>>
>> I now think that orig_dev should be "the device one level below the
>> current one" or NULL if current device was not diverted from another
>> one. It means that we should keep an array of crossed (diverted)
>> devices and the associated orig_dev. This array would be used to pass
>> the right orig_dev to protocol handlers, depending on the device they
>> register on :
>
> I constructed the patch in the way origdev is the same in all situations
> as before the patch. I think that this decision can be ommitted at the
> moment.

Agreed, event if the current handling of orig_dev is far from bullet proof and needs to be clarified 
at some time.

>> eth0 ->  bond0 ->  br0
>>
>> A protocol handler registered on bond0 would receive eth0 as orig_dev.
>> A protocol handler registered on br0 would receive bond0 as orig_dev.
>>
>> [snip]
>>
>>> @@ -3167,32 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>
>> [snip]
>>
>>> +another_round:
>>> +
>>> +	__this_cpu_inc(softnet_data.processed);
>>> +
>>>   #ifdef CONFIG_NET_CLS_ACT
>>>   	if (skb->tc_verd&   TC_NCLS) {
>>>   		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
>>> @@ -3209,8 +3157,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>   #endif
>>>
>>>   	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>> -		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
>>> -		    ptype->dev == orig_dev) {
>>> +		if (!ptype->dev || ptype->dev == skb->dev) {
>>>   			if (pt_prev)
>>>   				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>   			pt_prev = ptype;
>>> @@ -3224,16 +3171,20 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>   ncls:
>>>   #endif
>>>
>>
>> Why do you loop to ptype_all before calling rx_handler ?
>>
>> I don't understand why ptype_all and ptype_base are not handled at
>> the same place in current __netif_receive_skb() but I think we should
>> take the opportunity to change that, unless someone know of a good
>> reason not to do so.
>
> Again, the patch tries to do as little changes as it can. So this stays
> the same as before. In case you want to change it, feel free to submit
> patch doing that as follow-on.

The point here is that bridge and macvlan handling used to be after the ptype_all loop (hence the 
place you inserted the call to rx_handler last summer), but the bonding part is currently before the 
ptype_all loop.

Moving bonding handling after the ptype_all loop will cause the ptype_all loop to be run twice:
- first time, with skb->dev == eth0 and orig_dev == eth0.
- second time, with skb->dev == bond0 and orig_dev == eth0.

The first time currently does not exists. And because bonding wasn't given a chance yet to decide 
that the frame should be dropped, the packet will always be delivered to eth0, causing duplicate 
deliveries. Note that this is probably true for bridge and macvlan too, and that those duplicate 
deliveries probably already exists.

Also, delivering skb inside a loop that may change the skb (skb->dev at least) is guaranteed to 
produce strange behaviors.

Can someone, knowing the history of ptype_all/ptype_base/bridge/macvlan/bonding/vlan handling in 
__netif_receive_skb(), comment on this?

Are there any reasons not to process ptype_all and ptype_base at the same location, at the end of 
__netif_receive_skb(), and to manage all divert features (bridge/macvlan/bonding/vlan) before?

	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html