lists.openwall.net - Open Source and information security mailing list archives
Date: Sun, 20 Feb 2011 11:36:10 +0100
From: Jiri Pirko <jpirko@...hat.com>
To: Nicolas de Pesloüan <nicolas.2p.debian@...il.com>
Cc: Jay Vosburgh <fubar@...ibm.com>, David Miller <davem@...emloft.net>,
	kaber@...sh.net, eric.dumazet@...il.com, netdev@...r.kernel.org,
	shemminger@...ux-foundation.org, andy@...yhouse.net
Subject: Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler

Sat, Feb 19, 2011 at 09:27:37PM CET, nicolas.2p.debian@...il.com wrote:
>On 19/02/2011 14:46, Jiri Pirko wrote:
>>Sat, Feb 19, 2011 at 02:18:00PM CET, nicolas.2p.debian@...il.com wrote:
>>>On 19/02/2011 12:28, Jiri Pirko wrote:
>>>>Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@...hat.com wrote:
>>>>>Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@...il.com wrote:
>>>>>>On 19/02/2011 09:05, Jiri Pirko wrote:
>>>>>>>This patch converts bonding to use rx_handler. Results in a cleaner
>>>>>>>__netif_receive_skb() with far fewer exceptions needed. Also,
>>>>>>>bond-specific work is moved into bond code.
>>>>>>>
>>>>>>>Signed-off-by: Jiri Pirko <jpirko@...hat.com>
>>>>>>>
>>>>>>>v1->v2:
>>>>>>>	using skb_iif instead of new input_dev to remember the original
>>>>>>>	device
>>>>>>>v2->v3:
>>>>>>>	set orig_dev = skb->dev if skb_iif is set
>>>>>>>
>>>>>>
>>>>>>Why do we need to let the rx_handlers call netif_rx() or
>>>>>>__netif_receive_skb()?
>>>>>>
>>>>>>Bonding used to be handled with very little overhead, simply replacing
>>>>>>skb->dev with skb->dev->master. Time has passed and we eventually
>>>>>>added a lot of special processing for bonding into
>>>>>>__netif_receive_skb(), but the overhead remained very light.
>>>>>>
>>>>>>Calling netif_rx() (or __netif_receive_skb()) to allow nesting would
>>>>>>probably lead to some overhead.
>>>>>>
>>>>>>Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>>>>>whatever needs to be delivered, to whoever needs it, inside the loop?
>>>>>>
>>>>>>rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>>>>while (rx_handler) {
>>>>>>	/* ... */
>>>>>>	orig_dev = skb->dev;
>>>>>>	skb = rx_handler(skb);
>>>>>>	/* ... */
>>>>>>	rx_handler = (skb->dev != orig_dev) ?
>>>>>>		rcu_dereference(skb->dev->rx_handler) : NULL;
>>>>>>}
>>>>>>
>>>>>>This would reduce the overhead, while still allowing nesting: vlan on
>>>>>>top of bonding, bridge on top of bonding, ...
>>>>>
>>>>>I see your point. Makes sense to me. But the loop would have to include
>>>>>at least the processing of ptype_all too. I'm going to cook a follow-up
>>>>>patch.
>>>>>
>>>>
>>>>DRAFT (doesn't modify rx_handlers):
>>>>
>>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>>index 4ebf7fe..e5dba47 100644
>>>>--- a/net/core/dev.c
>>>>+++ b/net/core/dev.c
>>>>@@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> {
>>>> 	struct packet_type *ptype, *pt_prev;
>>>> 	rx_handler_func_t *rx_handler;
>>>>+	struct net_device *dev;
>>>> 	struct net_device *orig_dev;
>>>> 	struct net_device *null_or_dev;
>>>> 	int ret = NET_RX_DROP;
>>>>@@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> 	if (netpoll_receive_skb(skb))
>>>> 		return NET_RX_DROP;
>>>>
>>>>-	__this_cpu_inc(softnet_data.processed);
>>>>+	skb->skb_iif = skb->dev->ifindex;
>>>>+	orig_dev = skb->dev;
>>>
>>>orig_dev should be set inside the loop, to reflect the "previously
>>>crossed device", while following the path:
>>>
>>>eth0 -> bond0 -> br0.
>>>
>>>First step inside the loop:
>>>
>>>orig_dev = eth0
>>>skb->dev = bond0 (at the end of the loop).
>>>
>>>Second step inside the loop:
>>>
>>>orig_dev = bond0
>>>skb->dev = br0 (at the end of the loop).
>>>
>>>This would allow for exact match delivery to bond0 if someone binds there.
>>>
>>>>+
>>>> 	skb_reset_network_header(skb);
>>>> 	skb_reset_transport_header(skb);
>>>> 	skb->mac_len = skb->network_header - skb->mac_header;
>>>>@@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>
>>>> 	rcu_read_lock();
>>>>
>>>>-	if (!skb->skb_iif) {
>>>>-		skb->skb_iif = skb->dev->ifindex;
>>>>-		orig_dev = skb->dev;
>>>>-	} else {
>>>>-		orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
>>>>-	}
>>>
>>>I like the fact that it removes the above part.
>>>
>>>>+another_round:
>>>>+	__this_cpu_inc(softnet_data.processed);
>>>>+	dev = skb->dev;
>>>>
>>>> #ifdef CONFIG_NET_CLS_ACT
>>>> 	if (skb->tc_verd & TC_NCLS) {
>>>>@@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> #endif
>>>>
>>>> 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>>-		if (!ptype->dev || ptype->dev == skb->dev) {
>>>>+		if (!ptype->dev || ptype->dev == dev) {
>>>> 			if (pt_prev)
>>>> 				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>> 			pt_prev = ptype;
>>>
>>>Inside the loop, we should only do exact match delivery, for
>>>&ptype_all and for &ptype_base[ntohs(type) & PTYPE_HASH_MASK]:
>>>
>>>	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>-		if (!ptype->dev || ptype->dev == dev) {
>>>+		if (ptype->dev == dev) {
>>>			if (pt_prev)
>>>				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>			pt_prev = ptype;
>>>		}
>>>	}
>>>
>>>	list_for_each_entry_rcu(ptype,
>>>			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>>>		if (ptype->type == type &&
>>>-		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>+		    (ptype->dev == skb->dev)) {
>>>			if (pt_prev)
>>>				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>			pt_prev = ptype;
>>>		}
>>>	}
>>>
>>>After leaving the loop, we can do wildcard delivery, if skb is not NULL.
>>>
>>>	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>-		if (!ptype->dev || ptype->dev == dev) {
>>>+		if (!ptype->dev) {
>>>			if (pt_prev)
>>>				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>			pt_prev = ptype;
>>>		}
>>>	}
>>>
>>>	list_for_each_entry_rcu(ptype,
>>>			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>>>-		if (ptype->type == type &&
>>>-		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>+		if (ptype->type == type && !ptype->dev) {
>>>			if (pt_prev)
>>>				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>			pt_prev = ptype;
>>>		}
>>>	}
>>>
>>>This would reduce the number of tests inside the
>>>list_for_each_entry_rcu() loops. And because we match only
>>>ptype->dev == dev inside the loop and !ptype->dev outside the loop,
>>>this should avoid duplicate delivery.
>>
>>Would you care to put this into a patch so I can see the whole picture?
>>Thanks.
>
>Here is what I have in mind. It is based on your previous DRAFT patch, and
>doesn't modify rx_handlers yet.
>
>Only compile tested !!
>
>I don't know if every piece is in the right place. I wonder what to do
>with the CONFIG_NET_CLS_ACT part, which currently sits between the
>ptype_all and ptype_base processing.
>
>Anyway, the general idea is there.
>
>	Nicolas.
>
> net/core/dev.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 files changed, 60 insertions(+), 10 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index e5dba47..7e007a9 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -3117,7 +3117,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
> 	rx_handler_func_t *rx_handler;
> 	struct net_device *dev;
> 	struct net_device *orig_dev;
>-	struct net_device *null_or_dev;
> 	int ret = NET_RX_DROP;
> 	__be16 type;
>
>@@ -3130,9 +3129,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
> 	if (netpoll_receive_skb(skb))
> 		return NET_RX_DROP;
>
>-	skb->skb_iif = skb->dev->ifindex;
>-	orig_dev = skb->dev;
>-
> 	skb_reset_network_header(skb);
> 	skb_reset_transport_header(skb);
> 	skb->mac_len = skb->network_header - skb->mac_header;
>@@ -3143,6 +3139,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>
> another_round:
> 	__this_cpu_inc(softnet_data.processed);
>+	skb->skb_iif = skb->dev->ifindex;
>+	orig_dev = skb->dev;

orig_dev should be set at the end of the loop. Now it is always going to be
the same as dev and skb->dev.

> 	dev = skb->dev;
>
> #ifdef CONFIG_NET_CLS_ACT
>@@ -3152,8 +3150,13 @@ another_round:
> 	}
> #endif
>
>+	/*
>+	 * Deliver to ptype_all protocol handlers that match the current dev.
>+	 * This happens before rx_handler is given a chance to change skb->dev.
>+	 */
>+
> 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>-		if (!ptype->dev || ptype->dev == dev) {
>+		if (ptype->dev == dev) {
> 			if (pt_prev)
> 				ret = deliver_skb(skb, pt_prev, orig_dev);
> 			pt_prev = ptype;
>@@ -3167,6 +3170,31 @@ another_round:
> ncls:
> #endif
>
>+	/*
>+	 * Deliver to ptype_base protocol handlers that match the current dev.
>+	 * This happens before rx_handler is given a chance to change skb->dev.
>+	 */
>+
>+	type = skb->protocol;
>+	list_for_each_entry_rcu(ptype,
>+			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>+		if (ptype->type == type && ptype->dev == skb->dev) {
>+			if (pt_prev)
>+				ret = deliver_skb(skb, pt_prev, orig_dev);
>+			pt_prev = ptype;
>+		}
>+	}

I'm not sure it is ok to deliver ptype_base here. See the comment above
ptype_head() (I'm not sure I understand that correctly).

>+
>+	/*
>+	 * Call rx_handler for the current device.
>+	 * If rx_handler returns NULL, skip wildcard protocol handler delivery.
>+	 * Else, if skb->dev changed, restart the whole delivery process, to
>+	 * allow for device nesting.
>+	 *
>+	 * Warning:
>+	 * rx_handlers must kfree_skb(skb) if they return NULL.

Well, this is not true. They can return NULL and call netif_rx() as they have
before. No changes necessary, I believe.

>+	 */
>+
> 	rx_handler = rcu_dereference(dev->rx_handler);
> 	if (rx_handler) {
> 		if (pt_prev) {
>@@ -3176,10 +3204,15 @@ ncls:
> 		skb = rx_handler(skb);
> 		if (!skb)
> 			goto out;
>-		if (dev != skb->dev)
>+		if (skb->dev != dev)
> 			goto another_round;
> 	}
>
>+	/*
>+	 * FIXME: The part below should use rx_handler instead of being hard
>+	 * coded here.

I'm not sure it is doable atm. For bridge and bond it should not be a problem,
but for macvlan, it is possible to have macvlans and vlans on the same dev.
This possibility should persist.

/me scratches head over the idea of having multiple rx_handlers, although it
was his original idea....

>+	 */
>+
> 	if (vlan_tx_tag_present(skb)) {
> 		if (pt_prev) {
> 			ret = deliver_skb(skb, pt_prev, orig_dev);
>@@ -3192,16 +3225,33 @@ ncls:
> 		goto out;
> 	}
>
>+	/*
>+	 * FIXME: Can't this be moved into the rx_handler for bonding,
>+	 * or into a future rx_handler for vlan?

This hook is something I do not like at all :/ But anyway, it should be in
the vlan part, I think.

>+	 */
>+
> 	vlan_on_bond_hook(skb);
>
>-	/* deliver only exact match when indicated */
>-	null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
>+	/*
>+	 * Deliver to wildcard ptype_all protocol handlers.
>+	 */
>+
>+	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>+		if (!ptype->dev) {
>+			if (pt_prev)
>+				ret = deliver_skb(skb, pt_prev, orig_dev);
>+			pt_prev = ptype;
>+		}
>+	}
>+
>+	/*
>+	 * Deliver to wildcard ptype_base protocol handlers.
>+	 */
>
> 	type = skb->protocol;
> 	list_for_each_entry_rcu(ptype,
> 			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>-		if (ptype->type == type &&
>-		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>+		if (ptype->type == type && !ptype->dev) {
> 			if (pt_prev)
> 				ret = deliver_skb(skb, pt_prev, orig_dev);
> 			pt_prev = ptype;
>--
>1.7.2.3
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists