[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4BE30C67.4030104@intel.com>
Date: Thu, 06 May 2010 11:37:27 -0700
From: John Fastabend <john.r.fastabend@...el.com>
To: Jay Vosburgh <fubar@...ibm.com>
CC: "bonding-devel@...ts.sourceforge.net"
<bonding-devel@...ts.sourceforge.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"Leech, Christopher" <christopher.leech@...el.com>,
"andy@...yhouse.net" <andy@...yhouse.net>,
"kaber@...sh.net" <kaber@...sh.net>
Subject: Re: [PATCH] net: deliver skbs on inactive slaves to exact matches
Jay Vosburgh wrote:
> John Fastabend <john.r.fastabend@...el.com> wrote:
>
>> Currently, the accelerated receive path for VLAN's will
>> drop packets if the real device is an inactive slave and
>> is not one of the special pkts tested for in
>> skb_bond_should_drop(). This behavior is different then
>> the non-accelerated path and for pkts over a bonded vlan.
>>
>> For example,
>>
>> vlanx -> bond0 -> ethx
>>
>> will be dropped in the vlan path and not delivered to any
>> packet handlers. However,
>>
>> bond0 -> vlanx -> ethx
>>
>> will be delivered to handlers that match the exact dev,
>> because the VLAN path checks the real_dev which is not a
>> slave and netif_recv_skb() doesn't drop frames but only
>> delivers them to exact matches.
>>
>> This patch adds a pkt_type PACKET_DROP which is now used
>> to identify skbs that would previously been dropped and
>> allows the skb to continue to skb_netif_recv(). Here we
>> add logic to check for PACKET_DROP and if so only deliver
>> to handlers that match exactly. IMHO this is more
>> consistent and gives pkt handlers a way to identify skbs
>> that come from inactive slaves.
>
> I was looking at this some yesterday and this morning, trying to
> figure out if a packet going up the IP protocol stack with pkt_type ==
> PACKET_DROP would break anything. For example:
>
> net/ipv4/ip_options.c:
> int ip_options_rcv_srr(struct sk_buff *skb)
> {
> [...]
> if (!opt->srr)
> return 0;
>
> if (skb->pkt_type != PACKET_HOST)
> return -EINVAL;
>
> or:
>
> net/ipv4/tcp_ipv4.c:
> int tcp_v4_rcv(struct sk_buff *skb)
> {
> const struct iphdr *iph;
> struct tcphdr *th;
> struct sock *sk;
> int ret;
> struct net *net = dev_net(skb->dev);
>
> if (skb->pkt_type != PACKET_HOST)
> goto discard_it;
>
> Is pkt_type == PACKET_DROP instead going to break things for
> these, or the other similar cases?
>
> I think what you're looking for is really the usual PACKET_HOST,
> PACKET_OTHERHOST, et al, upper protocol behavior, with the exception
> that at the __netif_receive_skb level, wildcard matches in the ptypes
> are not done. Setting the pkt_type to PACKET_DROP loses the _HOST,
> _OTHERHOST, etc, information, but sends the packet up the stack anyway,
> and I'm not sure that's really the right thing to do.
Because we wouldn't be doing wildcard matches the skb shouldn't be
passed to the IP protocol stack. Really what we want is the ip_rcv() to
drop the skb in this case,
net/ipv4/ip_input.c
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct
packet_type *pt, struct net_device *orig_dev)
{
[...]
if (skb->pkt_type == PACKET_OTHERHOST ||
skb->pkt_type == PACKET_DROP)
goto drop;
We do lose some information about the packet _HOST, _OTHERHOST, etc, but
we also gain something namely that the packet was received on an
inactive slave. Currently, we pass these frames up the stack (for non
wildcard matches) without any indication that they were received on an
inactive slave.
What we need is a way for the VLAN path to flag skbs so that wildcard
matches in the ptyes are not done. Also I think it may be valuable for
the packet handler to be able to determine if the skb is coming from an
inactive slave. I think using the pkt_type field might be OK any ideas
for a better field?
Thanks,
John
>
> -J
>
>> This allows a third case to function which is important for
>> doing multipath with FCoE traffic while LAN traffic bonded,
>>
>> bond0 -> ethx
>> |
>> vlanx -> --
>>
>> Here the vlan is not in bond0 but the FCoE handler can still
>> receive the skb. Previously these skbs were dropped.
>>
>> I have tested the following 4 configurations in failover modes
>> and load balancing modes and have not seen any duplicate packets
>> or unexpected bahavior.
>>
>> # bond0 -> ethx
>>
>> # vlanx -> bond0 -> ethx
>>
>> # bond0 -> vlanx -> ethx
>>
>> # bond0 -> ethx
>> |
>> vlanx -> --
>>
>> Also this removes the PACKET_FASTROUTE define which was not being
>> used.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@...el.com>
>> ---
>>
>> include/linux/if_packet.h | 2 +-
>> net/8021q/vlan_core.c | 4 ++--
>> net/core/dev.c | 25 ++++++++++++++++++-------
>> 3 files changed, 21 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
>> index 6ac23ef..9d079fa 100644
>> --- a/include/linux/if_packet.h
>> +++ b/include/linux/if_packet.h
>> @@ -28,7 +28,7 @@ struct sockaddr_ll {
>> #define PACKET_OUTGOING 4 /* Outgoing of any type */
>> /* These ones are invisible by user level */
>> #define PACKET_LOOPBACK 5 /* MC/BRD frame looped back */
>> -#define PACKET_FASTROUTE 6 /* Fastrouted frame */
>> +#define PACKET_DROP 6 /* Drop packet */
>>
>> /* Packet socket options */
>>
>> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>> index c584a0a..4510e08 100644
>> --- a/net/8021q/vlan_core.c
>> +++ b/net/8021q/vlan_core.c
>> @@ -12,7 +12,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
>> return NET_RX_DROP;
>>
>> if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
>> - goto drop;
>> + skb->pkt_type = PACKET_DROP;
>>
>> skb->skb_iif = skb->dev->ifindex;
>> __vlan_hwaccel_put_tag(skb, vlan_tci);
>> @@ -84,7 +84,7 @@ vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
>> struct sk_buff *p;
>>
>> if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
>> - goto drop;
>> + skb->pkt_type = PACKET_DROP;
>>
>> skb->skb_iif = skb->dev->ifindex;
>> __vlan_hwaccel_put_tag(skb, vlan_tci);
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 36d53be..cefac4f 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2776,7 +2776,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> struct net_device *orig_dev;
>> struct net_device *master;
>> struct net_device *null_or_orig;
>> - struct net_device *null_or_bond;
>> + struct net_device *dev_or_bond;
>> int ret = NET_RX_DROP;
>> __be16 type;
>>
>> @@ -2793,13 +2793,24 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> if (!skb->skb_iif)
>> skb->skb_iif = skb->dev->ifindex;
>>
>> + /*
>> + * bonding note: skbs received on inactive slaves should only
>> + * be delivered to pkt handlers that are exact matches. Also
>> + * the pkt_type field will be marked PACKET_DROP. If packet
>> + * handlers are sensitive to duplicate packets these skbs will
>> + * need to be dropped at the handler. The vlan accel path may
>> + * have already set PACKET_DROP.
>> + */
>> null_or_orig = NULL;
>> orig_dev = skb->dev;
>> master = ACCESS_ONCE(orig_dev->master);
>> - if (master) {
>> - if (skb_bond_should_drop(skb, master))
>> + if (skb->pkt_type == PACKET_DROP)
>> + null_or_orig = orig_dev;
>> + else if (master) {
>> + if (skb_bond_should_drop(skb, master)) {
>> + skb->pkt_type = PACKET_DROP;
>> null_or_orig = orig_dev; /* deliver only exact match */
>> - else
>> + } else
>> skb->dev = master;
>> }
>>
>> @@ -2849,10 +2860,10 @@ ncls:
>> * device that may have registered for a specific ptype. The
>> * handler may have to adjust skb->dev and orig_dev.
>> */
>> - null_or_bond = NULL;
>> + dev_or_bond = skb->dev;
>> if ((skb->dev->priv_flags & IFF_802_1Q_VLAN) &&
>> (vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING)) {
>> - null_or_bond = vlan_dev_real_dev(skb->dev);
>> + dev_or_bond = vlan_dev_real_dev(skb->dev);
>> }
>>
>> type = skb->protocol;
>> @@ -2860,7 +2871,7 @@ ncls:
>> &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>> if (ptype->type == type && (ptype->dev == null_or_orig ||
>> ptype->dev == skb->dev || ptype->dev == orig_dev ||
>> - ptype->dev == null_or_bond)) {
>> + ptype->dev == dev_or_bond)) {
>> if (pt_prev)
>> ret = deliver_skb(skb, pt_prev, orig_dev);
>> pt_prev = ptype;
>>
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists