[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150617063002.GA2090@nanopsycho.orion>
Date: Wed, 17 Jun 2015 08:30:02 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Scott Feldman <sfeldma@...il.com>
Cc: David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
"simon.horman@...ronome.com" <simon.horman@...ronome.com>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
"Arad, Ronen" <ronen.arad@...el.com>,
"Fastabend, John R" <john.r.fastabend@...el.com>,
"andrew@...n.ch" <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Guenter Roeck <linux@...ck-us.net>,
davidch <davidch@...adcom.com>,
"stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet
forwarding
Wed, Jun 17, 2015 at 01:53:56AM CEST, sfeldma@...il.com wrote:
>On Tue, Jun 16, 2015 at 2:11 PM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Tue, Jun 16, 2015 at 06:47:47PM CEST, sfeldma@...il.com wrote:
>>>On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko <jiri@...nulli.us> wrote:
>>>> Tue, Jun 16, 2015 at 01:25:51AM CEST, davem@...emloft.net wrote:
>>>>>From: sfeldma@...il.com
>>>>>Date: Sat, 13 Jun 2015 11:04:26 -0700
>>>>>
>>>>>> The switchdev port driver must do two things:
>>>>>>
>>>>>> 1) Generate a fwd_mark for each switch port, using some unique key of the
>>>>>> switch device (and optionally port). This is a one-time operation done
>>>>>> when port's netdev is setup.
>>>>>>
>>>>>> 2) On packet ingress from port, mark the skb with the ingress port's
>>>>>> fwd_mark. If the device supports it, it's useful to only mark skbs
>>>>>> which were already forwarded by the device. If the device does not
>>>>>> support such indication, all skbs can be marked, even if they're
>>>>>> local dst.
>>>>>>
>>>>>> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
>>>>>> hold the fwd_mark. I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
>>>>>> tried using skb->mark for this purpose, but ebtables can overwrite the
>>>>>> skb->mark before the bridge gets it, so that will not work.
>>>>>>
>>>>>> In general, this fwd_mark can be used for any case where a packet is
>>>>>> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
>>>>>> re-forwarding the packet. sFlow is another use-case that comes to mind,
>>>>>> but I haven't explored the details.
>>>>>
>>>>>Generally I'm against adding new fields fo sk_buff but I'm trying to be
>>>>>open minded. :-)
>>>>>
>>>>>About the per-device fwd_mark, if the key attribute is uniqueness,
>>>>>let's just do it right and use something like lib/idr.c to generate
>>>>>truly unique indices at probe time for all devices using this
>>>>>facility. I like that better than having them be unique by a happy
>>>>>accident.
>>>>
>>>> We already have per-device uniqueue key. dev->ifindex.
>>>> That should be good for fwd_mark purposes I believe.
>>>
>>>It would be great if we could use dev->index, but fwd_mark is really
>>>to mark device ports that belong to a group. In the case of a bridge,
>>>the device ports in the bridge should all have the same mark. And
>>>another device's ports in the same bridge would have a different mark
>>>(so we can't use the bridge's dev->ifindex). On ingress, the skb is
>>>marked with the ingress port's mark. If the skb is to be forwarded
>>>out an egress port, the skb mark is compared with egress port's mark.
>>>If marks compare, then the device has already forwarded the pkt so the
>>>kernel can consume_skb to avoid duplicate pkts on the wire.
>>>
>>>So what we need is a unique mark for device ports within a fwding
>>>group, such as a bridge.
>>
>> Yep, have a group of netdevs, pick one of them and use it's ifindex for
>> the whole group.
>
>That will not work because ports from two switches in the same bridge
>need different marks...that's how the bridge knows which ports to fwd
>and which ones to skip.
>
>Example:
>
>br0
> sw1p1 (mark=3)
> sw1p2 (mark=3)
> sw2p2 (mark=7)
> sw2p2 (mark=7)
>
>Two switches, sw1 and sw2, in bridge br0. Let's say sw1 receives an
>unknown unicast pkt. It'll flood the pkt to its other switch ports
>(sw1p2, in this case) and send a copy to the CPU (the bridge), with
>skb->mark=3. The bridge will flood pkt to sw1p2, sw2p1, and sw2p2,
>but our little check in dev.c will drop the pkt to sw1p2 just before
>egress onto wire. Pkt goes out on sw2p1 and sw2p2. This is why we
>can't use just the br0 ifindex to generate the mark. We need
>something unique about the switch port and the bridge ifindex to give
>us a mark for a port.
Scott, again, I'm not talking about br0 ifindex! I'm talking about
ifindexes of the port netdevs.
br0
sw1p1 ifindex=1
sw1p2 ifindex=2
sw2p1 ifindex=3
sw2p2 ifindex=4
now we have 2 groups, one in each switch. So for sw1 group (sw1p1 and
sw1p2) we select mark one of the ifindexes, so either "1" or "2"
For second group, from sw2 we select "3" or "4". So for example:
br0
sw1p1 mark=1
sw1p2 mark=1
sw2p1 mark=4
sw2p2 mark=4
This is generic, usable for every group.
>
>I'm going to send v2 which uses switch port ppid + some group ifindex
>to generate port mark. Group ifindex could be bond ifindex or bridge
>ifindex or even zero if ports are L3 router ports and we want to prune
>any L3 forwarding by the kernel. There is some flexibility here,
>depending on what we're trying to do. We have the switch port ppid,
>so we might as well use it.
>
>-scott
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists