lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150617063002.GA2090@nanopsycho.orion>
Date:	Wed, 17 Jun 2015 08:30:02 +0200
From:	Jiri Pirko <jiri@...nulli.us>
To:	Scott Feldman <sfeldma@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	Netdev <netdev@...r.kernel.org>,
	"simon.horman@...ronome.com" <simon.horman@...ronome.com>,
	Roopa Prabhu <roopa@...ulusnetworks.com>,
	"Arad, Ronen" <ronen.arad@...el.com>,
	"Fastabend, John R" <john.r.fastabend@...el.com>,
	"andrew@...n.ch" <andrew@...n.ch>,
	Florian Fainelli <f.fainelli@...il.com>,
	Guenter Roeck <linux@...ck-us.net>,
	davidch <davidch@...adcom.com>,
	"stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet
 forwarding

Wed, Jun 17, 2015 at 01:53:56AM CEST, sfeldma@...il.com wrote:
>On Tue, Jun 16, 2015 at 2:11 PM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Tue, Jun 16, 2015 at 06:47:47PM CEST, sfeldma@...il.com wrote:
>>>On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko <jiri@...nulli.us> wrote:
>>>> Tue, Jun 16, 2015 at 01:25:51AM CEST, davem@...emloft.net wrote:
>>>>>From: sfeldma@...il.com
>>>>>Date: Sat, 13 Jun 2015 11:04:26 -0700
>>>>>
>>>>>> The switchdev port driver must do two things:
>>>>>>
>>>>>> 1) Generate a fwd_mark for each switch port, using some unique key of the
>>>>>>    switch device (and optionally port).  This is a one-time operation done
>>>>>>    when port's netdev is setup.
>>>>>>
>>>>>> 2) On packet ingress from port, mark the skb with the ingress port's
>>>>>>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>>>>>>    which were already forwarded by the device.  If the device does not
>>>>>>    support such indication, all skbs can be marked, even if they're
>>>>>>    local dst.
>>>>>>
>>>>>> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
>>>>>> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
>>>>>> tried using skb->mark for this purpose, but ebtables can overwrite the
>>>>>> skb->mark before the bridge gets it, so that will not work.
>>>>>>
>>>>>> In general, this fwd_mark can be used for any case where a packet is
>>>>>> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
>>>>>> re-forwarding the packet.  sFlow is another use-case that comes to mind,
>>>>>> but I haven't explored the details.
>>>>>
>>>>>Generally I'm against adding new fields fo sk_buff but I'm trying to be
>>>>>open minded. :-)
>>>>>
>>>>>About the per-device fwd_mark, if the key attribute is uniqueness,
>>>>>let's just do it right and use something like lib/idr.c to generate
>>>>>truly unique indices at probe time for all devices using this
>>>>>facility.  I like that better than having them be unique by a happy
>>>>>accident.
>>>>
>>>> We already have per-device uniqueue key. dev->ifindex.
>>>> That should be good for fwd_mark purposes I believe.
>>>
>>>It would be great if we could use dev->index, but fwd_mark is really
>>>to mark device ports that belong to a group.  In the case of a bridge,
>>>the device ports in the bridge should all have the same mark.  And
>>>another device's ports in the same bridge would have a different mark
>>>(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
>>>marked with the ingress port's mark.  If the skb is to be forwarded
>>>out an egress port, the skb mark is compared with egress port's mark.
>>>If marks compare, then the device has already forwarded the pkt so the
>>>kernel can consume_skb to avoid duplicate pkts on the wire.
>>>
>>>So what we need is a unique mark for device ports within a fwding
>>>group, such as a bridge.
>>
>> Yep, have a group of netdevs, pick one of them and use it's ifindex for
>> the whole group.
>
>That will not work because ports from two switches in the same bridge
>need different marks...that's how the bridge knows which ports to fwd
>and which ones to skip.
>
>Example:
>
>br0
>   sw1p1 (mark=3)
>   sw1p2 (mark=3)
>   sw2p2 (mark=7)
>   sw2p2 (mark=7)
>
>Two switches, sw1 and sw2, in bridge br0.  Let's say sw1 receives an
>unknown unicast pkt. It'll flood the pkt to its other switch ports
>(sw1p2, in this case) and send a copy to the CPU (the bridge), with
>skb->mark=3.  The bridge will flood pkt to sw1p2, sw2p1, and sw2p2,
>but our little check in dev.c will drop the pkt to sw1p2 just before
>egress onto wire.  Pkt goes out on sw2p1 and sw2p2.  This is why we
>can't use just the br0 ifindex to generate the mark.  We need
>something unique about the switch port and the bridge ifindex to give
>us a mark for a port.


Scott, again, I'm not talking about br0 ifindex! I'm talking about
ifindexes of the port netdevs.

br0
  sw1p1  ifindex=1
  sw1p2  ifindex=2
  sw2p1  ifindex=3
  sw2p2  ifindex=4


now we have 2 groups, one in each switch. So for sw1 group (sw1p1 and
sw1p2) we select mark one of the ifindexes, so either "1" or "2"
For second group, from sw2 we select "3" or "4". So for example:
br0
  sw1p1  mark=1
  sw1p2  mark=1
  sw2p1  mark=4
  sw2p2  mark=4

This is generic, usable for every group.


>
>I'm going to send v2 which uses switch port ppid + some group ifindex
>to generate port mark.  Group ifindex could be bond ifindex or bridge
>ifindex or even zero if ports are L3 router ports and we want to prune
>any L3 forwarding by the kernel.  There is some flexibility here,
>depending on what we're trying to do.  We have the switch port ppid,
>so we might as well use it.
>
>-scott
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ