Message-ID: <17034_1437555506_55AF5B32_17034_5736_1_55AF5B30.8070208@orange.com>
Date: Wed, 22 Jul 2015 10:58:24 +0200
From: <thomas.morin@...nge.com>
To: Thomas Graf <tgraf@...g.ch>, <roopa@...ulusnetworks.com>,
<rshearma@...cade.com>, <ebiederm@...ssion.com>,
<hannes@...essinduktion.org>, <pshelar@...ira.com>,
<jesse@...ira.com>, <davem@...emloft.net>, <daniel@...earbox.net>,
<tom@...bertland.com>, <edumazet@...gle.com>, <jiri@...nulli.us>,
<marcelo.leitner@...il.com>, <stephen@...workplumber.org>,
<jpettit@...ira.com>, <kaber@...sh.net>,
<simon.horman@...ronome.com>, <joestringer@...ira.com>,
<ja@....bg>, <ast@...mgrid.com>, <weichunc@...mgrid.com>
CC: <dev@...nvswitch.org>, <netdev@...r.kernel.org>
Subject: Re: [ovs-dev] [PATCH net-next 00/22 v2] Lightweight & flow based
encapsulation
Hi Thomas,
This looks promising.
One question: will this approach allow MPLS-in-GRE and MPLS-in-UDP?
-Thomas
2015-07-21, Thomas Graf:
> This series combines the work previously posted by Roopa, Robert and
> myself. It follows what we discussed at NFWS. The motivation
> of this series is to:
>
> * Consolidate code between OVS and the rest of the kernel and get
> rid of OVS vports and instead represent them as pure net_devices.
> * Introduce a lightweight tunneling mechanism which enables flow
> based encapsulation to improve scalability on both RX and TX.
> * Do the above in an encapsulation-agnostic way so that the
> encapsulation type is eventually abstracted away from the user.
> * Use the same forwarding decision for both native forwarding and
> encapsulation, thus allowing a switch between native IPv6 and
> UDP encapsulation based on the endpoint without requiring additional
> logic.
>
> The fundamental changes introduced in this series are:
> * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
> instructions. Depending on the specified type, the instructions
> apply to UDP encapsulations, MPLS and possibly others in the future.
> * Depending on the encapsulation type, the output function of the
> dst is directly overwritten or the dst merely attaches metadata and
> relies on a subsequent net_device to apply it to the packet. The
> latter is typically used when both an inner and an outer IP header
> exist, requiring two successive routing lookups.
> * A new metadata_dst structure which can be attached to skbs to
> carry metadata between subsystems. This new metadata transport
> is used to provide a single interface for VXLAN, routing and OVS
> to communicate through metadata.
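To make the two dst modes and the metadata hand-off concrete, here is a toy Python model. It is not kernel code and does not mirror how the patches wire things internally. `Dst`, `Packet`, `TunnelInfo`, `redirect_output`, `attach_metadata` and `vxlan_xmit` are hypothetical names. VNI 10 and 50.1.1.2 are borrowed from the VXLAN route example in this message, and `eth0` is an assumed underlay device.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TunnelInfo:
    """Stand-in for the tunnel key a metadata_dst would carry."""
    tun_id: int  # e.g. the VXLAN VNI
    dst: str     # outer (underlay) destination address


@dataclass
class Packet:
    """Toy skb: headers are listed outermost first."""
    payload: str
    headers: List[str] = field(default_factory=list)
    metadata: Optional[TunnelInfo] = None  # plays the role of a metadata_dst


@dataclass
class Dst:
    """Toy routing result with an overridable output function."""
    output: Callable[[Packet], Packet]


def plain_output(pkt: Packet) -> Packet:
    # Default path: pass the packet on unchanged.
    return pkt


def redirect_output(dst: Dst, push_header: Callable[[Packet], None]) -> None:
    """Mode 1: overwrite dst.output so the encap is applied inline (MPLS-style)."""
    orig = dst.output

    def encap_then_output(pkt: Packet) -> Packet:
        push_header(pkt)
        return orig(pkt)

    dst.output = encap_then_output


def attach_metadata(dst: Dst, info: TunnelInfo) -> None:
    """Mode 2: only attach metadata; a later net_device applies it."""
    orig = dst.output

    def tag_then_output(pkt: Packet) -> Packet:
        pkt.metadata = info
        return orig(pkt)

    dst.output = tag_then_output


def vxlan_xmit(pkt: Packet, outer_route: Callable[[str], str]) -> Packet:
    """Flow-based tunnel device: build outer headers from per-packet metadata."""
    if pkt.metadata is None:
        raise ValueError("flow-based device needs tunnel metadata")
    info = pkt.metadata
    pkt.headers.insert(0, f"vxlan vni={info.tun_id}")
    # Second routing lookup, this time on the outer destination.
    pkt.headers.insert(0, f"ip dst={info.dst} dev {outer_route(info.dst)}")
    pkt.metadata = None  # metadata is consumed once applied
    return pkt


# MPLS-style route: output is redirected, label pushed in place.
mpls_dst = Dst(plain_output)
redirect_output(mpls_dst, lambda p: p.headers.insert(0, "mpls label=200"))
mpls_pkt = mpls_dst.output(Packet("inner"))

# VXLAN-style route: metadata attached, vxlan device encapsulates afterwards.
vx_dst = Dst(plain_output)
attach_metadata(vx_dst, TunnelInfo(tun_id=10, dst="50.1.1.2"))
vx_pkt = vxlan_xmit(vx_dst.output(Packet("inner")), lambda ip: "eth0")

print(mpls_pkt.headers)  # ['mpls label=200']
print(vx_pkt.headers)    # ['ip dst=50.1.1.2 dev eth0', 'vxlan vni=10']
```

In mode 1 the route alone completes the encapsulation. In mode 2 the route only records what to do, which leaves the tunnel device free to perform the second (outer) lookup itself.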
>
> The OVS interfaces remain as-is but will transparently create a real
> VXLAN net_device in the background. iproute2 is extended with new
> use cases:
>
> VXLAN:
> ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0
>
> MPLS:
> ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1
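On the wire, the MPLS example above becomes an RTM_NEWROUTE request whose attributes include RTA_ENCAP_TYPE and a nested RTA_ENCAP. The Python sketch below uses only the standard `struct` module to encode just those two attributes for a single label 200. It builds no nlmsghdr or rtmsg and opens no socket. The numeric constants are copied from mainline uapi headers as they ended up after merging, so treat them as assumptions for these v2 patches. `mpls_encap_attrs` and the other helpers are hypothetical.

```python
import struct
from typing import List

# Numeric values as found in mainline uapi headers after this work was merged
# (linux/rtnetlink.h, linux/lwtunnel.h, linux/mpls_iptunnel.h). They are
# assumptions with respect to the v2 patches under review here.
RTA_ENCAP_TYPE = 21
RTA_ENCAP = 22
LWTUNNEL_ENCAP_MPLS = 1
MPLS_IPTUNNEL_DST = 1

RTA_ALIGNTO = 4


def rta_align(length: int) -> int:
    """Round up to the 4-byte Netlink attribute alignment."""
    return (length + RTA_ALIGNTO - 1) & ~(RTA_ALIGNTO - 1)


def rtattr(rta_type: int, payload: bytes) -> bytes:
    """One struct rtattr: host-endian u16 len + u16 type, payload, zero padding."""
    length = 4 + len(payload)  # rta_len covers header + payload, not padding
    raw = struct.pack("=HH", length, rta_type) + payload
    return raw + b"\0" * (rta_align(length) - length)


def mpls_entry(label: int, bos: bool) -> bytes:
    """Label stack entry, network byte order: label(20) | TC(3) | S(1) | TTL(8).

    TC and TTL are left at zero."""
    return struct.pack("!I", (label << 12) | (int(bos) << 8))


def mpls_encap_attrs(labels: List[int]) -> bytes:
    """Hypothetical helper: RTA_ENCAP_TYPE + nested RTA_ENCAP for an MPLS push."""
    stack = b"".join(
        mpls_entry(label, bos=(i == len(labels) - 1))
        for i, label in enumerate(labels)
    )
    nested = rtattr(MPLS_IPTUNNEL_DST, stack)
    return (rtattr(RTA_ENCAP_TYPE, struct.pack("=H", LWTUNNEL_ENCAP_MPLS))
            + rtattr(RTA_ENCAP, nested))


# Attributes for "encap mpls 200"; these would follow the nlmsghdr/rtmsg
# of an RTM_NEWROUTE request (not built here).
attrs = mpls_encap_attrs([200])
print(len(attrs))  # 20
print(attrs.hex())
```

Nesting the type-specific attributes inside RTA_ENCAP, with RTA_ENCAP_TYPE alongside it, lets one route message carry MPLS, IP tunnel or future encapsulations without new top-level route attributes.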
>
> Performance implications:
> The additional memory allocation in the receive path is expected to
> cost some performance, although this is not observable in standard
> throughput tests if GRO is properly done. The correct net_device
> model outweighs the additional cost of the allocation. Furthermore,
> this implication can be relaxed by reintroducing a direct unqueued
> path from a software device to a consumer like bridge or OVS if
> needed.
>
> $ netperf -t TCP_STREAM -H 15.1.1.201
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 15.1.1.201 (15.1.1.201) port 0 AF_INET : demo
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>  87380  16384  16384    10.00    9118.17
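The claim that GRO hides the extra allocation can be sanity-checked with back-of-the-envelope arithmetic on the 9118.17 Mbit/s figure. The Python snippet rests on three illustrative assumptions, none of them measurements from this series: a 1448-byte TCP payload per wire segment, fully merged 64 KiB GRO packets, and one allocation per skb after GRO.

```python
# Rough allocation-rate estimate for the receive path at the throughput
# measured above. Assumes one extra allocation per skb that reaches the
# tunnel receive handler, i.e. after any GRO merging.
THROUGHPUT_BPS = 9118.17e6  # netperf result above, bits per second
MSS = 1448                  # assumed TCP payload per wire segment (1500 MTU)
GRO_MAX = 65536             # assumed size of a fully merged GRO packet


def allocs_per_sec(bytes_per_skb: float) -> float:
    """Allocations per second if every skb of this size costs one allocation."""
    return THROUGHPUT_BPS / 8 / bytes_per_skb


without_gro = allocs_per_sec(MSS)
with_gro = allocs_per_sec(GRO_MAX)
print(f"no GRO:   {without_gro:,.0f} allocations/s")
print(f"full GRO: {with_gro:,.0f} allocations/s")
print(f"reduction: ~{without_gro / with_gro:.0f}x")
```

Under these assumptions, GRO cuts the allocation rate from roughly 787,000 to roughly 17,400 per second, a factor of about 45.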
>
> Changes since v1:
> * Properly initialize tun_id as reported by Julian
> * Drop duplicate netif_keep_dst() as reported by Alexei
>
> Roopa Prabhu (9):
> rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes
> lwtunnel: infrastructure for handling light weight tunnels like mpls
> ipv4: support for fib route lwtunnel encap attributes
> ipv6: support for fib route lwtunnel encap attributes
> lwtunnel: support dst output redirect function
> ipv4: redirect dst output to lwtunnel output
> ipv6: rt6_info output redirect to tunnel output
> mpls: export mpls functions for use by mpls iptunnels
> mpls: ip tunnel support
>
> Thomas Graf (13):
> ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
> icmp: Don't leak original dst into ip_route_input()
> dst: Metadata destinations
> arp: Inherit metadata dst when creating ARP requests
> vxlan: Flow based tunneling
> route: Extend flow representation with tunnel key
> route: Per route IP tunnel metadata via lightweight tunnel
> fib: Add fib rule match on tunnel id
> vxlan: Factor out device configuration
> openvswitch: Make tunnel set action attach a metadata dst
> openvswitch: Move dev pointer into vport itself
> openvswitch: Abstract vport name through ovs_vport_name()
> openvswitch: Use regular VXLAN net_device device
>
> drivers/net/vxlan.c                  | 672 +++++++++++++++++++++--------------
> include/linux/lwtunnel.h             |   6 +
> include/linux/mpls_iptunnel.h        |   6 +
> include/linux/skbuff.h               |   1 +
> include/net/dst.h                    |   6 +-
> include/net/dst_metadata.h           |  55 +++
> include/net/fib_rules.h              |   1 +
> include/net/flow.h                   |   8 +
> include/net/ip6_fib.h                |   3 +
> include/net/ip_fib.h                 |   5 +-
> include/net/ip_tunnels.h             |  95 ++++-
> include/net/lwtunnel.h               | 144 ++++++++
> include/net/mpls_iptunnel.h          |  29 ++
> include/net/route.h                  |   1 +
> include/net/rtnetlink.h              |   1 +
> include/net/vxlan.h                  |  85 ++++-
> include/uapi/linux/fib_rules.h       |   2 +-
> include/uapi/linux/if_link.h         |   1 +
> include/uapi/linux/lwtunnel.h        |  16 +
> include/uapi/linux/mpls_iptunnel.h   |  28 ++
> include/uapi/linux/openvswitch.h     |   2 +-
> include/uapi/linux/rtnetlink.h       |  17 +
> net/Kconfig                          |   7 +
> net/core/Makefile                    |   1 +
> net/core/dev.c                       |   2 +-
> net/core/dst.c                       |  84 ++++-
> net/core/fib_rules.c                 |  24 +-
> net/core/lwtunnel.c                  | 235 ++++++++++++
> net/core/rtnetlink.c                 |  26 +-
> net/ipv4/arp.c                       |  65 ++--
> net/ipv4/fib_frontend.c              |  10 +
> net/ipv4/fib_semantics.c             |  96 ++++-
> net/ipv4/icmp.c                      |   1 +
> net/ipv4/ip_input.c                  |   3 +-
> net/ipv4/ip_tunnel_core.c            | 130 +++++++
> net/ipv4/route.c                     |  28 +-
> net/ipv6/ip6_fib.c                   |   2 +
> net/ipv6/route.c                     |  34 +-
> net/mpls/Kconfig                     |   8 +-
> net/mpls/Makefile                    |   1 +
> net/mpls/af_mpls.c                   |  11 +-
> net/mpls/internal.h                  |   9 +-
> net/mpls/mpls_iptunnel.c             | 233 ++++++++++++
> net/openvswitch/Kconfig              |  12 -
> net/openvswitch/Makefile             |   1 -
> net/openvswitch/actions.c            |  12 +-
> net/openvswitch/datapath.c           |  19 +-
> net/openvswitch/datapath.h           |   5 +-
> net/openvswitch/dp_notify.c          |   5 +-
> net/openvswitch/flow.c               |   4 +-
> net/openvswitch/flow.h               |  79 +---
> net/openvswitch/flow_netlink.c       |  84 ++++-
> net/openvswitch/flow_netlink.h       |   3 +-
> net/openvswitch/flow_table.c         |   4 +-
> net/openvswitch/vport-geneve.c       |  17 +-
> net/openvswitch/vport-gre.c          |  16 +-
> net/openvswitch/vport-internal_dev.c |  38 +-
> net/openvswitch/vport-netdev.c       | 289 ++++++++++++---
> net/openvswitch/vport-netdev.h       |  13 -
> net/openvswitch/vport-vxlan.c        | 322 -----------------
> net/openvswitch/vport-vxlan.h        |  11 -
> net/openvswitch/vport.c              |  34 +-
> net/openvswitch/vport.h              |  21 +-
> 63 files changed, 2231 insertions(+), 952 deletions(-)
> create mode 100644 include/linux/lwtunnel.h
> create mode 100644 include/linux/mpls_iptunnel.h
> create mode 100644 include/net/dst_metadata.h
> create mode 100644 include/net/lwtunnel.h
> create mode 100644 include/net/mpls_iptunnel.h
> create mode 100644 include/uapi/linux/lwtunnel.h
> create mode 100644 include/uapi/linux/mpls_iptunnel.h
> create mode 100644 net/core/lwtunnel.c
> create mode 100644 net/mpls/mpls_iptunnel.c
> delete mode 100644 net/openvswitch/vport-vxlan.c
> delete mode 100644 net/openvswitch/vport-vxlan.h
>