Message-Id: <cover.1437468140.git.tgraf@suug.ch>
Date: Tue, 21 Jul 2015 10:43:44 +0200
From: Thomas Graf <tgraf@...g.ch>
To: roopa@...ulusnetworks.com, rshearma@...cade.com,
ebiederm@...ssion.com, hannes@...essinduktion.org,
pshelar@...ira.com, jesse@...ira.com, davem@...emloft.net,
daniel@...earbox.net, tom@...bertland.com, edumazet@...gle.com,
jiri@...nulli.us, marcelo.leitner@...il.com,
stephen@...workplumber.org, jpettit@...ira.com, kaber@...sh.net,
simon.horman@...ronome.com, joestringer@...ira.com, ja@....bg,
ast@...mgrid.com, weichunc@...mgrid.com
Cc: netdev@...r.kernel.org, dev@...nvswitch.org
Subject: [PATCH net-next 00/22 v2] Lightweight & flow based encapsulation
This series combines the work previously posted by Roopa, Robert and
myself. It follows what we discussed at NFWS. The motivation
of this series is to:
* Consolidate code between OVS and the rest of the kernel and get
rid of OVS vports and instead represent them as pure net_devices.
* Introduce a lightweight tunneling mechanism which enables flow
based encapsulation to improve scalability on both RX and TX.
* Do the above in an encapsulation-agnostic way so that the
encapsulation type is eventually abstracted away from the user.
* Use the same forwarding decision for both native forwarding and
encapsulation, which allows switching between native IPv6 and
UDP encapsulation based on endpoint without requiring additional
logic.
The fundamental changes introduced in this series are:
* A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
instructions. Depending on the specified type, the instructions
apply to UDP encapsulations, MPLS and possibly others in the future.
* Depending on the encapsulation type, the output function of the
dst is directly overwritten or the dst merely attaches metadata and
relies on a subsequent net_device to apply it to the packet. The
latter is typically used if both an inner and an outer IP header exist,
which requires two successive routing lookups to be performed.
* A new metadata_dst structure which can be attached to skbs to
carry metadata in between subsystems. This new metadata transport
is used to provide a single interface for VXLAN, routing and OVS
to communicate through metadata.
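
To make the two dst behaviours above concrete, here is a minimal
userspace sketch. It is not code from the series: every toy_* name,
the struct layouts and the field choices are made up for illustration.
It only assumes that a route's dst either has its output function
replaced so the encapsulation is applied right there, or carries
metadata that a later tunnel device applies.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins only; none of these are the kernel's real definitions. */
struct toy_pkt;

/* Metadata a dst may carry for a later device to consume. */
struct toy_tun_md {
	uint32_t tun_id;     /* e.g. a VXLAN network identifier */
	uint32_t remote_ip;  /* outer destination address */
};

struct toy_dst {
	int (*output)(struct toy_pkt *pkt);  /* per-route output hook */
	uint32_t label;                      /* used by the direct mode */
	const struct toy_tun_md *md;         /* NULL if no metadata attached */
};

struct toy_pkt {
	const struct toy_dst *dst;
	uint32_t label;      /* filled in by the direct mode */
	uint32_t vni;        /* filled in by the tunnel device */
	uint32_t outer_dst;  /* filled in by the tunnel device */
	int encapsulated;
};

/* Ordinary output: forwards the packet without touching it. */
static int toy_plain_output(struct toy_pkt *pkt)
{
	(void)pkt;
	return 0;
}

/* Mode 1: the dst output function is overwritten and encapsulates directly. */
static int toy_label_output(struct toy_pkt *pkt)
{
	pkt->label = pkt->dst->label;
	pkt->encapsulated = 1;
	return 0;
}

/* Mode 2: the dst only carries metadata; the tunnel device applies it. */
static int toy_tunnel_dev_xmit(struct toy_pkt *pkt)
{
	const struct toy_tun_md *md = pkt->dst ? pkt->dst->md : NULL;

	if (!md)
		return -1;  /* nothing tells us how to encapsulate */
	pkt->vni = md->tun_id;
	pkt->outer_dst = md->remote_ip;
	pkt->encapsulated = 1;
	return 0;
}

/* The forwarding path is identical in both modes: call the dst's output. */
static int toy_route_output(struct toy_pkt *pkt)
{
	return pkt->dst->output(pkt);
}
```

In this sketch a label-style route sets output to toy_label_output(),
so the packet leaves toy_route_output() already encapsulated. A
VXLAN-style route keeps toy_plain_output() and points md at the tunnel
parameters, and the packet is only encapsulated once
toy_tunnel_dev_xmit() runs on it.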
The OVS interfaces remain as-is but will transparently create a real
VXLAN net_device in the background. iproute2 is extended with new
use cases:
VXLAN:
ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0
MPLS:
ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1
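
As a rough illustration of how a route request like the MPLS one above
could carry its encapsulation instructions, the sketch below packs an
encap type attribute followed by a nested encap attribute, using the
usual length/type header with 4-byte alignment. The TOY_* constants
are placeholders and not the kernel's attribute numbers, the helper
names are hypothetical, and the label is stored as a plain integer
rather than in the real on-wire label encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder numbers for illustration, NOT the kernel's values. */
enum {
	TOY_ATTR_ENCAP_TYPE = 1,  /* which encapsulation the route uses */
	TOY_ATTR_ENCAP      = 2,  /* nested, type-specific instructions */
	TOY_ENCAP_MPLS      = 1,
	TOY_MPLS_DST        = 1,  /* label carried inside TOY_ATTR_ENCAP */
};

/* Round up to the 4-byte boundary used between attributes. */
#define TOY_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Same shape as a netlink attribute header: length, then type. */
struct toy_attr {
	uint16_t len;   /* header plus payload, without padding */
	uint16_t type;
};

/* Append one attribute at 'off'; returns the new offset, 0 if no room. */
static size_t toy_put(uint8_t *buf, size_t cap, size_t off,
		      uint16_t type, const void *data, uint16_t dlen)
{
	struct toy_attr hdr;
	size_t need = TOY_ALIGN(sizeof(hdr) + dlen);

	if (off + need > cap)
		return 0;
	hdr.len = (uint16_t)(sizeof(hdr) + dlen);
	hdr.type = type;
	memset(buf + off, 0, need);               /* zero the padding too */
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (dlen)
		memcpy(buf + off + sizeof(hdr), data, dlen);
	return off + need;
}

/* Build "encap mpls <label>" as a type attribute plus a nested attribute. */
static size_t toy_build_mpls_encap(uint8_t *buf, size_t cap, uint32_t label)
{
	uint16_t encap_type = TOY_ENCAP_MPLS;
	uint8_t nested[16];
	size_t nlen, off;

	/* Inner attribute: the type-specific instruction (here, one label). */
	nlen = toy_put(nested, sizeof(nested), 0, TOY_MPLS_DST,
		       &label, sizeof(label));
	if (!nlen)
		return 0;

	/* Outer attributes: the encap type, then the nested payload. */
	off = toy_put(buf, cap, 0, TOY_ATTR_ENCAP_TYPE,
		      &encap_type, sizeof(encap_type));
	if (!off)
		return 0;
	return toy_put(buf, cap, off, TOY_ATTR_ENCAP, nested, (uint16_t)nlen);
}
```

The point of the nesting is that the routing code only needs to look
at the type attribute to pick a handler, while the nested payload stays
opaque to everything except that handler.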
Performance implications:
The additional memory allocation in the receive path is expected to
have a performance cost, although it is not observable in standard
throughput tests if GRO is done properly. The correct net_device
model outweighs the additional cost of the allocation. Furthermore,
this cost can be reduced by reintroducing a direct unqueued
path from a software device to a consumer like bridge or OVS if
needed.
$ netperf -t TCP_STREAM -H 15.1.1.201
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
15.1.1.201 (15.1.1.201) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    10.00    9118.17
Changes since v1:
* Properly initialize tun_id as reported by Julian
* Drop duplicate netif_keep_dst() as reported by Alexei
Roopa Prabhu (9):
rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes
lwtunnel: infrastructure for handling light weight tunnels like mpls
ipv4: support for fib route lwtunnel encap attributes
ipv6: support for fib route lwtunnel encap attributes
lwtunnel: support dst output redirect function
ipv4: redirect dst output to lwtunnel output
ipv6: rt6_info output redirect to tunnel output
mpls: export mpls functions for use by mpls iptunnels
mpls: ip tunnel support
Thomas Graf (13):
ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
icmp: Don't leak original dst into ip_route_input()
dst: Metadata destinations
arp: Inherit metadata dst when creating ARP requests
vxlan: Flow based tunneling
route: Extend flow representation with tunnel key
route: Per route IP tunnel metadata via lightweight tunnel
fib: Add fib rule match on tunnel id
vxlan: Factor out device configuration
openvswitch: Make tunnel set action attach a metadata dst
openvswitch: Move dev pointer into vport itself
openvswitch: Abstract vport name through ovs_vport_name()
openvswitch: Use regular VXLAN net_device device
drivers/net/vxlan.c | 672 +++++++++++++++++++++--------------
include/linux/lwtunnel.h | 6 +
include/linux/mpls_iptunnel.h | 6 +
include/linux/skbuff.h | 1 +
include/net/dst.h | 6 +-
include/net/dst_metadata.h | 55 +++
include/net/fib_rules.h | 1 +
include/net/flow.h | 8 +
include/net/ip6_fib.h | 3 +
include/net/ip_fib.h | 5 +-
include/net/ip_tunnels.h | 95 ++++-
include/net/lwtunnel.h | 144 ++++++++
include/net/mpls_iptunnel.h | 29 ++
include/net/route.h | 1 +
include/net/rtnetlink.h | 1 +
include/net/vxlan.h | 85 ++++-
include/uapi/linux/fib_rules.h | 2 +-
include/uapi/linux/if_link.h | 1 +
include/uapi/linux/lwtunnel.h | 16 +
include/uapi/linux/mpls_iptunnel.h | 28 ++
include/uapi/linux/openvswitch.h | 2 +-
include/uapi/linux/rtnetlink.h | 17 +
net/Kconfig | 7 +
net/core/Makefile | 1 +
net/core/dev.c | 2 +-
net/core/dst.c | 84 ++++-
net/core/fib_rules.c | 24 +-
net/core/lwtunnel.c | 235 ++++++++++++
net/core/rtnetlink.c | 26 +-
net/ipv4/arp.c | 65 ++--
net/ipv4/fib_frontend.c | 10 +
net/ipv4/fib_semantics.c | 96 ++++-
net/ipv4/icmp.c | 1 +
net/ipv4/ip_input.c | 3 +-
net/ipv4/ip_tunnel_core.c | 130 +++++++
net/ipv4/route.c | 28 +-
net/ipv6/ip6_fib.c | 2 +
net/ipv6/route.c | 34 +-
net/mpls/Kconfig | 8 +-
net/mpls/Makefile | 1 +
net/mpls/af_mpls.c | 11 +-
net/mpls/internal.h | 9 +-
net/mpls/mpls_iptunnel.c | 233 ++++++++++++
net/openvswitch/Kconfig | 12 -
net/openvswitch/Makefile | 1 -
net/openvswitch/actions.c | 12 +-
net/openvswitch/datapath.c | 19 +-
net/openvswitch/datapath.h | 5 +-
net/openvswitch/dp_notify.c | 5 +-
net/openvswitch/flow.c | 4 +-
net/openvswitch/flow.h | 79 +---
net/openvswitch/flow_netlink.c | 84 ++++-
net/openvswitch/flow_netlink.h | 3 +-
net/openvswitch/flow_table.c | 4 +-
net/openvswitch/vport-geneve.c | 17 +-
net/openvswitch/vport-gre.c | 16 +-
net/openvswitch/vport-internal_dev.c | 38 +-
net/openvswitch/vport-netdev.c | 289 ++++++++++++---
net/openvswitch/vport-netdev.h | 13 -
net/openvswitch/vport-vxlan.c | 322 -----------------
net/openvswitch/vport-vxlan.h | 11 -
net/openvswitch/vport.c | 34 +-
net/openvswitch/vport.h | 21 +-
63 files changed, 2231 insertions(+), 952 deletions(-)
create mode 100644 include/linux/lwtunnel.h
create mode 100644 include/linux/mpls_iptunnel.h
create mode 100644 include/net/dst_metadata.h
create mode 100644 include/net/lwtunnel.h
create mode 100644 include/net/mpls_iptunnel.h
create mode 100644 include/uapi/linux/lwtunnel.h
create mode 100644 include/uapi/linux/mpls_iptunnel.h
create mode 100644 net/core/lwtunnel.c
create mode 100644 net/mpls/mpls_iptunnel.c
delete mode 100644 net/openvswitch/vport-vxlan.c
delete mode 100644 net/openvswitch/vport-vxlan.h
--
2.4.3