Message-Id: <1374289623-17056-7-git-send-email-horms@verge.net.au>
Date: Sat, 20 Jul 2013 12:07:03 +0900
From: Simon Horman <horms@...ge.net.au>
To: dev@...nvswitch.org, netdev@...r.kernel.org
Cc: Ravi K <rkerur@...il.com>, Isaku Yamahata <yamahata@...inux.co.jp>,
Jesse Gross <jesse@...ira.com>,
Pravin B Shelar <pshelar@...ira.com>,
jarno.rajahalme@....com, Joe Stringer <joe@...d.net.nz>
Subject: [PATCH v2.35 6/6] datapath: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@...il.com>
Cc: Leo Alterman <lalterman@...ira.com>
Cc: Isaku Yamahata <yamahata@...inux.co.jp>
Cc: Joe Stringer <joe@...d.net.nz>
Signed-off-by: Simon Horman <horms@...ge.net.au>
---
v2.35
* Rebase
* Move MPLS constants to mpls.h
* Push MPLS tags after ethernet, before VLAN tags
- This is consistent with the OpenFlow 1.3 specification
- Compatibility with OpenFlow 1.2 and earlier versions
may be provided by ovs-vswitchd.
* Correct GSO behaviour in the presence of MPLS but absence of VLANs
v2.34
* Rebase for megaflow changes
v2.33
* Ensure that inner_protocol is always set to the current
skb->protocol value in ovs_execute_actions(). This ensures
it is set to the correct value in the absence of a push_mpls action.
Also remove setting of inner_protocol in push_mpls() as
it duplicates the code now in ovs_execute_actions().
* Call __skb_gso_segment() instead of skb_gso_segment() from
rpl___skb_gso_segment() in the case that HAVE___SKB_GSO_SEGMENT is set.
This was a typo.
v2.32
* As suggested by Jesse Gross
- Use int instead of size_t in validate_and_copy_actions__().
- Fix crazy edit mess in pop_mpls() action comment
- Move eth_p_mpls() into mpls.h
- Refactor skb_gso_segment MPLS handling into rpl_skb_gso_segment
Address Jesse's comments regarding this code:
"Can we push this completely into the skb_gso_segment() compatibility
code? It's both nicer and may make the interactions with the vlan code
less confusing."
- Move GSO compatibility code into linux/compat/gso.*
- Set skb->protocol on mpls_push and mpls_pop in the presence
of an offloaded VLAN.
v2.31
* As suggested by Jesse Gross
- There is no need to make mac_header_end inline as it is not in a header file
- Remove dubious if (*skb_ethertype == ethertype) optimisation from
set_ethertype
- Only set skb->protocol in push_mpls() or pop_mpls() for non-VLAN packets
- Use MAX_ETH_TYPES instead of SAMPLE_ACTION_DEPTH for array size
of types in struct eth_types. This corrects a typo/thinko.
- Correct eth type tracking logic such that start isn't advanced
when entering a sample action, ensuring that all possible types
are checked when verifying nested actions.
* Define HAVE_INNER_PROTOCOL based on kernel version.
inner_protocol has been merged into net-next and should appear in
v3.11 so there is no longer a need for an acinclude.m4 test to check for it.
* Add MPLS GSO compatibility code.
This is for use on kernels that do not have MPLS GSO support.
Thanks to Joe Stringer for his work on this.
v2.30
* As suggested by Jesse Gross
- Use skb_cow_head in push_mpls to ensure there is sufficient headroom for
skb_push
- Call make_writable with skb->mac_len instead of skb->mac_len + MPLS_HLEN
in push_mpls as only the first skb->mac_len bytes of existing packet data
are modified.
- Rename skb_mac_header_end as mac_header_end, this seems
to be a more appropriate name for a local function.
- Remove OVS_CSUM_COMPLETE code from set_ethertype().
Inside OVS the ethernet header is not covered by OVS_CSUM_COMPLETE.
- Use __skb_pull() instead of skb_pull() in pop_mpls()
- Decrement and increment skb->mac_len when popping and pushing VLAN tags.
Previously mac_len was reset, but this would result in forgetting
the MPLS label stack.
- Remove spurious comment from before do_execute_actions().
- Move OVS_KEY_ATTR_MPLS attribute to its final, upstreamable, location.
- Correct ethertype check for OVS_ACTION_ATTR_POP_MPLS case in
validate_and_copy_actions() to check for MPLS ethertypes rather than
ETH_P_IP.
- Rewrite tracking of eth types used to verify actions in the presence
of sample actions. There is a large comment above struct eth_types
describing the new implementation.
v2.29
* Break include/ and lib/ portions of the patch out into a
separate patch "datapath: Add basic MPLS support to kernel"
* Update for new MPLS GSO scheme
- skb->protocol is set to the new ethertype of the packet
on MPLS push and pop
- When pushing the first MPLS LSE onto a previously non-MPLS
packet set skb->inner_protocol to the original ethertype.
- skb->inner_protocol may be used by the network stack
for GSO of the inner-packet.
* Drop const from ethertype parameter of set_ethertype.
This appears to be a legacy of this parameter being a pointer.
* Pass the ethertype parameter of pop_mpls as a value rather
than a pointer.
v2.28
* Kernel Datapath changes as suggested by Jarno Rajahalme
+ Correct the logic introduced in v2.27 to set the network_header
to after the MPLS label stack in the case of an MPLS packet.
- Increment stack_len offset so that label stacks of depth greater
than two do not cause an infinite loop.
- Correct offset passed to check_header to include skb->mac_len
v2.27
* Kernel Datapath changes as suggested by Jarno Rajahalme and Jesse Gross:
+ Previously the mac_len and network_header of an skb corresponded
to the end of the L2 header. To support GSO, they were adjusted just before
transmission in do_output, with the results as follows:
Input: non-MPLS skb: Output: network header and mac_len correspond
to the beginning of the L3 headers
Input: MPLS: Output: network header and mac_len correspond to the
end of the L2 headers.
This is somewhat confusing.
+ The new scheme is as follows:
- The mac_len always corresponds to the end of the L2 header.
- The network header always corresponds to the beginning of the
L3 header.
+ Note that in the case of MPLS output the end of the L2 headers and the
beginning of the L3 headers will differ.
* Remove unused declaration of skb_cb_mpls_stack()
v2.26
* Rebase on master
* Kernel Datapath changes as suggested by Jarno Rajahalme
- Use skb_network_header() instead of skb_mac_header() to locate
the ethertype to set in set_ethertype() as the latter will
be wrong in the presence of VLAN tags. This resolves
a regression introduced in v2.24.
- Enhance comment in do_output()
- do_execute_actions(): Do not alter mpls_stack_depth if
an MPLS push or pop action fails. This is achieved by altering
mpls_stack_depth at the end of push_mpls() and pop_mpls().
v2.25
* Rebase on master
* Pass big-endian value as the last argument of eth_types_set() in
validate_and_copy_actions__()
* Use revised GSO support as provided by the patch series
"[PATCH 0/2] Small Modifications to GSO to allow segmentation of MPLS"
- Set skb->mac_len to the length of the l2 header + MPLS stack length
- Update skb->network_header accordingly
- Set skb->encapsulated_features
v2.24
* Use skb_mac_header() in set_ethertype()
* Set skb->encapsulation in set_ethertype() to support MPLS GSO.
Also add a note about the other requirements for MPLS GSO.
MPLS GSO support will be posted as a patch to net-next (Linux mainline)
"MPLS: Add limited GSO support"
* Do not add ETH_TYPE_MIN, it is no longer used
v2.23
* As suggested by Jesse Gross:
- Verify the current ethernet type when validating sample actions
both for the taken and not-taken path of the sample action.
- Document that the OVS_KEY_ATTR_MPLS attribute accepts a list of
struct ovs_key_mpls but that an implementation may restrict
the length it accepts.
- Restrict the array length of the OVS_KEY_ATTR_MPLS to one.
+ Don't add ovs_flow_verify_key_len as it was added to
handle attributes whose values are arrays but there are
no attributes with values that are arrays (of length greater than one).
v2.22
* As suggested by Jesse Gross:
- Fix sparse warning in validate_and_copy_actions()
I have no idea why sparse doesn't show this up on my system.
- Remove call to skb_cow_head() from push_mpls() as it
is already covered by a call to make_writable()
- Check (key_type > OVS_KEY_ATTR_MAX) in ovs_flow_verify_key_len()
- Disallow set actions on l2.5+ data and MPLS push and pop actions
after an MPLS pop action as there is no verification that the packet
is actually of the new ethernet type. This may later be supported
using recirculation or by other means.
- Do not add spurious debugging message to ovs_flow_cmd_new_or_set()
v2.21
* As suggested by Jesse Gross:
- Verify that l3 and l4 actions always occur prior to
a push_mpls action and use the network header pointer of an skb
to track the top of the MPLS stack. This avoids adding an l2_size
element to the skb callback.
v2.20
* As suggested by Jesse Gross:
- Do not add ovs_dp_ioctl_hook
+ This appears to be garbage from a rebase
- Do not add skb_cb_set_l2_size. Instead set OVS_CB(skb)->l2_size
in ovs_flow_extract().
- Do not free skb on error in push_mpls(), it is freed in the caller
- Call skb_reset_mac_len() in pop_mpls() and push_mpls()
- Update checksums in pop_mpls(), push_mpls() and set_mpls().
- Rename skb_cb_mpls_bos() as skb_cb_mpls_stack().
It returns the top not the bottom of the stack.
- Track the current eth_type in validate_and_copy_actions
which is initially the eth_type of the flow and may be modified
by push_mpls and pop_mpls actions. Use this to correctly validate
mpls_set actions. This is to allow mpls_set actions to be applied
to a non-MPLS frame after an mpls_push action (although ovs-vswitchd
doesn't currently do that).
Also:
+ Remove the check of the eth_type in set_mpls() as the new validation
scheme should ensure it cannot be incorrect.
+ Use the current eth_type to validate mpls_pop actions and remove
the eth_type check from pop_mpls().
- Move OVS_KEY_ATTR_MPLS to non-upstream group in ovs_key_lens
- Remove unnecessary memset of mpls_key in ovs_flow_to_nlattrs()
- Make a union of the mpls and ip elements of struct sw_flow_key.
Currently the code stops parsing after an MPLS header so it is
not possible for the ip and mpls elements to be used simultaneously
and some space can be saved by using a union.
- Allow an array of MPLS key attributes
+ Currently all but the first element is ignored
+ User-space needs to be updated to accept more than one element,
currently it will treat their presence as an error
- Do not update network header in ovs_flow_extract() after parsing
the MPLS stack as it is never used because no l3+ processing
occurs on MPLS frames.
- Allow multiple MPLS entries in a match by allowing the OVS_KEY_ATTR_MPLS
to be an array of struct ovs_key_mpls with at least one entry.
Currently only one entry is used which is byte-for-byte compatible with
the previous scheme of having OVS_KEY_ATTR_MPLS as a struct
ovs_key_mpls.
* Make skb writable in pop_mpls(), push_mpls() and set_mpls().
v2.18 - v2.19
* No change
v2.17
* As suggested by Ben Pfaff
- Use consistent terminology for MPLS.
+ Consistently refer to the MPLS component of a packet as the
MPLS label stack and entries in the stack as MPLS label stack entries
(LSE). An MPLS label is a component of an MPLS label stack entry.
The other components are the traffic class (TC), time to live (TTL)
and bottom of stack (BoS) bit.
- Rename compose_.*mpls_ functions as execute_.*mpls_
v2.16
* No change
v2.15
* As suggested by Ben Pfaff
- Use OVS_ACTION_SET to set OVS_KEY_ATTR_MPLS instead of
OVS_ACTION_ATTR_SET_MPLS
v2.14
* Remove include/linux/openvswitch.h portion which added
new key and action attributes. This is
now present in "User-Space MPLS actions and matches"
which is now a dependency of this patch
v2.13
* As suggested by Jarno Rajahalme
- Rename mpls_bos element of ovs_skb_cb as l2_size as it is set and used
regardless of if an MPLS stack is present or not. Update the name of
helper functions and documentation accordingly.
- Ensure that skb_cb_mpls_bos() never returns NULL
* Correct endianness in eth_p_mpls()
v2.12
* Update skb and network header on MPLS extraction in ovs_flow_extract()
* Use NULL in skb_cb_mpls_bos()
* Add eth_p_mpls helper
v2.10 - v2.11
* No change
v2.9
* datapath: Always update the mpls bos if vlan_pop is successful
Regardless of the details of how a successful
vlan_pop is achieved, the mpls bos needs to be updated.
Without this fix it has been observed that the following
results in malformed packets
v2.8
* No change
v2.7
* Rebase
v2.6
* As suggested by Yamahata-san
- Do not guard against label == 0 for
OVS_ACTION_ATTR_SET_MPLS in validate_actions().
A label of 0 is valid
- Remove comment stipulating that if
the top_label element of struct sw_flow_key is 0 then
there is no MPLS label. An MPLS label of 0 is valid
and the correct check is whether the ethertype is
ntohs(ETH_TYPE_MPLS) or ntohs(ETH_TYPE_MPLS_MCAST)
v2.4 - v2.5
* No change
v2.3
* s/mpls_stack/mpls_bos/
This is in keeping with the naming used in the OpenFlow 1.3 specification
v2.2
* Call skb_reset_mac_header() in skb_cb_set_mpls_stack()
eth_hdr(skb) is non-NULL when called in skb_cb_set_mpls_stack().
* Add a call to skb_cb_set_mpls_stack() in ovs_packet_cmd_execute().
I apologise that I have mislaid my notes on this but
it avoids a kernel panic. I can investigate again if necessary.
* Use struct ovs_action_push_mpls instead of
__be16 to decode OVS_ACTION_ATTR_PUSH_MPLS in validate_actions(). This is
consistent with the data format for the attribute.
* Indentation fix in skb_cb_mpls_stack(). [cosmetic]
v2.1
* Manual rebase
---
datapath/Modules.mk | 1 +
datapath/actions.c | 125 ++++++++++-
datapath/datapath.c | 254 ++++++++++++++++++++---
datapath/datapath.h | 9 +
datapath/flow.c | 58 +++++-
datapath/flow.h | 17 +-
datapath/linux/compat/gso.c | 50 ++++-
datapath/linux/compat/gso.h | 39 ++++
datapath/linux/compat/include/linux/netdevice.h | 12 --
datapath/linux/compat/netdevice.c | 28 ---
datapath/mpls.h | 15 ++
datapath/tunnel.c | 1 +
datapath/vport-netdev.c | 44 +++-
include/linux/openvswitch.h | 7 +-
14 files changed, 572 insertions(+), 88 deletions(-)
create mode 100644 datapath/mpls.h
diff --git a/datapath/Modules.mk b/datapath/Modules.mk
index 2ce8888..ad19807 100644
--- a/datapath/Modules.mk
+++ b/datapath/Modules.mk
@@ -26,6 +26,7 @@ openvswitch_headers = \
compat.h \
datapath.h \
flow.h \
+ mpls.h \
tunnel.h \
vlan.h \
vport.h \
diff --git a/datapath/actions.c b/datapath/actions.c
index 0a2def6..99e02cf 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -34,6 +34,8 @@
#include "checksum.h"
#include "datapath.h"
+#include "gso.h"
+#include "mpls.h"
#include "vlan.h"
#include "vport.h"
@@ -48,6 +50,109 @@ static int make_writable(struct sk_buff *skb, int write_len)
return pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
}
+/* The end of the mac header.
+ *
+ * For non-MPLS skbs this will correspond to the network header.
+ * For MPLS skbs it will be before the network_header as the MPLS
+ * label stack lies between the end of the mac header and the network
+ * header. That is, for MPLS skbs the end of the mac header
+ * is the top of the MPLS label stack.
+ */
+static unsigned char *mac_header_end(const struct sk_buff *skb)
+{
+ return skb_mac_header(skb) + skb->mac_len;
+}
+
+static void set_ethertype(struct sk_buff *skb, __be16 ethertype, bool inner)
+{
+ struct ethhdr *hdr;
+ if (inner)
+ /* mac_header_end() is used to locate the ethertype
+ * field correctly in the presence of VLAN tags. */
+ hdr = (struct ethhdr *)(mac_header_end(skb) - ETH_HLEN);
+ else
+ hdr = (struct ethhdr *)(skb_mac_header(skb));
+ hdr->h_proto = ethertype;
+}
+
+/* Push MPLS after the ethernet header. We blindly ignore any other tags,
+ * assuming that actions are ordered correctly. */
+static int push_mpls(struct sk_buff *skb,
+ const struct ovs_action_push_mpls *mpls)
+{
+ __be32 *new_mpls_lse;
+ int err;
+
+ if (skb_cow_head(skb, MPLS_HLEN) < 0)
+ return -ENOMEM;
+
+ err = make_writable(skb, skb->mac_len);
+ if (unlikely(err))
+ return err;
+
+ skb_push(skb, MPLS_HLEN);
+ memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
+ ETH_HLEN);
+ skb_reset_mac_header(skb);
+
+ new_mpls_lse = (__be32 *)(skb_mac_header(skb) + ETH_HLEN);
+ *new_mpls_lse = mpls->mpls_lse;
+
+ if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
+ skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
+ MPLS_HLEN, 0));
+
+ set_ethertype(skb, mpls->mpls_ethertype, false);
+ if (skb->protocol != htons(ETH_P_8021Q))
+ skb->protocol = mpls->mpls_ethertype;
+ return 0;
+}
+
+static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
+{
+ int err;
+
+ err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+ if (unlikely(err))
+ return err;
+
+ if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
+ skb->csum = csum_sub(skb->csum,
+ csum_partial(mac_header_end(skb),
+ MPLS_HLEN, 0));
+
+ memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+ skb->mac_len);
+
+ __skb_pull(skb, MPLS_HLEN);
+ skb_reset_mac_header(skb);
+
+ set_ethertype(skb, ethertype, true);
+ if (skb->protocol != htons(ETH_P_8021Q))
+ skb->protocol = ethertype;
+ return 0;
+}
+
+static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
+{
+ __be32 *stack = (__be32 *)mac_header_end(skb);
+ int err;
+
+ err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+ if (unlikely(err))
+ return err;
+
+ if (get_ip_summed(skb) == OVS_CSUM_COMPLETE) {
+ __be32 diff[] = { ~(*stack), *mpls_lse };
+ skb->csum = ~csum_partial((char *)diff, sizeof(diff),
+ ~skb->csum);
+ }
+
+ *stack = *mpls_lse;
+
+ return 0;
+}
+
/* remove VLAN header from packet and update csum accordingly. */
static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
{
@@ -70,7 +175,7 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
vlan_set_encap_proto(skb, vhdr);
skb->mac_header += VLAN_HLEN;
- skb_reset_mac_len(skb);
+ skb->mac_len -= VLAN_HLEN;
return 0;
}
@@ -115,6 +220,9 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
if (!__vlan_put_tag(skb, current_tag))
return -ENOMEM;
+ /* update mac_len for mac_header_end() */
+ skb->mac_len += VLAN_HLEN;
+
if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
skb->csum = csum_add(skb->csum, csum_partial(skb->data
+ (2 * ETH_ALEN), VLAN_HLEN, 0));
@@ -469,6 +577,10 @@ static int execute_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_UDP:
err = set_udp(skb, nla_data(nested_attr));
break;
+
+ case OVS_KEY_ATTR_MPLS:
+ err = set_mpls(skb, nla_data(nested_attr));
+ break;
}
return err;
@@ -504,6 +616,14 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
output_userspace(dp, skb, a);
break;
+ case OVS_ACTION_ATTR_PUSH_MPLS:
+ err = push_mpls(skb, nla_data(a));
+ break;
+
+ case OVS_ACTION_ATTR_POP_MPLS:
+ err = pop_mpls(skb, nla_get_be16(a));
+ break;
+
case OVS_ACTION_ATTR_PUSH_VLAN:
err = push_vlan(skb, nla_data(a));
if (unlikely(err)) /* skb already freed. */
@@ -577,6 +697,9 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
goto out_loop;
}
+ /* Needed for inner protocol compatibility on older kernels. */
+ ovs_skb_set_inner_protocol(skb, skb->protocol);
+
OVS_CB(skb)->tun_key = NULL;
error = do_execute_actions(dp, skb, acts->actions,
acts->actions_len, false);
diff --git a/datapath/datapath.c b/datapath/datapath.c
index 1de50c1..9ea5bc5 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -57,6 +57,8 @@
#include "checksum.h"
#include "datapath.h"
#include "flow.h"
+#include "gso.h"
+#include "mpls.h"
#include "vlan.h"
#include "tunnel.h"
#include "vport-internal_dev.h"
@@ -551,18 +553,132 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa, int st_off
a->nla_len = sfa->actions_len - st_offset;
}
-static int validate_and_copy_actions(const struct nlattr *attr,
+#define MAX_ETH_TYPES 16 /* Arbitrary Limit */
+
+/* struct eth_types - possible eth types
+ * @types: provides storage for the possible eth types.
+ * @start: is the index of the first entry of types which is possible.
+ * @end: is the index of the last entry of types which is possible.
+ * @cursor: is the index of the entry which should be updated if an action
+ * changes the eth type.
+ *
+ * Due to the sample action there may be multiple possible eth types.
+ * In order to correctly validate actions all possible types are tracked
+ * and verified. This is done using struct eth_types.
+ *
+ * Initially start, end and cursor should be 0, and the first element of
+ * types should be set to the eth type of the flow.
+ *
+ * When an action changes the eth type then the values of start and end are
+ * updated to the value of cursor. The new type is stored at types[cursor].
+ *
+ * When entering a sample action the start and cursor values are saved. The
+ * value of cursor is set to the value of end plus one.
+ *
+ * When leaving a sample action the start and cursor values are restored to
+ * their saved values.
+ *
+ * An example follows.
+ *
+ * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D)
+ *
+ * 0. Initial state:
+ * types = { original_eth_type }
+ * start = end = cursor = 0;
+ *
+ * 1. pop_mpls(A)
+ * a. Check types from start (0) to end (0) inclusive
+ * i.e. Check against original_eth_type
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = A
+ * New state:
+ * types = { A }
+ * start = end = cursor = 0;
+ *
+ * 2. Enter first sample()
+ * a. Save start and cursor
+ * b. Set cursor = end + 1
+ * New state:
+ * types = { A }
+ * start = end = 0;
+ * cursor = 1;
+ *
+ * 3. pop_mpls(B)
+ * a. Check types from start (0) to end (0)
+ * i.e: Check against A
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = B
+ * New state:
+ * types = { A, B }
+ * start = end = cursor = 1;
+ *
+ * 4. Leave first sample()
+ * a. Restore start and cursor to the values when entering 2.
+ * New state:
+ * types = { A, B }
+ * start = cursor = 0;
+ * end = 1;
+ *
+ * 5. Enter second sample()
+ * a. Save start and cursor
+ * b. Set cursor = end + 1
+ * New state:
+ * types = { A, B }
+ * start = 0;
+ * end = 1;
+ * cursor = 2;
+ *
+ * 6. pop_mpls(C)
+ * a. Check types from start (0) to end (1) inclusive
+ * i.e: Check against A and B
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = C
+ * New state:
+ * types = { A, B, C }
+ * start = end = cursor = 2;
+ *
+ * 7. Leave second sample()
+ * a. Restore start and cursor to the values when entering 5.
+ * New state:
+ * types = { A, B, C }
+ * start = cursor = 0;
+ * end = 2;
+ *
+ * 8. pop_mpls(D)
+ * a. Check types from start (0) to end (2) inclusive
+ * i.e: Check against A, B and C
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = D
+ * New state:
+ * types = { D } // Trailing entries of type are no longer used end = 0
+ * start = end = cursor = 0;
+ */
+struct eth_types {
+ int start, end, cursor;
+ __be16 types[MAX_ETH_TYPES];
+};
+
+static void eth_types_set(struct eth_types *types, __be16 type)
+{
+ types->start = types->end = types->cursor;
+ types->types[types->cursor] = type;
+}
+
+static int validate_and_copy_actions__(const struct nlattr *attr,
const struct sw_flow_key *key, int depth,
- struct sw_flow_actions **sfa);
+ struct sw_flow_actions **sfa,
+ struct eth_types *eth_types);
static int validate_and_copy_sample(const struct nlattr *attr,
const struct sw_flow_key *key, int depth,
- struct sw_flow_actions **sfa)
+ struct sw_flow_actions **sfa,
+ struct eth_types *eth_types)
{
const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
const struct nlattr *probability, *actions;
const struct nlattr *a;
int rem, start, err, st_acts;
+ int saved_eth_types_start, saved_eth_types_cursor;
memset(attrs, 0, sizeof(attrs));
nla_for_each_nested(a, attr, rem) {
@@ -593,22 +709,38 @@ static int validate_and_copy_sample(const struct nlattr *attr,
if (st_acts < 0)
return st_acts;
- err = validate_and_copy_actions(actions, key, depth + 1, sfa);
+ /* Save and update eth_types cursor and start. Please see the
+ * comment for struct eth_types for a discussion of this.
+ */
+ saved_eth_types_start = eth_types->start;
+ saved_eth_types_cursor = eth_types->cursor;
+ eth_types->cursor = eth_types->end + 1;
+ if (eth_types->cursor == MAX_ETH_TYPES)
+ return -EINVAL;
+
+ err = validate_and_copy_actions__(actions, key, depth + 1, sfa,
+ eth_types);
if (err)
return err;
+ /* Restore eth_types cursor and start. Please see the
+ * comment for struct eth_types for a discussion of this.
+ */
+ eth_types->cursor = saved_eth_types_cursor;
+ eth_types->start = saved_eth_types_start;
+
add_nested_action_end(*sfa, st_acts);
add_nested_action_end(*sfa, start);
return 0;
}
-static int validate_tp_port(const struct sw_flow_key *flow_key)
+static int validate_tp_port(const struct sw_flow_key *flow_key, __be16 eth_type)
{
- if (flow_key->eth.type == htons(ETH_P_IP)) {
+ if (eth_type == htons(ETH_P_IP)) {
if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst)
return 0;
- } else if (flow_key->eth.type == htons(ETH_P_IPV6)) {
+ } else if (eth_type == htons(ETH_P_IPV6)) {
if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst)
return 0;
}
@@ -642,7 +774,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
static int validate_set(const struct nlattr *a,
const struct sw_flow_key *flow_key,
struct sw_flow_actions **sfa,
- bool *set_tun)
+ bool *set_tun, struct eth_types *eth_types)
{
const struct nlattr *ovs_key = nla_data(a);
int key_type = nla_type(ovs_key);
@@ -679,9 +811,12 @@ static int validate_set(const struct nlattr *a,
return err;
break;
- case OVS_KEY_ATTR_IPV4:
- if (flow_key->eth.type != htons(ETH_P_IP))
- return -EINVAL;
+ case OVS_KEY_ATTR_IPV4: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (eth_types->types[i] != htons(ETH_P_IP))
+ return -EINVAL;
if (!flow_key->ip.proto)
return -EINVAL;
@@ -694,10 +829,14 @@ static int validate_set(const struct nlattr *a,
return -EINVAL;
break;
+ }
- case OVS_KEY_ATTR_IPV6:
- if (flow_key->eth.type != htons(ETH_P_IPV6))
- return -EINVAL;
+ case OVS_KEY_ATTR_IPV6: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (eth_types->types[i] != htons(ETH_P_IPV6))
+ return -EINVAL;
if (!flow_key->ip.proto)
return -EINVAL;
@@ -713,18 +852,37 @@ static int validate_set(const struct nlattr *a,
return -EINVAL;
break;
+ }
+
+ case OVS_KEY_ATTR_TCP: {
+ int i;
- case OVS_KEY_ATTR_TCP:
if (flow_key->ip.proto != IPPROTO_TCP)
return -EINVAL;
- return validate_tp_port(flow_key);
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (validate_tp_port(flow_key, eth_types->types[i]))
+ return -EINVAL;
+ }
- case OVS_KEY_ATTR_UDP:
+ case OVS_KEY_ATTR_UDP: {
+ int i;
if (flow_key->ip.proto != IPPROTO_UDP)
return -EINVAL;
- return validate_tp_port(flow_key);
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (validate_tp_port(flow_key, eth_types->types[i]))
+ return -EINVAL;
+ }
+
+ case OVS_KEY_ATTR_MPLS: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (!eth_p_mpls(eth_types->types[i]))
+ return -EINVAL;
+ break;
+ }
default:
return -EINVAL;
@@ -768,10 +926,10 @@ static int copy_action(const struct nlattr *from,
return 0;
}
-static int validate_and_copy_actions(const struct nlattr *attr,
- const struct sw_flow_key *key,
- int depth,
- struct sw_flow_actions **sfa)
+static int validate_and_copy_actions__(const struct nlattr *attr,
+ const struct sw_flow_key *key, int depth,
+ struct sw_flow_actions **sfa,
+ struct eth_types *eth_types)
{
const struct nlattr *a;
int rem, err;
@@ -784,6 +942,8 @@ static int validate_and_copy_actions(const struct nlattr *attr,
static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
[OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
[OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
+ [OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct ovs_action_push_mpls),
+ [OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16),
[OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
[OVS_ACTION_ATTR_POP_VLAN] = 0,
[OVS_ACTION_ATTR_SET] = (u32)-1,
@@ -814,6 +974,33 @@ static int validate_and_copy_actions(const struct nlattr *attr,
return -EINVAL;
break;
+ case OVS_ACTION_ATTR_PUSH_MPLS: {
+ const struct ovs_action_push_mpls *mpls = nla_data(a);
+ if (!eth_p_mpls(mpls->mpls_ethertype))
+ return -EINVAL;
+ eth_types_set(eth_types, mpls->mpls_ethertype);
+ break;
+ }
+
+ case OVS_ACTION_ATTR_POP_MPLS: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (!eth_p_mpls(eth_types->types[i]))
+ return -EINVAL;
+
+ /* Disallow subsequent L2.5+ set and mpls_pop actions
+ * as there is no check here to ensure that the new
+ * eth_type is valid and thus set actions could
+ * write off the end of the packet or otherwise
+ * corrupt it.
+ *
+ * Support for these actions is planned using packet
+ * recirculation.
+ */
+ eth_types_set(eth_types, htons(0));
+ break;
+ }
case OVS_ACTION_ATTR_POP_VLAN:
break;
@@ -827,13 +1014,14 @@ static int validate_and_copy_actions(const struct nlattr *attr,
break;
case OVS_ACTION_ATTR_SET:
- err = validate_set(a, key, sfa, &skip_copy);
+ err = validate_set(a, key, sfa, &skip_copy, eth_types);
if (err)
return err;
break;
case OVS_ACTION_ATTR_SAMPLE:
- err = validate_and_copy_sample(a, key, depth, sfa);
+ err = validate_and_copy_sample(a, key, depth, sfa,
+ eth_types);
if (err)
return err;
skip_copy = true;
@@ -855,6 +1043,20 @@ static int validate_and_copy_actions(const struct nlattr *attr,
return 0;
}
+static int validate_and_copy_actions(const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ struct sw_flow_actions **sfa)
+{
+ struct eth_types eth_type = {
+ .start = 0,
+ .end = 0,
+ .cursor = 0,
+ .types = { key->eth.type, },
+ };
+
+ return validate_and_copy_actions__(attr, key, 0, sfa, &eth_type);
+}
+
static void clear_stats(struct sw_flow *flow)
{
flow->used = 0;
@@ -918,7 +1120,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
if (IS_ERR(acts))
goto err_flow_free;
- err = validate_and_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], &flow->key, 0, &acts);
+ err = validate_and_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], &flow->key, &acts);
rcu_assign_pointer(flow->sf_acts, acts);
if (err)
goto err_flow_free;
@@ -1281,7 +1483,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
ovs_flow_key_mask(&masked_key, &key, &mask);
error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
- &masked_key, 0, &acts);
+ &masked_key, &acts);
if (error) {
OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
goto err_kfree;
diff --git a/datapath/datapath.h b/datapath/datapath.h
index eda87fd..68bf9ac 100644
--- a/datapath/datapath.h
+++ b/datapath/datapath.h
@@ -38,6 +38,10 @@
#define SAMPLE_ACTION_DEPTH 3
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0)
+#define HAVE_INNER_PROTOCOL
+#endif
+
/**
* struct dp_stats_percpu - per-cpu packet processing statistics for a given
* datapath.
@@ -102,6 +106,8 @@ struct datapath {
* packet was not received on a tunnel.
* @vlan_tci: Provides a substitute for the skb->vlan_tci field on kernels
* before 2.6.27.
+ * @inner_protocol: Provides a substitute for the skb->inner_protocol field on
+ * kernels before 3.11.
*/
struct ovs_skb_cb {
struct sw_flow *flow;
@@ -114,6 +120,9 @@ struct ovs_skb_cb {
#ifdef NEED_VLAN_FIELD
u16 vlan_tci;
#endif
+#ifndef HAVE_INNER_PROTOCOL
+ __be16 inner_protocol;
+#endif
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
diff --git a/datapath/flow.c b/datapath/flow.c
index 95fea7f..202f7bd 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -43,6 +43,7 @@
#include <net/ipv6.h>
#include <net/ndisc.h>
+#include "mpls.h"
#include "vlan.h"
static struct kmem_cache *flow_cache;
@@ -131,7 +132,8 @@ static bool ovs_match_validate(const struct sw_flow_match *match,
| (1ULL << OVS_KEY_ATTR_ICMP)
| (1ULL << OVS_KEY_ATTR_ICMPV6)
| (1ULL << OVS_KEY_ATTR_ARP)
- | (1ULL << OVS_KEY_ATTR_ND));
+ | (1ULL << OVS_KEY_ATTR_ND)
+ | (1ULL << OVS_KEY_ATTR_MPLS));
if (match->key->phy.in_port == DP_MAX_PORTS &&
match->mask && (match->mask->key.phy.in_port == 0xffff))
@@ -149,6 +151,12 @@ static bool ovs_match_validate(const struct sw_flow_match *match,
mask_allowed |= 1ULL << OVS_KEY_ATTR_ARP;
}
+ if (eth_p_mpls(match->key->eth.type)) {
+ key_expected |= 1ULL << OVS_KEY_ATTR_MPLS;
+ if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
+ mask_allowed |= 1ULL << OVS_KEY_ATTR_MPLS;
+ }
+
if (match->key->eth.type == htons(ETH_P_IP)) {
key_expected |= 1ULL << OVS_KEY_ATTR_IPV4;
if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
@@ -856,6 +864,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
return -ENOMEM;
skb_reset_network_header(skb);
+ skb_reset_mac_len(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
/* Network layer. */
@@ -932,6 +941,33 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
memcpy(key->ipv4.arp.sha, arp->ar_sha, ETH_ALEN);
memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN);
}
+ } else if (eth_p_mpls(key->eth.type)) {
+ size_t stack_len = MPLS_HLEN;
+
+ /* In the presence of an MPLS label stack the end of the L2
+ * header and the beginning of the L3 header differ.
+ *
+ * Advance network_header to the beginning of the L3
+ * header. mac_len corresponds to the end of the L2 header.
+ */
+ while (1) {
+ __be32 lse;
+
+ error = check_header(skb, skb->mac_len + stack_len);
+ if (unlikely(error))
+ return 0;
+
+ memcpy(&lse, skb_network_header(skb), MPLS_HLEN);
+
+ if (stack_len == MPLS_HLEN)
+ memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
+
+ skb_set_network_header(skb, skb->mac_len + stack_len);
+ if (lse & htonl(MPLS_BOS_MASK))
+ break;
+
+ stack_len += MPLS_HLEN;
+ }
} else if (key->eth.type == htons(ETH_P_IPV6)) {
int nh_len; /* IPv6 Header + Extensions */
@@ -1104,6 +1140,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
[OVS_KEY_ATTR_TUNNEL] = -1,
+ [OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
};
static bool is_all_zero(const u8 *fp, size_t size)
@@ -1470,6 +1507,17 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
attrs &= ~(1ULL << OVS_KEY_ATTR_ARP);
}
+
+ if (attrs & (1ULL << OVS_KEY_ATTR_MPLS)) {
+ const struct ovs_key_mpls *mpls_key;
+
+ mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]);
+ SW_FLOW_KEY_PUT(match, mpls.top_lse,
+ mpls_key->mpls_lse, is_mask);
+
+ attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS);
+ }
+
if (attrs & (1ULL << OVS_KEY_ATTR_TCP)) {
const struct ovs_key_tcp *tcp_key;
@@ -1795,6 +1843,14 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey,
arp_key->arp_op = htons(output->ip.proto);
memcpy(arp_key->arp_sha, output->ipv4.arp.sha, ETH_ALEN);
memcpy(arp_key->arp_tha, output->ipv4.arp.tha, ETH_ALEN);
+ } else if (eth_p_mpls(swkey->eth.type)) {
+ struct ovs_key_mpls *mpls_key;
+
+ nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
+ if (!nla)
+ goto nla_put_failure;
+ mpls_key = nla_data(nla);
+ mpls_key->mpls_lse = output->mpls.top_lse;
}
if ((swkey->eth.type == htons(ETH_P_IP) ||
diff --git a/datapath/flow.h b/datapath/flow.h
index 1a3764e..806d9d3 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -71,12 +71,17 @@ struct sw_flow_key {
__be16 tci; /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
__be16 type; /* Ethernet frame type. */
} eth;
- struct {
- u8 proto; /* IP protocol or lower 8 bits of ARP opcode. */
- u8 tos; /* IP ToS. */
- u8 ttl; /* IP TTL/hop limit. */
- u8 frag; /* One of OVS_FRAG_TYPE_*. */
- } ip;
+ union {
+ struct {
+ __be32 top_lse; /* top label stack entry */
+ } mpls;
+ struct {
+ u8 proto; /* IP protocol or lower 8 bits of ARP opcode. */
+ u8 tos; /* IP ToS. */
+ u8 ttl; /* IP TTL/hop limit. */
+ u8 frag; /* One of OVS_FRAG_TYPE_*. */
+ } ip;
+ };
union {
struct {
struct {
diff --git a/datapath/linux/compat/gso.c b/datapath/linux/compat/gso.c
index 43418d3..f957139 100644
--- a/datapath/linux/compat/gso.c
+++ b/datapath/linux/compat/gso.c
@@ -19,6 +19,7 @@
#include <linux/module.h>
#include <linux/if.h>
#include <linux/if_tunnel.h>
+#include <linux/if_vlan.h>
#include <linux/icmp.h>
#include <linux/in.h>
#include <linux/ip.h>
@@ -35,12 +36,20 @@
#include <net/xfrm.h>
#include "gso.h"
+#include "mpls.h"
+#include "vlan.h"
-static __be16 __skb_network_protocol(struct sk_buff *skb)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+__be16 rpl_skb_network_protocol(struct sk_buff *skb)
{
__be16 type = skb->protocol;
+ __be16 inner_proto;
int vlan_depth = ETH_HLEN;
+ inner_proto = ovs_skb_get_inner_protocol(skb);
+ if (eth_p_mpls(skb->protocol) && !eth_p_mpls(inner_proto))
+ type = inner_proto;
+
while (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) {
struct vlan_hdr *vh;
@@ -55,6 +64,43 @@ static __be16 __skb_network_protocol(struct sk_buff *skb)
return type;
}
+struct sk_buff *rpl___skb_gso_segment(struct sk_buff *skb,
+ netdev_features_t features,
+ bool tx_path)
+{
+ struct sk_buff *skb_gso;
+ __be16 type = skb->protocol;
+
+ skb->protocol = skb_network_protocol(skb);
+
+ /* This hack is needed to get the kernel's regular __skb_gso_segment() or skb_gso_segment(). */
+#ifdef HAVE___SKB_GSO_SEGMENT
+#undef __skb_gso_segment
+ skb_gso = __skb_gso_segment(skb, features, tx_path);
+#else
+#undef skb_gso_segment
+ skb_gso = skb_gso_segment(skb, features);
+#endif
+
+ if (!skb_gso || IS_ERR(skb_gso))
+ return skb_gso;
+
+ skb = skb_gso;
+ while (skb) {
+ skb->protocol = type;
+ skb = skb->next;
+ }
+
+ return skb_gso;
+}
+
+struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ return rpl___skb_gso_segment(skb, features, true);
+}
+#endif /* kernel version < 3.11.0 */
+
static struct sk_buff *tnl_skb_gso_segment(struct sk_buff *skb,
netdev_features_t features,
bool tx_path)
@@ -68,7 +114,7 @@ static struct sk_buff *tnl_skb_gso_segment(struct sk_buff *skb,
/* setup whole inner packet to get protocol. */
__skb_pull(skb, mac_offset);
- skb->protocol = __skb_network_protocol(skb);
+ skb->protocol = skb_network_protocol(skb);
/* setup l3 packet to gso, to get around segmentation bug on older kernel.*/
__skb_pull(skb, (pkt_hlen - mac_offset));
diff --git a/datapath/linux/compat/gso.h b/datapath/linux/compat/gso.h
index 44fd213..49ef8e6 100644
--- a/datapath/linux/compat/gso.h
+++ b/datapath/linux/compat/gso.h
@@ -1,6 +1,7 @@
#ifndef __LINUX_GSO_WRAPPER_H
#define __LINUX_GSO_WRAPPER_H
+#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/protocol.h>
@@ -69,4 +70,42 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
#define ip_local_out rpl_ip_local_out
int ip_local_out(struct sk_buff *skb);
+
+#ifdef HAVE_INNER_PROTOCOL
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+ __be16 ethertype)
+{
+ skb->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+ return skb->inner_protocol;
+}
+#else
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+ __be16 ethertype) {
+ OVS_CB(skb)->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+ return OVS_CB(skb)->inner_protocol;
+}
+#endif
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+#define skb_network_protocol rpl_skb_network_protocol
+__be16 rpl_skb_network_protocol(struct sk_buff *skb);
+
+#define skb_gso_segment rpl_skb_gso_segment
+struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
+ netdev_features_t features);
+
+#define __skb_gso_segment rpl___skb_gso_segment
+struct sk_buff *rpl___skb_gso_segment(struct sk_buff *skb,
+ netdev_features_t features,
+ bool tx_path);
+#endif /* before 3.11 */
+
#endif
diff --git a/datapath/linux/compat/include/linux/netdevice.h b/datapath/linux/compat/include/linux/netdevice.h
index ba1fc59..0325e78 100644
--- a/datapath/linux/compat/include/linux/netdevice.h
+++ b/datapath/linux/compat/include/linux/netdevice.h
@@ -162,9 +162,6 @@ static inline struct net_device *dev_get_by_index_rcu(struct net *net, int ifind
#endif
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
-#define skb_gso_segment rpl_skb_gso_segment
-struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features);
-
#define netif_skb_features rpl_netif_skb_features
u32 rpl_netif_skb_features(struct sk_buff *skb);
@@ -179,13 +176,4 @@ static inline int rpl_netif_needs_gso(struct sk_buff *skb, int features)
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,3,0)
typedef u32 netdev_features_t;
#endif
-
-#ifndef HAVE___SKB_GSO_SEGMENT
-static inline struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
- netdev_features_t features,
- bool tx_path)
-{
- return skb_gso_segment(skb, features);
-}
-#endif
#endif
diff --git a/datapath/linux/compat/netdevice.c b/datapath/linux/compat/netdevice.c
index f03efde..9a777d6 100644
--- a/datapath/linux/compat/netdevice.c
+++ b/datapath/linux/compat/netdevice.c
@@ -75,32 +75,4 @@ u32 rpl_netif_skb_features(struct sk_buff *skb)
return harmonize_features(skb, protocol, features);
}
}
-
-struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features)
-{
- int vlan_depth = ETH_HLEN;
- __be16 type = skb->protocol;
- __be16 skb_proto;
- struct sk_buff *skb_gso;
-
- while (type == htons(ETH_P_8021Q)) {
- struct vlan_hdr *vh;
-
- if (unlikely(!pskb_may_pull(skb, vlan_depth + VLAN_HLEN)))
- return ERR_PTR(-EINVAL);
-
- vh = (struct vlan_hdr *)(skb->data + vlan_depth);
- type = vh->h_vlan_encapsulated_proto;
- vlan_depth += VLAN_HLEN;
- }
-
- /* this hack needed to get regular skb_gso_segment() */
-#undef skb_gso_segment
- skb_proto = skb->protocol;
- skb->protocol = type;
-
- skb_gso = skb_gso_segment(skb, features);
- skb->protocol = skb_proto;
- return skb_gso;
-}
#endif /* kernel version < 2.6.38 */
diff --git a/datapath/mpls.h b/datapath/mpls.h
new file mode 100644
index 0000000..7eab104
--- /dev/null
+++ b/datapath/mpls.h
@@ -0,0 +1,15 @@
+#ifndef MPLS_H
+#define MPLS_H 1
+
+#include <linux/if_ether.h>
+
+#define MPLS_BOS_MASK 0x00000100
+#define MPLS_HLEN 4
+
+static inline bool eth_p_mpls(__be16 eth_type)
+{
+ return eth_type == htons(ETH_P_MPLS_UC) ||
+ eth_type == htons(ETH_P_MPLS_MC);
+}
+
+#endif
diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index ef46a69..756e8b6 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -33,6 +33,7 @@
#include "checksum.h"
#include "compat.h"
#include "datapath.h"
+#include "gso.h"
#include "tunnel.h"
#include "vlan.h"
#include "vport.h"
diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c
index 06598fa..6a10602 100644
--- a/datapath/vport-netdev.c
+++ b/datapath/vport-netdev.c
@@ -30,6 +30,8 @@
#include "checksum.h"
#include "datapath.h"
+#include "gso.h"
+#include "mpls.h"
#include "vlan.h"
#include "vport-internal_dev.h"
#include "vport-netdev.h"
@@ -279,6 +281,8 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
int mtu = netdev_vport->dev->mtu;
int len;
+ __be16 inner_protocol;
+ bool vlan, mpls;
if (unlikely(packet_length(skb) > mtu && !skb_is_gso(skb))) {
net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
@@ -290,8 +294,17 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
skb->dev = netdev_vport->dev;
forward_ip_summed(skb, true);
- if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) {
- int features;
+ vlan = mpls = false;
+
+ inner_protocol = ovs_skb_get_inner_protocol(skb);
+ if (eth_p_mpls(skb->protocol) && !eth_p_mpls(inner_protocol))
+ mpls = true;
+
+ if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev))
+ vlan = true;
+
+ if (vlan || mpls) {
+ netdev_features_t features;
features = netif_skb_features(skb);
@@ -299,6 +312,17 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
NETIF_F_UFO | NETIF_F_FSO);
+ /* As of v3.11 the kernel provides an mpls_features field in
+ * struct net_device which allows devices to advertise which
+ * features it supports for MPLS. This value defaults to
+ * NETIF_F_SG and as of writing is not overridden anywhere.
+ * This compatibility code is intended for older kernels which
+ * do not support MPLS GSO and thus do not provide
+ * mpls_features. Thus this code uses NETIF_F_SG directly in
+ * place of mpls_features. */
+ if (mpls)
+ features &= NETIF_F_SG;
+
if (netif_needs_gso(skb, features)) {
struct sk_buff *nskb;
@@ -322,10 +346,12 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
nskb = skb->next;
skb->next = NULL;
- skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
+ if (vlan)
+ skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
if (likely(skb)) {
len += skb->len;
- vlan_set_tci(skb, 0);
+ if (vlan)
+ vlan_set_tci(skb, 0);
dev_queue_xmit(skb);
}
@@ -336,10 +362,12 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
}
tag:
- skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
- if (unlikely(!skb))
- return 0;
- vlan_set_tci(skb, 0);
+ if (vlan) {
+ skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
+ if (unlikely(!skb))
+ return 0;
+ vlan_set_tci(skb, 0);
+ }
}
len = skb->len;
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index a119b14..27d0494 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -282,14 +282,13 @@ enum ovs_key_attr {
OVS_KEY_ATTR_ND, /* struct ovs_key_nd */
OVS_KEY_ATTR_SKB_MARK, /* u32 skb mark */
OVS_KEY_ATTR_TUNNEL, /* Nested set of ovs_tunnel attributes */
+ OVS_KEY_ATTR_MPLS, /* array of struct ovs_key_mpls.
+ * The implementation may restrict
+ * the accepted length of the array. */
#ifdef __KERNEL__
OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */
#endif
-
- OVS_KEY_ATTR_MPLS = 62, /* array of struct ovs_key_mpls.
- * The implementation may restrict
- * the accepted length of the array. */
__OVS_KEY_ATTR_MAX
};
--
1.7.10.4
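
For readers following the ovs_flow_extract() hunk above, the label stack walk can be sketched in userspace C. This is only an illustration under stated assumptions, not the datapath code: mpls_stack_len() is a hypothetical helper, it works on a plain byte buffer rather than an skb, and the explicit length check stands in for check_header(). MPLS_HLEN and MPLS_BOS_MASK take the same values as in the new mpls.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl() */

#define MPLS_HLEN	4
#define MPLS_BOS_MASK	0x00000100	/* bottom-of-stack bit, host byte order */

/*
 * Hypothetical helper (not part of the patch): walk an MPLS label stack
 * that starts at 'stack' and has 'avail' readable bytes.
 *
 * Returns the length of the label stack in bytes, i.e. the offset of the
 * L3 header relative to the end of the L2 header, or 0 if the buffer ends
 * before an entry with the bottom-of-stack bit is found.  The top label
 * stack entry is stored in *top_lse in network byte order, as the patch
 * does for key->mpls.top_lse.
 */
static size_t mpls_stack_len(const uint8_t *stack, size_t avail,
			     uint32_t *top_lse)
{
	size_t stack_len = 0;

	for (;;) {
		uint32_t lse;

		/* Stand-in for check_header(): is one more LSE readable? */
		if (avail - stack_len < MPLS_HLEN)
			return 0;

		/* memcpy avoids unaligned access to the 32-bit entry. */
		memcpy(&lse, stack + stack_len, MPLS_HLEN);

		if (stack_len == 0)
			*top_lse = lse;

		stack_len += MPLS_HLEN;

		/* Compare in network byte order, as the patch does. */
		if (lse & htonl(MPLS_BOS_MASK))
			return stack_len;
	}
}
```

For a two-entry stack the helper returns 8, which corresponds to the distance the patch places between mac_len (end of L2) and network_header (start of L3). Unlike the sketch, the datapath code returns from ovs_flow_extract() with the key as extracted so far when the header check fails.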