[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130521005519.GA14817@verge.net.au>
Date: Tue, 21 May 2013 09:55:21 +0900
From: Simon Horman <horms@...ge.net.au>
To: Jesse Gross <jesse@...ira.com>
Cc: "dev@...nvswitch.org" <dev@...nvswitch.org>,
netdev <netdev@...r.kernel.org>, Ravi K <rkerur@...il.com>,
Isaku Yamahata <yamahata@...inux.co.jp>,
Ben Pfaff <blp@...ira.com>,
Jarno Rajahalme <jarno.rajahalme@....com>
Subject: Re: [PATCH v2.29] datapath: Add basic MPLS support to kernel
On Fri, May 17, 2013 at 04:14:56PM -0700, Jesse Gross wrote:
> On Fri, May 17, 2013 at 12:06 AM, Simon Horman <horms@...ge.net.au> wrote:
> > diff --git a/datapath/actions.c b/datapath/actions.c
> > index 0dac658..ac4423a 100644
> > --- a/datapath/actions.c
> > +++ b/datapath/actions.c
> > +/* The end of the mac header.
> > + *
> > + * For non-MPLS skbs this will correspond to the network header.
> > + * For MPLS skbs it will be berfore the network_header as the MPLS
> > + * label stack lies between the end of the mac header and the network
> > + * header. That is, for MPLS skbs the end of the mac header
> > + * is the top of the MPLS label stack.
> > + */
> > +static inline unsigned char *skb_mac_header_end(const struct sk_buff *skb)
> > +{
> > + return skb_mac_header(skb) + skb->mac_len;
> > +}
>
> This should either be moved into skbuff.h or given a name that makes
> it clear that it is a local function.
I think it is best to keep it local as it is not needed elsewhere at this
time. And its not clear to me that it will ever be needed elsewhere.
I have renamed it mac_header_end().
> > +static void set_ethertype(struct sk_buff *skb, __be16 ethertype)
> > +{
> > + __be16 *skb_ethertype = get_ethertype(skb);
> > + if (*skb_ethertype == ethertype)
> > + return;
> > + if (get_ip_summed(skb) == OVS_CSUM_COMPLETE) {
> > + __be16 diff[] = { ~*skb_ethertype, ethertype };
> > + skb->csum = ~csum_partial((char *)diff, sizeof(diff),
> > + ~skb->csum);
> > + }
>
> Inside of OVS the Ethernet header is not covered by CHECKSUM_COMPLETE.
Thanks, I have removed the OVS_CSUM_COMPLETE code from set_ethertype()
> > +static int push_mpls(struct sk_buff *skb,
> > + const struct ovs_action_push_mpls *mpls)
> > +{
> > + __be32 *new_mpls_lse;
> > + int err;
> > +
> > + err = make_writable(skb, skb->mac_len + MPLS_HLEN);
> > + if (unlikely(err))
> > + return err;
> > +
> > + skb_push(skb, MPLS_HLEN);
>
> What happens if there isn't enough headroom?
Good point. How about this?
if (unlikely(skb->data<skb->head))
return -EINVAL;
skb_push(skb, MPLS_HLEN);
> > +static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
> > +{
> > + int err;
> > +
> > + err = make_writable(skb, skb->mac_len + MPLS_HLEN);
> > + if (unlikely(err))
> > + return err;
> > +
> > + if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
> > + skb->csum = csum_sub(skb->csum,
> > + csum_partial(skb_mac_header_end(skb),
> > + MPLS_HLEN, 0));
> > +
> > + memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
> > + skb->mac_len);
> > +
> > + skb_pull(skb, MPLS_HLEN);
>
> You could use __skb_pull() here.
Thanks, I have changed skb_pull() to __skb_pull().
> > /* remove VLAN header from packet and update csum accordingly. */
> > static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
> > {
> > @@ -115,6 +229,9 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
> > if (!__vlan_put_tag(skb, current_tag))
> > return -ENOMEM;
> >
> > + /* update mac_len for skb_mac_header_end() */
> > + skb_reset_mac_len(skb);
>
> Doesn't this make us forget the start of the MPLS label stack?
Good point. I have updated this to:
skb->mac_len += VLAN_HLEN;
I have also updated __pop_vlan_tci() to use:
skb->mac_len -= VLAN_HLEN;
Instead of:
skb_reset_mac_len(skb);
> > -/* Execute a list of actions against 'skb'. */
> > +/* Execute a list of actions against 'skb'.
> > + *
> > + * The stack depth is only tracked in the case of a non-MPLS packet
> > + * that becomes MPLS via an MPLS push action. The stack depth
> > + * is passed to do_output() in order to allow it to prepare the
> > + * skb for possible GSO segmentation. */
>
> I don't think this comment applies any more.
Yes, sorry about that. I have removed the above change.
> > diff --git a/datapath/datapath.c b/datapath/datapath.c
> > index 42af315..3a1c203 100644
> > --- a/datapath/datapath.c
> > +++ b/datapath/datapath.c
>
> It's not clear to that using the depth is actually sufficient to
> capture all possible combinations because the more common case is that
> actions are a list, not nested. For example, consider the following
> (invalid) action list on an IP input packet:
>
> push_mpls, sample(pop_mpls), pop_mpls
>
> I don't believe that this will be rejected since by the time we get to
> the second pop_mpls we will have forgotten about the sample action.
My intention when writing the code was that such a case would be rejected
as follows:
1. When the nested pop_mpls is validated the following call is
made with depth = 1:
eth_types_set(eth_types, depth, htons(0));
This sets:
eth_types->depth = 1
eth_types[1] = htons(0);
2. When the second pop_mpls is validated the following
is executed (with the fix that you suggested below):
for (i = 0; i < eth_types->depth; i++)
if (!eth_p_mpls(eth_types->types[i]))
return -EINVAL;
This will return -EINVAL due to the values of eth_type members set in 1.
>
> > @@ -811,6 +869,35 @@ static int validate_and_copy_actions(const struct nlattr *attr,
> > return -EINVAL;
> > break;
> >
> > + case OVS_ACTION_ATTR_PUSH_MPLS: {
> > + const struct ovs_action_push_mpls *mpls = nla_data(a);
> > + if (!eth_p_mpls(mpls->mpls_ethertype))
> > + return -EINVAL;
> > + eth_types_set(eth_types, depth, mpls->mpls_ethertype);
> > + break;
> > + }
> > +
> > + case OVS_ACTION_ATTR_POP_MPLS: {
> > + size_t i;
> > +
> > + for (i = 0; i < eth_types->depth; i++)
> > + if (eth_types->types[i] != htons(ETH_P_IP))
> > + return -EINVAL;
>
> I think the check here should be for MPLS, not IP.
Thanks, I have changed the above check to:
if (!eth_p_mpls(eth_types->types[i]))
> > diff --git a/datapath/flow.c b/datapath/flow.c
> > index 3ce926e..2a86f90 100644
> > --- a/datapath/flow.c
> > +++ b/datapath/flow.c
> > @@ -848,6 +880,9 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
> > [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
> > [OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
> > [OVS_KEY_ATTR_TUNNEL] = -1,
> > +
> > + /* Not upstream. */
> > + [OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
>
> At this point, we can probably treat this patch as the point where the
> MPLS data structures are finalized and upstreamable. Therefore, this
> patch can move the attribute to a final location.
Thanks, I will squash the following incremental change into the next
posting of this patch.
diff --git a/datapath/flow.c b/datapath/flow.c
index 2a86f90..d67d6bf 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -880,8 +880,6 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
[OVS_KEY_ATTR_TUNNEL] = -1,
-
- /* Not upstream. */
[OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
};
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index e890fd8..d90f585 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -287,7 +287,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */
#endif
- OVS_KEY_ATTR_MPLS = 62, /* array of struct ovs_key_mpls.
+ OVS_KEY_ATTR_MPLS, /* array of struct ovs_key_mpls.
* The implementation may restrict
* the accepted length of the array. */
__OVS_KEY_ATTR_MAX
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists