[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130613011921.GA32025@verge.net.au>
Date: Thu, 13 Jun 2013 10:19:21 +0900
From: Simon Horman <horms@...ge.net.au>
To: Pravin Shelar <pshelar@...ira.com>
Cc: Jesse Gross <jesse@...ira.com>,
"dev@...nvswitch.org" <dev@...nvswitch.org>,
netdev <netdev@...r.kernel.org>, Ravi K <rkerur@...il.com>,
Isaku Yamahata <yamahata@...inux.co.jp>,
Ben Pfaff <blp@...ira.com>,
Jarno Rajahalme <jarno.rajahalme@....com>,
Joe Stringer <joe@...d.net.nz>
Subject: Re: [PATCH] datapath: Add basic MPLS support to kernel
On Wed, Jun 12, 2013 at 06:14:07PM -0700, Pravin Shelar wrote:
> On Tue, Jun 11, 2013 at 2:26 PM, Jesse Gross <jesse@...ira.com> wrote:
> > On Mon, Jun 10, 2013 at 1:07 AM, Simon Horman <horms@...ge.net.au> wrote:
> >> diff --git a/datapath/actions.c b/datapath/actions.c
> >> index 0dac658..197811a 100644
> >> --- a/datapath/actions.c
> >> +++ b/datapath/actions.c
> >> +static int push_mpls(struct sk_buff *skb,
> >> + const struct ovs_action_push_mpls *mpls)
> >> +{
> >> + __be32 *new_mpls_lse;
> >> + int err;
> >> +
> >> + if (skb_cow_head(skb, MPLS_HLEN) < 0)
> >> + return -ENOMEM;
> >> +
> >> + err = make_writable(skb, skb->mac_len);
> >> + if (unlikely(err))
> >> + return err;
> >> +
> >> + skb_push(skb, MPLS_HLEN);
> >> + memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
> >> + skb->mac_len);
> >> + skb_reset_mac_header(skb);
> >> +
> >> + new_mpls_lse = (__be32 *)mac_header_end(skb);
> >> + *new_mpls_lse = mpls->mpls_lse;
> >> +
> >> + if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
> >> + skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
> >> + MPLS_HLEN, 0));
> >> +
> >> + /* If the skb was non-MPLS then record the original protocol
> >> + * in skb->inner_protocol. This is to allow GSO to perform
> >> + * segmentation of the inner-packet.
> >> + */
> >> + if (!eth_p_mpls(skb->protocol))
> >> + skb_set_inner_protocol(skb, skb->protocol);
> >> +
> >> + set_ethertype(skb, mpls->mpls_ethertype);
> >> + if (!vlan_tx_tag_present(skb) && skb->protocol != htons(ETH_P_8021Q)) {
> >> + skb->protocol = mpls->mpls_ethertype;
> >> + }
> >
> > I don't think you need to check for vlan_tx_tag_present here - when
> > doing vlan offloading, skb->protocol reflects the packet as if there
> > was no vlan tag there at all. Also, you shouldn't need the braces
> > there.
> >
> > When looking at this, I had an additional thought: different versions
> > of OpenFlow define the tag order differently and there is some limited
> > flexibility in how they get applied. Obviously, we're assuming that
> > vlans are applied before MPLS here. However, I'm not sure that is
> > actually required and may even simplify things (for example, your MPLS
> > GSO code doesn't do anything special with vlans and it should continue
> > to just work either way). I'm hesitant to suggest changes at this
> > point but it might actually make things cleaner and work with the
> > general model that we already have.
> >
> >> diff --git a/datapath/datapath.c b/datapath/datapath.c
> >> index 42af315..7e0e7a1 100644
> >> --- a/datapath/datapath.c
> >> +++ b/datapath/datapath.c
> >> @@ -551,18 +551,132 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa, int st_off
> >> a->nla_len = sfa->actions_len - st_offset;
> >> }
> >>
> >> -static int validate_and_copy_actions(const struct nlattr *attr,
> >> +#define MAX_ETH_TYPES 16 /* Arbitrary Limit */
> >> +
> >> +/* struct eth_types - possible eth types
> >> + * @types: provides storage for the possible eth types.
> >> + * @start: is the index of the first entry of types which is possible.
> >> + * @end: is the index of the last entry of types which is possible.
> >> + * @cursor: is the index of the entry which should be updated if an action
> >> + * changes the eth type.
> >> + *
> >> + * Due to the sample action there may be multiple possible eth types.
> >> + * In order to correctly validate actions all possible types are tracked
> >> + * and verified. This is done using struct eth_types.
> >> + *
> >> + * Initially start, end and cursor should be 0, and the first element of
> >> + * types should be set to the eth type of the flow.
> >> + *
> >> + * When an action changes the eth type then the values of start and end are
> >> + * updated to the value of cursor. The new type is stored at types[cursor].
> >> + *
> >> + * When entering a sample action the start and cursor values are saved. The
> >> + * value of cursor is set to the value of end plus one.
> >> + *
> >> + * When leaving a sample action the start and cursor values are restored to
> >> + * their saved values.
> >> + *
> >> + * An example follows.
> >> + *
> >> + * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D)
> >> + *
> >> + * 0. Initial state:
> >> + * types = { original_eth_type }
> >> + * start = end = cursor = 0;
> >> + *
> >> + * 1. pop_mpls(A)
> >> + * a. Check types from start (0) to end (0) inclusive
> >> + * i.e. Check against original_eth_type
> >> + * b. Set start = end = cursor
> >> + * c. Set types[cursor] = A
> >> + * New state:
> >> + * types = { A }
> >> + * start = end = cursor = 0;
> >> + *
> >> + * 2. Enter first sample()
> >> + * a. Save start and cursor
> >> + * b. Set cursor = end + 1
> >> + * New state:
> >> + * types = { A }
> >> + * start = end = 0;
> >> + * cursor = 1;
> >> + *
> >> + * 3. pop_mpls(B)
> >> + * a. Check types from start (0) to end (0)
> >> + * i.e: Check against A
> >> + * b. Set start = end = cursor
> >> + * c. Set types[cursor] = B
> >> + * New state:
> >> + * types = { A, B }
> >> + * start = end = cursor = 1;
> >> + *
> >> + * 4. Leave first sample()
> >> + * a. Restore start and cursor to the values when entering 2.
> >> + * New state:
> >> + * types = { A, B }
> >> + * start = cursor = 0;
> >> + * end = 1;
> >> + *
> >> + * 5. Enter second sample()
> >> + * a. Save start and cursor
> >> + * b. Set cursor = end + 1
> >> + * New state:
> >> + * types = { A, B }
> >> + * start = 0;
> >> + * end = 1;
> >> + * cursor = 2;
> >> + *
> >> + * 6. pop_mpls(C)
> >> + * a. Check types from start (0) to end (1) inclusive
> >> + * i.e: Check against A and B
> >> + * b. Set start = end = cursor
> >> + * c. Set types[cursor] = C
> >> + * New state:
> >> + * types = { A, B, C }
> >> + * start = end = cursor = 2;
> >> + *
> >> + * 7. Leave second sample()
> >> + * a. Restore start and cursor to the values when entering 5.
> >> + * New state:
> >> + * types = { A, B, C }
> >> + * start = cursor = 0;
> >> + * end = 2;
> >> + *
> >> + * 8. pop_mpls(D)
> >> + * a. Check types from start (0) to end (2) inclusive
> >> + * i.e: Check against A, B and C
> >> + * b. Set start = end = cursor
> >> + * c. Set types[cursor] = D
> >> + * New state:
> >> + * types = { D } // Trailing entries of type are no longer used end = 0
> >> + * start = end = cursor = 0;
> >> + */
> >
> > The algorithm looks good to me now, thanks for working through it and
> > for documenting it.
> >
> >> +static int validate_and_copy_actions__(const struct nlattr *attr,
> >> + const struct sw_flow_key *key, int depth,
> >> + struct sw_flow_actions **sfa,
> >> + struct eth_types *eth_types)
> > [...]
> >> + case OVS_ACTION_ATTR_POP_MPLS: {
> >> + size_t i;
> >
> > size_t seems a little bit odd to me here, maybe just use an int?
> >
> >> + for (i = eth_types->start; i <= eth_types->end; i++)
> >> + if (!eth_p_mpls(eth_types->types[i]))
> >> + return -EINVAL;
> >> +
> >> + /* Disallow subsequent L2.5+ set and mpls_pop actions
> >> + * as there is no check here to ensure that the new
> >> + * eth_type is valid and thus set actions could
> >> + * write off the end of the packet or otherwise
> >> + * corrupt it.
> >> + *
> >> + * Support for these actions that after mpls_pop
> >> + * using packet recirculation is planned.
> >> + * are planned to be supported using using packet
> >> + * recirculation.
> >
> > I think there is an editing error in the above comment.
> >
> >> diff --git a/datapath/datapath.h b/datapath/datapath.h
> >> index ad59a3a..e47493a 100644
> >> --- a/datapath/datapath.h
> >> +++ b/datapath/datapath.h
> >> +#ifdef HAVE_INNER_PROTOCOL
> >> +static inline void skb_set_inner_protocol(struct sk_buff *skb, __be16 ethertype)
> >> +{
> >> + skb->inner_protocol = ethertype;
> >> +}
> >> +
> >> +static inline __be16 skb_get_inner_protocol(struct sk_buff *skb)
> >> +{
> >> + return skb->inner_protocol;
> >> +}
> >> +
> >> +static inline bool kernel_supports_mpls_gso(void)
> >> +{
> >> + return true;
> >> +}
> >> +#else
> >> +static inline void skb_set_inner_protocol(struct sk_buff *skb, __be16 ethertype) {
> >> + OVS_CB(skb)->inner_protocol = ethertype;
> >> +}
> >> +
> >> +static inline __be16 skb_get_inner_protocol(struct sk_buff *skb)
> >> +{
> >> + return OVS_CB(skb)->inner_protocol;
> >> +}
> >> +
> >> +static inline bool kernel_supports_mpls_gso(void)
> >> +{
> >> + return false;
> >> +}
> >> +#endif
> >
> > Can we move these functions compat.h, prefix them with ovs_, and make
> > it clear that they are part of OVS and not skbuff.h?
> >
> >> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> >> index 8c93e18..2d870aa 100644
> >> --- a/datapath/tunnel.c
> >> +++ b/datapath/tunnel.c
> >> @@ -150,6 +150,16 @@ static struct sk_buff *handle_offloads(struct sk_buff *skb)
> >>
> >> if (skb_is_gso(skb)) {
> >> struct sk_buff *nskb;
> >> + __be16 mpls_protocol = htons(0);
> >> +
> >> + /* Swap the protocol so we can reuse the existing
> >> + * skb_gso_segment() function to handle L3 GSO. We will
> >> + * restore this later. */
> >> + if (!kernel_supports_mpls_gso() && eth_p_mpls(skb->protocol) &&
> >> + !eth_p_mpls(skb_get_inner_protocol(skb))) {
> >> + mpls_protocol = skb->protocol;
> >> + skb->protocol = skb_get_inner_protocol(skb);
> >> + }
> >>
> >> nskb = __skb_gso_segment(skb, 0, false);
> >> if (IS_ERR(nskb)) {
> >> @@ -159,6 +169,13 @@ static struct sk_buff *handle_offloads(struct sk_buff *skb)
> >>
> >> consume_skb(skb);
> >> skb = nskb;
> >> +
> >> + if (mpls_protocol != htons(0)) {
> >> + do {
> >> + nskb->protocol = mpls_protocol;
> >> + nskb = nskb->next;
> >> + } while (nskb);
> >> + }
> >> } else if (get_ip_summed(skb) == OVS_CSUM_PARTIAL) {
> >> /* Pages aren't locked and could change at any time.
> >> * If this happens after we compute the checksum, the
> >
> > Can we push this completely into the skb_gso_segment() compatibility
> > code? It's both nicer and may make the interactions with the vlan code
> > less confusing.
> >
> >> diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c
> >> index 4e7342c..db3284f 100644
> >> --- a/datapath/vport-netdev.c
> >> +++ b/datapath/vport-netdev.c
> >> @@ -309,8 +310,31 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
> >> skb->dev = netdev_vport->dev;
> >> forward_ip_summed(skb, true);
> >>
> >> + vlan = mpls = false;
> >> +
> >> + if (!kernel_supports_mpls_gso() && eth_p_mpls(skb->protocol) &&
> >> + !eth_p_mpls(skb_get_inner_protocol(skb)))
> >> + mpls = true;
> >> +
> >> + /* If we are segmenting a VLAN packet, then we don't need to handle
> >> + * MPLS segmentation, because MPLS is part of the extended L2 header
> >> + * and the kernel already knows how to handle this. */
> >> if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) {
> >> - int features;
> >> + mpls = false;
> >> + vlan = true;
> >> + }
> >
> > Don't we still need to save and restore the MPLS skb->protocol field
> > even if we are also processing vlans?
> >
> >> @@ -341,10 +376,14 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
> >> nskb = skb->next;
> >> skb->next = NULL;
> >>
> >> - skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> >> + if (vlan)
> >> + skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> >> if (likely(skb)) {
> >> len += skb->len;
> >> - vlan_set_tci(skb, 0);
> >> + if (mpls)
> >> + skb->protocol = mpls_protocol;
> >> + if (vlan)
> >> + vlan_set_tci(skb, 0);
> >> dev_queue_xmit(skb);
> >> }
> >
> > The order of setting the protocol field is different from the one
> > below and I think this one isn't right because it means that the MPLS
> > protocol can override the vlan.
> >
> > Pravin, can you take a look at this as well since you are working in
> > the same area?
>
> I think this compatibility code can be moved to new compat code posted
> on gre-restructuring code few changes.
Sure that sounds reasonable but I am unsure of the status of those changes
and I would rather not rely on something that isn't ready to merge yet.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists