[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20120605081234.GB12720@verge.net.au>
Date: Tue, 5 Jun 2012 17:12:35 +0900
From: Simon Horman <horms@...ge.net.au>
To: Jesse Gross <jesse@...ira.com>
Cc: dev@...nvswitch.org, netdev@...r.kernel.org
Subject: Re: [ovs-dev] [PATCH 01/21] datapath: tunnelling: Replace tun_id
with tun_key
On Tue, Jun 05, 2012 at 12:33:11PM +0900, Jesse Gross wrote:
> On Tue, Jun 5, 2012 at 7:34 AM, Simon Horman <horms@...ge.net.au> wrote:
> > On Sun, Jun 03, 2012 at 06:15:04PM +0900, Jesse Gross wrote:
> >> On Thu, May 24, 2012 at 6:08 PM, Simon Horman <horms@...ge.net.au> wrote:
> >> > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> >> > int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
> >> > {
> >> > struct ovs_key_ethernet *eth_key;
> >> > + struct ovs_key_ipv4_tunnel *tun_key;
> >> > struct nlattr *nla, *encap;
> >> >
> >> > if (swkey->phy.priority &&
> >> > nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
> >> > goto nla_put_failure;
> >> >
> >> > - if (swkey->phy.tun_id != cpu_to_be64(0) &&
> >> > - nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> >> > - goto nla_put_failure;
> >> > + if (swkey->phy.tun_key.ipv4_dst) {
> >>
> >> It's probably OK to use DIP equal to zero as a not present marker but
> >> we need to enforce that it's always true - for example we shouldn't
> >> allow somebody to setup a flow that way or receive packets with a zero
> >> address. Alternately, we may be able to find a spare bit to indicate
> >> this, like is done with vlans.
> >
> > When I originally wrote this there didn't seem to be any obvious
> > place in ovs_key_ipv4_tunnel to have an active/inactive bit, which
> > is in part why the code relies on checking DIP.
> >
> > However, more recent versions of ovs_key_ipv4_tunnel have a flags field of
> > which only one bit is currently used. We could use one of the unused flag
> > bits.
>
> I guess it depends on what we end up doing with the lookup struct. If
> it stays as it is today, there's plenty of space if you include those
> padding bytes. If we shrink it down and there isn't a place then I do
> think it is fine to use DIP (since this is traversing an IP stack and
> DIP = 0 is an invalid value it's not like an L2 switch not allowing
> invalid IP packet). In that case, we just need to do more validation
> in other places to make sure that this is the only situation that the
> condition can arise.
Ok, my suspicion is that there will be space.
> >> In any case, I think we need to do some additional validation when
> >> setting up flows to check reserved space, for example, as otherwise
> >> that will never match.
> >
> > Sure. My testing seems to indicate that matching does occur,
> > though I am quite happy to tighten things up.
>
> I don't think it causes a problem as long as userspace is well
> behaved, I was think it's best to detect problems early.
Ok, good point. I'll make sure the data is clean.
>
> >> In a similar vein, struct ovs_key_ipv4_tunnel contains some fields
> >> that I think can never apply for lookup such as the flags so it would
> >> be nice if we could remove that for lookup.
> >
> > I think they need to be there to be passed around, so I'm not
> > sure if they can easily be removed from ovs_key_ipv4_tunnel if that
> > is what you are asking.
>
> My guess is that we'll probably want to separate out the struct that
> is used for lookup from the one that is used for communication with
> userspace, which is what we do for most things so that we have more
> freedom to optimize in the kernel.
Understood, I'll look into doing that.
> >> > bool ovs_tnl_frag_needed(struct vport *vport,
> >> > const struct tnl_mutable_config *mutable,
> >> > - struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> >> > + struct sk_buff *skb, unsigned int mtu,
> >> > + struct ovs_key_ipv4_tunnel *tun_key)
> >> > {
> >> > unsigned int eth_hdr_len = ETH_HLEN;
> >> > unsigned int total_length = 0, header_length = 0, payload_length;
> >> > struct ethhdr *eh, *old_eh = eth_hdr(skb);
> >> > struct sk_buff *nskb;
> >> > + struct ovs_key_ipv4_tunnel ntun_key;
> >> >
> >> > /* Sanity check */
> >> > if (skb->protocol == htons(ETH_P_IP)) {
> >> > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
> >> > * any way of synthesizing packets.
> >> > */
> >> > if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> >> > - (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> >> > - OVS_CB(nskb)->tun_id = flow_key;
> >> > + (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> >> > + ntun_key = *tun_key;
> >> > + OVS_CB(nskb)->tun_key = &ntun_key;
> >> > + }
> >>
> >> I guess this is probably where you were going to use the function to
> >> reverse IP addresses. The logic doesn't really work but it's moot
> >> since this is going away anyways.
> >
> > My latest series includes a clean up to ovs_tnl_frag_needed() to allow
> > it to work in some circumstances - i.e. those found in my test environment.
> > That series removes knowledge of tun_key from ovs_tnl_frag_needed().
> >
> > I am however happy to remove ovs_tnl_frag_needed() completely if you think
> > that is appropriate.
>
> I think, in retrospect, that the path MTU discovery that I implemented
> here was probably not the right choice and MSS clamping is the correct
> way to do things. It was better when it wasn't possible to do any
> kind of flow-based manipulation of tunnels but the model is breaking
> down more and more over time. Given that I would be hesitant to
> submit it upstream and since that's the goal of this work, removing it
> completely is probably the right thing to do.
Sure. My latest series also includes an implementation of MSS clamping,
so assuming it isn't doomed in some way we should be safe to remove
MTU discovery. I'll see about doing that for my next post.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists