Message-ID: <CAMEtUuykKUKcwDcaOC7bHcHkU42UMjDkoio=ehzUNiMew30mhw@mail.gmail.com>
Date: Mon, 18 Nov 2013 19:51:10 -0800
From: Alexei Starovoitov <ast@...mgrid.com>
To: David Stevens <dlstevens@...ibm.com>
Cc: David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>,
John Fastabend <john.r.fastabend@...el.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
netdev-owner@...r.kernel.org, Or Gerlitz <or.gerlitz@...il.com>,
Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net-next] veth: extend features to support tunneling
On Mon, Nov 18, 2013 at 9:55 AM, David Stevens <dlstevens@...ibm.com> wrote:
> 1) ICMP2BIG reflected from the tunnel endpoint without host routing and
> using the destination IP as the forged source address, thus appropriate
> for bridge-only hosting.
It doesn't look like the existing icmp_send() can be hacked this way:
it cannot do a route lookup and cannot do a neigh lookup here.
The IPs and macs are valid within the VM and known to the virtualized
networking components, but the tunnel cannot possibly know what is
standing between the VM and the tunnel.
The VM may be sending over a tap into a bridge that forwards into a
netns running ip_forwarding between two veths, then the packet goes
into a 2nd bridge and only then is sent into the vxlan device.
ovs can apply many actions before an skb from the tap is delivered to
vport-vxlan.
The generic thing the vxlan driver can do is take the mac and ip
headers from the inner skb, swap the macs, swap the ips, add an
icmphdr with code=frag_needed, pretend that such a packet was
received and decapsulated by vxlan, and call into vxlan_sock->rcv().
It will go back into ovs or the bridge and proceed through the
reverse network topology path all the way back into the VM, which can
adjust its mtu accordingly.
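Roughly, a sketch of that idea below. This is not real kernel code:
vxlan_send_frag_needed() is a made-up name, it assumes IPv4 with no
IP options and linear inner headers, and the vs->rcv(vs, skb, key)
call just follows the vxlan_sock callback convention:

/* Hypothetical sketch: forge an ICMP frag_needed reply from the
 * inner frame and feed it back through the tunnel rcv path as if
 * it had just been decapsulated.
 */
#include <linux/skbuff.h>
#include <linux/etherdevice.h>
#include <linux/ip.h>
#include <linux/icmp.h>
#include <net/checksum.h>
#include <net/vxlan.h>

static void vxlan_send_frag_needed(struct vxlan_sock *vs,
				   struct sk_buff *oskb, u32 mtu)
{
	const struct ethhdr *oeth = eth_hdr(oskb);
	const struct iphdr *oiph = ip_hdr(oskb);
	unsigned int payload = sizeof(*oiph) + 8; /* RFC 792: hdr + 8 bytes */
	struct sk_buff *skb;
	struct ethhdr *eth;
	struct iphdr *iph;
	struct icmphdr *icmph;

	skb = alloc_skb(ETH_HLEN + sizeof(*iph) + sizeof(*icmph) + payload,
			GFP_ATOMIC);
	if (!skb)
		return;

	/* Ethernet: swap inner src/dst macs. */
	skb_reset_mac_header(skb);
	eth = (struct ethhdr *)skb_put(skb, ETH_HLEN);
	memcpy(eth->h_dest, oeth->h_source, ETH_ALEN);
	memcpy(eth->h_source, oeth->h_dest, ETH_ALEN);
	eth->h_proto = htons(ETH_P_IP);

	/* IPv4: swap inner src/dst, i.e. forge the reply as coming
	 * from the original destination.
	 */
	skb_set_network_header(skb, ETH_HLEN);
	iph = (struct iphdr *)skb_put(skb, sizeof(*iph));
	iph->version  = 4;
	iph->ihl      = 5;
	iph->tos      = 0;
	iph->tot_len  = htons(sizeof(*iph) + sizeof(*icmph) + payload);
	iph->id       = 0;
	iph->frag_off = 0;
	iph->ttl      = 64;
	iph->protocol = IPPROTO_ICMP;
	iph->saddr    = oiph->daddr;
	iph->daddr    = oiph->saddr;
	iph->check    = 0;
	iph->check    = ip_fast_csum((unsigned char *)iph, iph->ihl);

	/* ICMP dest-unreach/frag-needed with the suggested mtu,
	 * quoting the original IP header + 8 bytes.
	 */
	skb_set_transport_header(skb, ETH_HLEN + sizeof(*iph));
	icmph = (struct icmphdr *)skb_put(skb, sizeof(*icmph));
	icmph->type             = ICMP_DEST_UNREACH;
	icmph->code             = ICMP_FRAG_NEEDED;
	icmph->un.frag.__unused = 0;
	icmph->un.frag.mtu      = htons(mtu);
	memcpy(skb_put(skb, payload), oiph, payload);
	icmph->checksum = 0;
	icmph->checksum = ip_compute_csum(icmph, sizeof(*icmph) + payload);

	skb->protocol = htons(ETH_P_IP);

	/* Pretend it was just decapsulated; let it walk the reverse
	 * virtual path (ovs/bridge/netns) back to the VM.
	 */
	vs->rcv(vs, skb, 0);
}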
> 2) Allowing larger-than-gso_size segmentation as well as smaller when the
> final destination is on the virtual L2 network.
I think it should be ok for virtual L3 as well. From the VM's point
of view, it's sending 1500-byte packets, and the virtual
routers/bridges on the path to the destination should check packets
against their own mtu values. But if the tunneling infra is smart
enough to use large frames between two hypervisors, it should do so.
The DF bit and pmtu logic apply within the virtual network, so
sending an icmp_frag_needed derived from the physical path mtu back
into the VM is exposing the physical infrastructure to the virtual
network.
A true virtual distributed bridge with VMs should allow setting an 8k
mtu inside the VM and on the bridge and still function with a 1500
mtu in the underlying physical network.
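To make the arithmetic concrete (numbers are illustrative): with an
8192 mtu inside the VM, an 8192-byte inner packet plus inner ethernet
(14) + vxlan (8) + udp (8) + outer IPv4 (20) headers becomes an
8242-byte outer datagram. Over a 1500-mtu physical network that is
six IP fragments (5 x 1480 bytes of payload plus an 822-byte tail),
reassembled at the peer hypervisor before decapsulation, as long as
the outer header doesn't set DF.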