netdev - Re: [PATCH net-next] veth: extend features to support tunneling


Message-ID: <OFF76C3619.82D931F7-ON85257C27.005F4843-85257C27.006270D6@us.ibm.com>
Date:	Mon, 18 Nov 2013 12:55:15 -0500
From:	David Stevens <dlstevens@...ibm.com>
To:	Alexei Starovoitov <ast@...mgrid.com>
Cc:	David Miller <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	netdev-owner@...r.kernel.org, Or Gerlitz <or.gerlitz@...il.com>,
	Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net-next] veth: extend features to support tunneling

netdev-owner@...r.kernel.org wrote on 11/17/2013 02:31:08 AM:

> From: Alexei Starovoitov <ast@...mgrid.com>
 
> when host mtu doesn't account for overhead of tunnel, the neat trick
> we can do is to decrease gso_size while adding tunnel header.

Won't this possibly result in ip_id collision if you generate more segments
than the VM thinks you did? Not an issue if DF is set, I suppose.
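
To make the ip_id concern concrete, here is a minimal sketch in Python. It assumes a simplified model in which each segment gets the next consecutive IP ID and the VM advances its own counter by the number of segments it expects; the 50-byte tunnel overhead, the payload length, and the starting ID are illustrative values, not taken from the patch.

```python
import math

def segment_count(payload_len, gso_size):
    """Number of segments produced when payload_len is cut into gso_size chunks."""
    return math.ceil(payload_len / gso_size)

def ids_used(first_id, count):
    """Set of 16-bit IP IDs consumed by `count` consecutive segments."""
    return {(first_id + i) & 0xFFFF for i in range(count)}

payload_len = 14480          # illustrative GSO payload (10 * 1448)
vm_gso_size = 1448           # what the VM believes each segment carries
tunnel_overhead = 50         # assumed VXLAN-over-IPv4 overhead
host_gso_size = vm_gso_size - tunnel_overhead  # the "decrease gso_size" trick

vm_segments = segment_count(payload_len, vm_gso_size)
host_segments = segment_count(payload_len, host_gso_size)

first_id = 100
# ID the VM will assign to its next packet, based on its own segment count.
next_vm_id = (first_id + vm_segments) & 0xFFFF
# IDs the host actually consumed that the VM will hand out again.
collisions = ids_used(first_id, host_segments) & ids_used(next_vm_id, 1)
```

In this model the host emits one more segment than the VM accounted for, so the last segment reuses the ID of the VM's next packet. With DF set the ID is not used for reassembly, which is why the collision is harmless in that case.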

> This way when skb_gso_segment() kicks in during tx the packets will be
> segmented into host mtu sized packets.

        I've been looking at something like this, but going the other way.
In the VXLAN case, other VXLAN VMs are reachable via the virtual L2 network,
so we *ought* to use large packets in the whole path if the underlay
network can do that. PMTUD should not apply within the virtual L2 network,
but the code as-is will segment to the VM gso_size (say 1500) even if the
underlay network can send jumbo frames (say 9K).
        I think what we want for best performance is to send GSO packets from
the VM and use underlay network MTU minus VXLAN headers as the gso_size for
those.
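
As a rough worked example of "underlay MTU minus VXLAN headers", the sketch below assumes an IPv4 underlay without options, no VLAN tags, and inner IPv4/TCP headers without options. It also treats gso_size as the TCP payload per segment, which is an assumption about how the number would be applied. The function name is hypothetical.

```python
# Header sizes in bytes, assuming no IP or TCP options and no VLAN tags.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HDR = 8
INNER_ETH = 14
INNER_IPV4 = 20
INNER_TCP = 20

TUNNEL_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH  # 50 bytes

def tunnel_gso_size(underlay_mtu):
    """Largest inner TCP payload per segment that fits the underlay MTU."""
    inner_ip_len = underlay_mtu - TUNNEL_OVERHEAD
    return inner_ip_len - INNER_IPV4 - INNER_TCP

jumbo = tunnel_gso_size(9000)     # underlay with jumbo frames
standard = tunnel_gso_size(1500)  # underlay with standard Ethernet MTU
```

Under these assumptions a 9000-byte underlay MTU leaves 8950 bytes for the inner IP packet and 8910 bytes of TCP payload per segment, versus 1410 on a 1500-byte underlay.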
        One complication there is that we would actually want to use the VM
gso_size if the destination is a router taking us off of the VXLAN network,
but we can tell that from the fdb entry when route-short-circuiting is
enabled.
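
The selection described here could look roughly like the following sketch. `FdbEntry`, its `is_router` field, and `pick_gso_size` are hypothetical names for illustration, not kernel structures. Falling back to the smaller size when route-short-circuiting is off (so a router cannot be identified) is an added assumption, not something stated above.

```python
from dataclasses import dataclass

@dataclass
class FdbEntry:
    """Hypothetical stand-in for a VXLAN forwarding-database entry."""
    mac: str
    is_router: bool  # destination routes traffic off the VXLAN network

def pick_gso_size(entry, vm_gso_size, tunnel_gso_size, rsc_enabled):
    """Choose the segment size for a packet headed to `entry`."""
    if rsc_enabled and not entry.is_router:
        # Destination is on the virtual L2 network: fill the underlay MTU,
        # whether that is larger or smaller than what the VM asked for.
        return tunnel_gso_size
    # Router destination, or we cannot tell (assumption: stay conservative).
    # Never exceed the VM's size, but still fit inside the tunnel.
    return min(vm_gso_size, tunnel_gso_size)

peer = FdbEntry(mac="52:54:00:00:00:01", is_router=False)
gateway = FdbEntry(mac="52:54:00:00:00:fe", is_router=True)
```

For example, with a VM gso_size of 1448 and a jumbo-capable underlay giving 8910, `peer` would get 8910 and `gateway` would get 1448.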

> Receiving vm on the other side will be seeing packets of size
> guest_mtu - tunnel_header_size,
> but imo that's much better than sending ip fragments over vxlan fabric.
> It will work for guests sending tcp/udp, but there is no good solution
> for icmp other than ip frags.

My idea for solving this ICMP issue is to forge an ICMP2BIG with the
source address of the destination and send it back to the originating VM
directly. It's complicated a bit because we don't want to use icmp_send()
on the host, which will go through host routing tables when we don't
necessarily have an IP address or routing for the bridged domain. So, to
do this, we really should just have a stand-alone icmp sender for use by
tunnel endpoints. But if we do this, the VM can get the correct gso_size
accounting for the tunnel overhead too, although it's abusing PMTUD a bit
since it doesn't ordinarily apply to hosts on the same L2 network.
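
A stand-alone sender of this kind might build the message as sketched below. This is a user-space illustration in Python, not kernel code: it forges an ICMP "fragmentation needed" message (type 3, code 4, with the next-hop MTU field from RFC 1191) whose source is the original destination, without consulting any routing table. The function names are hypothetical, and the Ethernet header (with swapped MAC addresses) that a bridge would also need is omitted.

```python
import struct
import socket

def inet_checksum(data):
    """RFC 1071 Internet checksum over `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forge_frag_needed(orig_ip_packet, next_hop_mtu):
    """Build an IPv4 + ICMP 'fragmentation needed' reply to orig_ip_packet.

    The reply's source is the original destination and its destination is
    the original source, so it appears to come from the far end.
    """
    ihl = (orig_ip_packet[0] & 0x0F) * 4
    orig_src = orig_ip_packet[12:16]
    orig_dst = orig_ip_packet[16:20]
    # ICMP errors quote the offending IP header plus 8 bytes of its payload.
    quoted = orig_ip_packet[:ihl + 8]

    # type=3 (dest unreachable), code=4 (frag needed), checksum, unused, MTU
    icmp = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu) + quoted
    icmp = struct.pack("!BBHHH", 3, 4, inet_checksum(icmp), 0, next_hop_mtu) + quoted

    # Minimal 20-byte IPv4 header: TTL 64, protocol 1 (ICMP), swapped addresses.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(icmp), 0, 0,
                     64, 1, 0, orig_dst, orig_src)
    ip = ip[:10] + struct.pack("!H", inet_checksum(ip)) + ip[12:]
    return ip + icmp
```

The `next_hop_mtu` argument would be the underlay MTU minus the tunnel overhead, so the VM's path MTU for that destination ends up accounting for the encapsulation.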

I don't have working patches for either of these extensions yet. If your
work overlaps with them before I get there, I'd be happy to see these two
things incorporated in it:

1) ICMP2BIG reflected from the tunnel endpoint without host routing and
        using the destination IP as the forged source address, thus
        appropriate for bridge-only hosting.
2) Allowing larger-than-gso_size segmentation as well as smaller when the
        final destination is on the virtual L2 network.

                                                                +-DLS
