netdev - Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20141203183859.GB16447@redhat.com>
Date:	Wed, 3 Dec 2014 20:38:59 +0200
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Jesse Gross <jesse@...ira.com>
Cc:	Thomas Graf <tgraf@...g.ch>, "Du, Fan" <fan.du@...el.com>,
	Jason Wang <jasowang@...hat.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"fw@...len.de" <fw@...len.de>,
	"dev@...nvswitch.org" <dev@...nvswitch.org>,
	Pravin Shelar <pshelar@...ira.com>
Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU

On Wed, Dec 03, 2014 at 10:07:42AM -0800, Jesse Gross wrote:
> On Wed, Dec 3, 2014 at 1:03 AM, Michael S. Tsirkin <mst@...hat.com> wrote:
> > On Tue, Dec 02, 2014 at 10:12:04AM -0800, Jesse Gross wrote:
> >> On Tue, Dec 2, 2014 at 9:41 AM, Thomas Graf <tgraf@...g.ch> wrote:
> >> > On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
> >> >> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
> >> >> > On 12/02/14 at 01:48pm, Flavio Leitner wrote:
> >> >> > > What about containers or any other virtualization environment that
> >> >> > > doesn't use Virtio?
> >> >> >
> >> >> > The host can dictate the MTU in that case for both veth or OVS
> >> >> > internal which would be primary container plumbing techniques.
> >> >>
> >> >> It typically can't do this easily for VMs with emulated devices:
> >> >> real ethernet uses a fixed MTU.
> >> >>
> >> >> IMHO it's confusing to suggest MTU as a fix for this bug, it's
> >> >> an unrelated optimization.
> >> >> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
> >> >
> >> > PMTU discovery only resolves the issue if an actual IP stack is
> >> > running inside the VM. This may not be the case at all.
> >>
> >> It's also only really a correct thing to do if the ICMP packet is
> >> coming from an L3 node. If you are doing straight bridging then you
> >> have to resort to hacks like OVS had before, which I agree are not
> >> particularly desirable.
> >
> > The issue seems to be that fundamentally, this is
> > bridging interfaces with variable MTUs (even if MTU values
> > on devices don't let us figure this out)-
> > that is already not straight bridging, and
> > I would argue sending ICMPs back is the right thing to do.
> 
> How do you deal with the fact that there is no host IP stack inside
> the tunnel? And isn't this exactly the same as the former OVS
> implementation that you said you didn't like?

I was talking about the high level requirement, not the implementation
here. I agree it's not at all trivial, we need to propagate this across
tunnels.

But let's agree on what we are trying to accomplish first.


> >> > I agree that exposing an MTU towards the guest is not applicable
> >> > in all situations, in particular because it is difficult to decide
> >> > what MTU to expose. It is a relatively elegant solution in a lot
> >> > of virtualization host cases hooked up to an orchestration system
> >> > though.
> >>
> >> I also think this is the right thing to do as a common case
> >> optimization and I know other platforms (such as Hyper-V) do it. It's
> >> not a complete solution so we still need the original patch in this
> >> thread to handle things transparently.
> >
> > Well, as I believe David (and independently Jason) is saying, it looks like
> > the ICMPs we are sending back after applying the original patch have the
> > wrong MTU.
> 
> The problem is actually that the ICMP messages won't even go to the
> sending VM because the host IP stack and the VM are isolated from each
> other and there is no route.

Exactly.
But all this is talking about implementation.

Let's agree on what we want to do first.

And in my mind, we typically want originator to adjust its PMTU,
just for a given destination.
Sending ICMP to originating VM will typically accomplish this.




> > And if I understand what David is saying here, IP is also the wrong place to
> > do it.
> 
> ICMP can't be the complete solution in any case because it only works
> for IP traffic.

Let's be specific please.  What protocols do you most care about? IPX?

> I think there are only two full solutions: find a way
> to adjust the guest MTU to the minimum MTU that its traffic could hit
> in an L2 domain or fragmentation. ICMP could be a possible
> optimization in the fragmentation case.

Both approaches seem strange. You are sending 1 packet an hour to
some destination behind 100 tunnels. Why would you want to
cut down your MTU for all packets? On the other hand,
doubling the amount of packets because your MTU is off
by a couple of bytes will hurt performance significantly.

Still, if you want to cut down the MTU within guest,
that's only an ifconfig away.
Most people would not want to bother, I think it's a good
idea to make PMTU work properly for them.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html