netdev - Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEP_g=9C+D3gbjJ4n1t6xuyjqEAMYi4ZfqPoe92UAoQJH-UsKg@mail.gmail.com>
Date:	Wed, 3 Dec 2014 10:07:42 -0800
From:	Jesse Gross <jesse@...ira.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	Thomas Graf <tgraf@...g.ch>, "Du, Fan" <fan.du@...el.com>,
	Jason Wang <jasowang@...hat.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"fw@...len.de" <fw@...len.de>,
	"dev@...nvswitch.org" <dev@...nvswitch.org>,
	Pravin Shelar <pshelar@...ira.com>
Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU

On Wed, Dec 3, 2014 at 1:03 AM, Michael S. Tsirkin <mst@...hat.com> wrote:
> On Tue, Dec 02, 2014 at 10:12:04AM -0800, Jesse Gross wrote:
>> On Tue, Dec 2, 2014 at 9:41 AM, Thomas Graf <tgraf@...g.ch> wrote:
>> > On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
>> >> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
>> >> > On 12/02/14 at 01:48pm, Flavio Leitner wrote:
>> >> > > What about containers or any other virtualization environment that
>> >> > > doesn't use Virtio?
>> >> >
>> >> > The host can dictate the MTU in that case for both veth or OVS
>> >> > internal which would be primary container plumbing techniques.
>> >>
>> >> It typically can't do this easily for VMs with emulated devices:
>> >> real ethernet uses a fixed MTU.
>> >>
>> >> IMHO it's confusing to suggest MTU as a fix for this bug, it's
>> >> an unrelated optimization.
>> >> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
>> >
>> > PMTU discovery only resolves the issue if an actual IP stack is
>> > running inside the VM. This may not be the case at all.
>>
>> It's also only really a correct thing to do if the ICMP packet is
>> coming from an L3 node. If you are doing straight bridging then you
>> have to resort to hacks like OVS had before, which I agree are not
>> particularly desirable.
>
> The issue seems to be that fundamentally, this is
> bridging interfaces with variable MTUs (even if MTU values
> on devices don't let us figure this out)-
> that is already not straight bridging, and
> I would argue sending ICMPs back is the right thing to do.

How do you deal with the fact that there is no host IP stack inside
the tunnel? And isn't this exactly the same as the former OVS
implementation that you said you didn't like?

>> > I agree that exposing an MTU towards the guest is not applicable
>> > in all situations, in particular because it is difficult to decide
>> > what MTU to expose. It is a relatively elegant solution in a lot
>> > of virtualization host cases hooked up to an orchestration system
>> > though.
>>
>> I also think this is the right thing to do as a common case
>> optimization and I know other platforms (such as Hyper-V) do it. It's
>> not a complete solution so we still need the original patch in this
>> thread to handle things transparently.
>
> Well, as I believe David (and independently Jason) is saying, it looks like
> the ICMPs we are sending back after applying the original patch have the
> wrong MTU.

The problem is actually that the ICMP messages won't even go to the
sending VM because the host IP stack and the VM are isolated from each
other and there is no route.

> And if I understand what David is saying here, IP is also the wrong place to
> do it.

ICMP can't be the complete solution in any case because it only works
for IP traffic. I think there are only two full solutions: find a way
to adjust the guest MTU to the minimum MTU that its traffic could hit
in an L2 domain or fragmentation. ICMP could be a possible
optimization in the fragmentation case.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html