Message-ID: <ea73ca6cb4569847d5f2b2a3a5e1f88d78ba1c1a.camel@gmail.com>
Date: Fri, 04 Mar 2022 11:00:49 -0800
From: Alexander H Duyck <alexander.duyck@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: David Ahern <dsahern@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>, Coco Li <lixiaoyan@...gle.com>,
Alexander Duyck <alexanderduyck@...com>
Subject: Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to
 jumbograms in ip6_output

On Fri, 2022-03-04 at 09:09 -0800, Eric Dumazet wrote:
> On Fri, Mar 4, 2022 at 7:48 AM Alexander H Duyck
> <alexander.duyck@...il.com> wrote:
> >
> > On Thu, 2022-03-03 at 21:33 -0700, David Ahern wrote:
> > > On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > > > From: Coco Li <lixiaoyan@...gle.com>
> > > >
> > > > Instead of simply forcing a 0 payload_len in IPv6 header,
> > > > implement RFC 2675 and insert a custom extension header.
> > > >
> > > > Note that only TCP stack is currently potentially generating
> > > > jumbograms, and that this extension header is purely local,
> > > > > it won't be sent on a physical link.
> > > >
> > > > This is needed so that packet capture (tcpdump and friends)
> > > > can properly dissect these large packets.
> > > >
> > >
> > >
> > > I am fairly certain I know how you are going to respond, but I will ask
> > > this anyways :-) :
> > >
> > > The networking stack as it stands today does not care that skb->len >
> > > 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> > > Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> > > the larger packet size it just works.
> > >
> > > > The jumbogram header is getting added at the L3/IPv6 layer and then
> > > removed by the drivers before pushing to hardware. So, the only benefit
> > > of the push and pop of the jumbogram header is for packet sockets and
> > > tc/ebpf programs - assuming those programs understand the header
> > > (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> > > it is a standard header so apps have a chance to understand the larger
> > > packet size, but what is the likelihood that random apps or even ebpf
> > > programs will understand it?
> > >
> > > Alternative solutions to the packet socket (ebpf programs have access to
> > > skb->len) problem would allow IPv4 to join the Big TCP party. I am
> > > wondering how feasible an alternative solution is to get large packet
> > > sizes across the board with less overhead and changes.
> >
> > I agree that the header insertion and removal seems like a lot of extra
> > overhead for the sake of correctness. In the Microsoft case I am pretty
> > sure their LSOv2 supported both v4 and v6. I think we could do
> > something similar, we would just need to make certain the device
> > supports it and as such maybe it would make sense to implement it as a
> > gso type flag?
> >
> > Could we handle the length field like we handle the checksum and place
> > a value in there that we know is wrong, but could be used to provide
> > additional data? Perhaps we could even use it to store the MSS in the
> > form of the length of the first packet so if examined, the packet would
> > look like the first frame of the flow with a set of trailing data.
> >
>
> I am a bit sad you did not give all this feedback back in August when
> I presented BIG TCP.
>
As I recall, I was thinking along the same lines as what you have done
here, but Dave's question about including IPv4 does bring up an
interesting point. And the Microsoft version supported both.

> We did a lot of work in the last 6 months to implement, test all this,
> making sure this worked.
>
> I am not sure I want to spend another 6 months implementing what you suggest.

I am not saying we have to do this. I am simply stating a "what if"
just to gauge this approach. You could think of it as thinking out
loud, but in written form.

> For instance, input path will not like packets larger than 64KB.
>
> There is this thing trimming padding bytes, you probably do not want
> to mess with this.

I had overlooked the fact that this is being used on the input path;
the trimming would be an issue there. I suppose the fact that LSOv2
didn't have an Rx counterpart would be one reason for us not to
consider the IPv4 approach.
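
To make the trimming concern concrete, here is a rough user-space sketch in
C. It is not kernel code. It assumes the RFC 2675 layout (an 8-byte
hop-by-hop header carrying option type 0xC2 with a 32-bit length) and a
simplified model of the receive path in which the buffer is trimmed to
whatever length the IPv6 header claims. The helper names are made up for
illustration, and the constants are defined locally rather than taken from
kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN      40u    /* fixed IPv6 header size */
#define HBH_JUMBO_HDR_LEN 8u     /* hop-by-hop header holding only the jumbo option */
#define IPV6_TLV_JUMBO    0xC2u  /* RFC 2675 Jumbo Payload option type */

/* Write the 8-byte hop-by-hop header with a Jumbo Payload option.
 * jumbo_len counts every byte after the fixed IPv6 header,
 * including these 8 bytes. */
static void build_hbh_jumbo(uint8_t *buf, uint8_t nexthdr, uint32_t jumbo_len)
{
    assert(buf != NULL);
    buf[0] = nexthdr;          /* protocol that follows, e.g. 6 for TCP */
    buf[1] = 0;                /* Hdr Ext Len 0 means 8 bytes in total */
    buf[2] = IPV6_TLV_JUMBO;   /* option type */
    buf[3] = 4;                /* option data length */
    buf[4] = (uint8_t)(jumbo_len >> 24);   /* length in network byte order */
    buf[5] = (uint8_t)(jumbo_len >> 16);
    buf[6] = (uint8_t)(jumbo_len >> 8);
    buf[7] = (uint8_t)jumbo_len;
}

/* Return the 32-bit jumbo length, or 0 if the header is not the
 * minimal jumbo-only hop-by-hop header built above. */
static uint32_t parse_hbh_jumbo(const uint8_t *buf)
{
    if (buf[1] != 0 || buf[2] != IPV6_TLV_JUMBO || buf[3] != 4)
        return 0;
    return ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
           ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
}

/* Total length the header claims: a non-zero 16-bit payload_len wins,
 * otherwise fall back to the jumbo option. */
static uint32_t claimed_pkt_len(uint16_t payload_len, uint32_t jumbo_len)
{
    return IPV6_HDR_LEN + (payload_len ? payload_len : jumbo_len);
}

/* Simplified receive-side trim: anything past the claimed length is
 * treated as padding and dropped. */
static uint32_t len_after_trim(uint32_t buf_len, uint32_t claimed)
{
    return buf_len > claimed ? claimed : buf_len;
}
```

With payload_len set to 0 and the jumbo option present, a 100000-byte TCP
segment survives the trim intact. If payload_len instead held a "known
wrong" value such as the length of the first MSS-sized frame, the same
model would cut the buffer down to that first frame, which is the problem
on the input path.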