Message-ID: <CANn89iK-treGphHqA-052DMSuuL_-ubdnhBUcpptqT_gnJyovw@mail.gmail.com>
Date: Fri, 4 Mar 2022 11:13:08 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Alexander H Duyck <alexander.duyck@...il.com>
Cc: David Ahern <dsahern@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>, Coco Li <lixiaoyan@...gle.com>,
Alexander Duyck <alexanderduyck@...com>
Subject: Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to
jumbograms in ip6_output
On Fri, Mar 4, 2022 at 11:00 AM Alexander H Duyck
<alexander.duyck@...il.com> wrote:
>
> On Fri, 2022-03-04 at 09:09 -0800, Eric Dumazet wrote:
> > On Fri, Mar 4, 2022 at 7:48 AM Alexander H Duyck
> > <alexander.duyck@...il.com> wrote:
> > >
> > > On Thu, 2022-03-03 at 21:33 -0700, David Ahern wrote:
> > > > On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > > > > From: Coco Li <lixiaoyan@...gle.com>
> > > > >
> > > > > Instead of simply forcing a 0 payload_len in IPv6 header,
> > > > > implement RFC 2675 and insert a custom extension header.
> > > > >
> > > > > Note that only TCP stack is currently potentially generating
> > > > > jumbograms, and that this extension header is purely local,
> > > > > > it won't be sent on a physical link.
> > > > >
> > > > > This is needed so that packet capture (tcpdump and friends)
> > > > > can properly dissect these large packets.
> > > > >
> > > >
> > > >
> > > > I am fairly certain I know how you are going to respond, but I will ask
> > > > this anyways :-) :
> > > >
> > > > The networking stack as it stands today does not care that skb->len >
> > > > 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> > > > Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> > > > the larger packet size it just works.
> > > >
> > > > > The jumbogram header is getting added at the L3/IPv6 layer and then
> > > > removed by the drivers before pushing to hardware. So, the only benefit
> > > > of the push and pop of the jumbogram header is for packet sockets and
> > > > tc/ebpf programs - assuming those programs understand the header
> > > > (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> > > > it is a standard header so apps have a chance to understand the larger
> > > > packet size, but what is the likelihood that random apps or even ebpf
> > > > programs will understand it?
> > > >
> > > > Alternative solutions to the packet socket (ebpf programs have access to
> > > > skb->len) problem would allow IPv4 to join the Big TCP party. I am
> > > > wondering how feasible an alternative solution is to get large packet
> > > > sizes across the board with less overhead and changes.
> > >
> > > I agree that the header insertion and removal seems like a lot of extra
> > > overhead for the sake of correctness. In the Microsoft case I am pretty
> > > sure their LSOv2 supported both v4 and v6. I think we could do
> > > something similar, we would just need to make certain the device
> > > supports it and as such maybe it would make sense to implement it as a
> > > gso type flag?
> > >
> > > Could we handle the length field like we handle the checksum and place
> > > a value in there that we know is wrong, but could be used to provide
> > > additional data? Perhaps we could even use it to store the MSS in the
> > > form of the length of the first packet so if examined, the packet would
> > > look like the first frame of the flow with a set of trailing data.
> > >
> >
> > I am a bit sad you did not give all this feedback back in August when
> > I presented BIG TCP.
> >
>
> As I recall, I was thinking along the same lines as what you have done
> here, but Dave's question about including IPv4 does bring up an
> interesting point. And the Microsoft version supported both.
Yes, maybe they added metadata for that, and decided to leave packet capture
in the dark, or changed tcpdump/wireshark to fetch/use this metadata?
This was the first thing I tried one year ago, and eventually gave up,
because this was a no go for us.
Then seeing HBH Jumbo support being added recently in tcpdump,
I understood we could finally get visibility, and started BIG TCP using this.
I guess someone might add extra logic to allow IPv4 BIG TCP if they
really need it; I will not object to it.
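For readers following along, the header in question is small. Below is a
minimal Python sketch of the RFC 2675 layout: an 8-byte Hop-by-Hop Options
header carrying a single Jumbo Payload option. The function and constant
names are hypothetical, chosen for illustration only; this is not the
kernel code from the patch.

```python
import struct

# Values from RFC 2675 / RFC 8200; the Python names themselves are
# illustrative assumptions, not identifiers taken from the patch.
NEXTHDR_TCP = 6          # upper-layer protocol following the HBH header
TLV_JUMBO = 0xC2         # Jumbo Payload option type
TLV_JUMBO_DATA_LEN = 4   # option data is one 32-bit length
HBH_JUMBO_SIZE = 8       # whole extension header: 2 + 2 + 4 bytes


def build_hbh_jumbo(next_header: int, upper_len: int) -> bytes:
    """Build the Hop-by-Hop header for a jumbogram.

    upper_len is the number of bytes that follow this extension header.
    The Jumbo Payload Length counts everything after the fixed IPv6
    header, so it includes the 8 bytes of this header as well.
    """
    jumbo_len = upper_len + HBH_JUMBO_SIZE
    if jumbo_len <= 0xFFFF:
        raise ValueError("Jumbo Payload Length must exceed 65535")
    # Hdr Ext Len is 0: the header is 8 bytes, and the field counts
    # 8-byte units beyond the first one.
    return struct.pack("!BBBBI", next_header, 0,
                       TLV_JUMBO, TLV_JUMBO_DATA_LEN, jumbo_len)


def parse_hbh_jumbo(hdr: bytes) -> tuple[int, int]:
    """Return (next_header, jumbo_payload_length) or raise ValueError."""
    if len(hdr) < HBH_JUMBO_SIZE:
        raise ValueError("truncated Hop-by-Hop header")
    nh, ext_len, opt_type, opt_len, jumbo_len = struct.unpack(
        "!BBBBI", hdr[:HBH_JUMBO_SIZE])
    if ext_len != 0 or opt_type != TLV_JUMBO or opt_len != TLV_JUMBO_DATA_LEN:
        raise ValueError("not a bare Jumbo Payload header")
    return nh, jumbo_len
```

With this layout the fixed IPv6 header carries payload_len = 0 and
nexthdr = 0 (Hop-by-Hop), and a dissector reads the real length from the
option, e.g. parse_hbh_jumbo(build_hbh_jumbo(NEXTHDR_TCP, 100000)).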
>
> > We did a lot of work in the last 6 months to implement, test all this,
> > making sure this worked.
> >
> > I am not sure I want to spend another 6 months implementing what you suggest.
>
> I am not saying we have to do this. I am simply stating a "what if"
> just to gauge this approach. You could think of it as thinking out
> loud, but in written form.
Understood.
BTW I spent time adding a new gso_type flag, but also gave up because we have
no more room in features_t type.
Solving features_t exhaustion alone is a delicate topic.
>
> > For instance, input path will not like packets larger than 64KB.
> >
> > There is this thing trimming padding bytes, you probably do not want
> > to mess with this.
>
> I had overlooked the fact that this is being used on the input path;
> the trimming would be an issue. I suppose the fact that the LSOv2
> didn't have an Rx counterpart would be one reason for us to not
> consider the IPv4 approach.
>