[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+iGb7p2SqLXvzmkR3W3T_BgUWW78-4z0TxrBW8dYTnwA@mail.gmail.com>
Date: Sat, 5 Mar 2022 10:08:40 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: David Ahern <dsahern@...nel.org>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>, Coco Li <lixiaoyan@...gle.com>,
Alexander Duyck <alexanderduyck@...com>
Subject: Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to
jumbograms in ip6_output
On Sat, Mar 5, 2022 at 8:46 AM David Ahern <dsahern@...nel.org> wrote:
>
> On 3/4/22 10:47 AM, Eric Dumazet wrote:
> > On Thu, Mar 3, 2022 at 8:33 PM David Ahern <dsahern@...nel.org> wrote:
> >>
> >> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> >>> From: Coco Li <lixiaoyan@...gle.com>
> >>>
> >>> Instead of simply forcing a 0 payload_len in IPv6 header,
> >>> implement RFC 2675 and insert a custom extension header.
> >>>
> >>> Note that only TCP stack is currently potentially generating
> >>> jumbograms, and that this extension header is purely local,
> >>> it wont be sent on a physical link.
> >>>
> >>> This is needed so that packet capture (tcpdump and friends)
> >>> can properly dissect these large packets.
> >>>
> >>
> >>
> >> I am fairly certain I know how you are going to respond, but I will ask
> >> this anyways :-) :
> >>
> >> The networking stack as it stands today does not care that skb->len >
> >> 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> >> Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> >> the larger packet size it just works.
> >
> > Observability is key. "just works" is a bold claim.
> >
> >>
> >> The jumbogram header is getting adding at the L3/IPv6 layer and then
> >> removed by the drivers before pushing to hardware. So, the only benefit
> >> of the push and pop of the jumbogram header is for packet sockets and
> >> tc/ebpf programs - assuming those programs understand the header
> >> (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> >> it is a standard header so apps have a chance to understand the larger
> >> packet size, but what is the likelihood that random apps or even ebpf
> >> programs will understand it?
> >
> > Can you explain to me what you are referring to by " random apps" exactly ?
> > TCP does not expose to user space any individual packet length.
>
> TCP apps are not affected; they do not have direct access to L3 headers.
> This is about packet sockets and ebpf programs and their knowledge of
> the HBH header. This does not seem like a widely used feature and even
> tcpdump only recently gained support for it (e.g., Ubuntu 20.04 does
> not support it, 21.10 does). Given that what are the odds most packet
> programs are affected by the change and if they need to have support we
> could just as easily add that support in a way that gets both networking
> layers working.
>
> >
> >
> >
> >>
> >> Alternative solutions to the packet socket (ebpf programs have access to
> >> skb->len) problem would allow IPv4 to join the Big TCP party. I am
> >> wondering how feasible an alternative solution is to get large packet
> >> sizes across the board with less overhead and changes.
> >
> > You know, I think I already answered this question 6 months ago.
> >
> > We need to carry an extra metadata to carry how much TCP payload is in a packet,
> > both on RX and TX side.
> >
> > Adding an skb field for that was not an option for me.
>
> Why? skb->len is not limited to a u16. The only affect is when skb->len
> is used to fill in the ipv4/ipv6 header.
Seriously ?
Have you looked recently at core networking stack, and have you read
my netdev presentation ?
Core networking stack will trim your packets skb->len based on what is
found in the network header,
which is a 16bit field, unless you use HBH.
Look at ip6_rcv_core().
Do you want to modify it ?
Let us know how exactly, and why it is not going to break things.
pkt_len = ntohs(hdr->payload_len);
/* pkt_len may be zero if Jumbo payload option is present */
if (pkt_len || hdr->nexthdr != NEXTHDR_HOP) {
if (pkt_len + sizeof(struct ipv6hdr) > skb->len) {
__IP6_INC_STATS(net,
idev, IPSTATS_MIB_INTRUNCATEDPKTS);
goto drop;
}
if (pskb_trim_rcsum(skb, pkt_len + sizeof(struct ipv6hdr))) {
__IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
goto drop;
}
hdr = ipv6_hdr(skb);
}
if (hdr->nexthdr == NEXTHDR_HOP) {
if (ipv6_parse_hopopts(skb) < 0) {
__IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
rcu_read_unlock();
return NULL;
}
}
>
> >
> > Adding a 8 bytes header is basically free, the headers need to be in cpu caches
> > when the header is added/removed.
> >
> > This is zero cost on current cpus, compared to the gains.
> >
> > I think you focus on TSO side, which is only 25% of the possible gains
> > that BIG TCP was seeking for.
> >
> > We covered both RX and TX with a common mechanism.
>
Powered by blists - more mailing lists