Message-ID: <b66106e0-4bd6-e2ae-044d-e48c22546c87@kernel.org>
Date:   Sat, 5 Mar 2022 09:46:38 -0700
From:   David Ahern <dsahern@...nel.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>, Coco Li <lixiaoyan@...gle.com>,
        Alexander Duyck <alexanderduyck@...com>
Subject: Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to
 jumbograms in ip6_output

On 3/4/22 10:47 AM, Eric Dumazet wrote:
> On Thu, Mar 3, 2022 at 8:33 PM David Ahern <dsahern@...nel.org> wrote:
>>
>> On 3/3/22 11:16 AM, Eric Dumazet wrote:
>>> From: Coco Li <lixiaoyan@...gle.com>
>>>
>>> Instead of simply forcing a 0 payload_len in IPv6 header,
>>> implement RFC 2675 and insert a custom extension header.
>>>
>>> Note that only the TCP stack is currently potentially generating
>>> jumbograms, and that this extension header is purely local;
>>> it won't be sent on a physical link.
>>>
>>> This is needed so that packet capture (tcpdump and friends)
>>> can properly dissect these large packets.
>>>
>>
>>
>> I am fairly certain I know how you are going to respond, but I will ask
>> this anyway :-) :
>>
>> The networking stack as it stands today does not care that skb->len >
>> 64kB, and nothing stops a driver from setting its max gso size to be > 64kB.
>> Sure, packet socket apps (tcpdump) get confused, but if the h/w supports
>> the larger packet size it just works.
> 
> Observability is key. "just works" is a bold claim.
> 
>>
>> The jumbogram header is added at the L3/IPv6 layer and then
>> removed by the drivers before pushing to hardware. So, the only benefit
>> of the push and pop of the jumbogram header is for packet sockets and
>> tc/ebpf programs - assuming those programs understand the header
>> (tcpdump (libpcap?) yes, a random packet socket program maybe not). Yes,
>> it is a standard header, so apps have a chance to understand the larger
>> packet size, but what is the likelihood that random apps or even ebpf
>> programs will understand it?
> 
> Can you explain to me what you are referring to by " random apps" exactly ?
> TCP does not expose to user space any individual packet length.

TCP apps are not affected; they do not have direct access to L3 headers.
This is about packet sockets and ebpf programs and their knowledge of
the HBH header. This does not seem like a widely used feature, and even
tcpdump only recently gained support for it (e.g., Ubuntu 20.04 does
not support it; 21.10 does). Given that, what are the odds that most packet
programs are affected by the change? And if they do need support, we
could just as easily add it in a way that gets both networking
layers working.

> 
> 
> 
>>
>> Alternative solutions to the packet socket (ebpf programs have access to
>> skb->len) problem would allow IPv4 to join the Big TCP party. I am
>> wondering how feasible an alternative solution is to get large packet
>> sizes across the board with less overhead and changes.
> 
> You know, I think I already answered this question 6 months ago.
> 
> We need to carry an extra metadata to carry how much TCP payload is in a packet,
> both on RX and TX side.
> 
> Adding an skb field for that was not an option for me.

Why? skb->len is not limited to a u16. The only effect is when skb->len
is used to fill in the ipv4/ipv6 header.

> 
> Adding a 8 bytes header is basically free, the headers need to be in cpu caches
> when the header is added/removed.
> 
> This is zero cost on current cpus, compared to the gains.
> 
> I think you focus on the TSO side, which is only 25% of the possible gains
> that BIG TCP was seeking.
> 
> We covered both RX and TX with a common mechanism.
