netdev - Re: [PATCH net-next 09/10] netfilter: get ipv6 pktlen properly in length

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b73e2dd1-d7bc-e96b-8553-1536a1146f3c@gmail.com>
Date:   Mon, 16 Jan 2023 08:07:58 -0700
From:   David Ahern <dsahern@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>, Xin Long <lucien.xin@...il.com>
Cc:     network dev <netdev@...r.kernel.org>, davem@...emloft.net,
        kuba@...nel.org, Paolo Abeni <pabeni@...hat.com>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Pravin B Shelar <pshelar@....org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Florian Westphal <fw@...len.de>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Ilya Maximets <i.maximets@....org>,
        Aaron Conole <aconole@...hat.com>,
        Roopa Prabhu <roopa@...dia.com>,
        Nikolay Aleksandrov <razor@...ckwall.org>,
        Mahesh Bandewar <maheshb@...gle.com>,
        Paul Moore <paul@...l-moore.com>,
        Guillaume Nault <gnault@...hat.com>
Subject: Re: [PATCH net-next 09/10] netfilter: get ipv6 pktlen properly in
 length_mt6

On 1/16/23 2:24 AM, Eric Dumazet wrote:
> On Sun, Jan 15, 2023 at 9:15 PM Xin Long <lucien.xin@...il.com> wrote:
>>
>> On Sun, Jan 15, 2023 at 2:40 PM Eric Dumazet <edumazet@...gle.com> wrote:
>>>
>>> On Sun, Jan 15, 2023 at 6:43 PM Xin Long <lucien.xin@...il.com> wrote:
>>>>
>>>> On Sun, Jan 15, 2023 at 10:41 AM David Ahern <dsahern@...il.com> wrote:
>>>>>
>>>>> On 1/13/23 8:31 PM, Xin Long wrote:
>>>>>> For IPv6 jumbogram packets, the packet size is bigger than 65535,
>>>>>> it's not right to get it from payload_len and save it to an u16
>>>>>> variable.
>>>>>>
>>>>>> This patch only fixes it for IPv6 BIG TCP packets, so instead of
>>>>>> parsing IPV6_TLV_JUMBO exthdr, which is quite some work, it only
>>>>>> gets the pktlen via 'skb->len - skb_network_offset(skb)' when
>>>>>> skb_is_gso_v6() and saves it to an u32 variable, similar to IPv4
>>>>>> BIG TCP packets.
>>>>>>
>>>>>> This fix will also help us add selftest for IPv6 BIG TCP in the
>>>>>> following patch.
>>>>>>
>>>>>
>>>>> If this is a bug fix for the existing IPv6 support, send it outside of
>>>>> this set for -net.
>>>>>
>>>> Sure,
>>>> I was thinking of adding it here to be able to support selftest for
>>>> IPv6 too in the next patch. But it seems to make more sense to
>>>> get it into -net first, then add this selftest after it goes to net-next.
>>>>
>>>> I will post it and all other fixes I mentioned in the cover-letter for
>>>> IPv6 BIG TCP for -net.
>>>>
>>>> But before that, I hope Eric can confirm it is okay to read the length
>>>> of IPv6 BIG TCP packets with skb_ipv6_totlen() defined in this patch,
>>>> instead of parsing JUMBO exthdr?
>>>>
>>>
>>> I do not think it is ok, but I will leave the question to netfilter maintainers.
>> Just note that the issue doesn't only exist in netfilter.
>> All the changes in Patch 2-7 from this patchset are also needed for IPv6
>> BIG TCP packets.
>>
>>>
>>> Guessing things in tcpdump or other tools is up to user space implementations,
>>> trying to work around some (kernel ?) deficiencies.
>>>
>>> Yes, IPv6 extensions headers are a pain, we all agree.
>>>
>>> Look at how ip6_rcv_core() properly dissects extension headers _and_ trim
>>> skb accordingly (pskb_trim_rcsum() called either from ip6_rcv_core()
>>> or ipv6_hop_jumbo())
>>>
>>> So skb->len is not the root of trust. Some transport mediums might add paddings.
>>>
>>> Ipv4 has a similar logic in ip_rcv_core().
>>>
>>> len = ntohs(iph->tot_len);
>>> if (skb->len < len) {
>>>      drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL;
>>>      __IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS);
>>>     goto drop;
>>> } else if (len < (iph->ihl*4))
>>>      goto inhdr_error;
>>>
>>> /* Our transport medium may have padded the buffer out. Now we know it
>>> * is IP we can trim to the true length of the frame.
>>> * Note this now means skb->len holds ntohs(iph->tot_len).
>>> */
>>> if (pskb_trim_rcsum(skb, len)) {
>>>       __IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS);
>>>       goto drop;
>>> }
>>>
>>> After your changes, we might accept illegal packets that were properly
>>> dropped before.
>> I think skb->len is trustable for GSO/GRO packets.
>> In ipv6_gro_complete/inet_gro_complete():
>> The new length for payload_len or iph->tot_len are all calculated from skb->len.
>> As I said in the cover-letter, "there is no padding in GSO/GRO packets".
>> Or am I missing something?
> 
> This seems to be a contract violation with user space providing GSO packets.
> 
> In our changes we added some sanity checks, inherent to JUMBO specs.
> 
> Here, a GSO packet can now have a zero ip length, no matter if it is
> BIG TCP or not.

Meaning your preference is to set tot_len anytime it is <= 64kB so the
only time tot_len == 0 is for large GRO/TSO packets? That is doable.

> 
> It seems we lower the bar for consistency, and allow bugs (say
> changing skb->len) to not be detected.

not sure why you think it would not be detected. Today's model for gro
sets tot_len based on skb->len. There is an inherent trust that the
user's of the gro API set the length correctly. If it is not, the
payload to userspace would ultimately be non-sense and hence detectable.
I tend to use ssh to test changes like this for this reason - L4 payload
must make sense.

For the Tx path, there is a similar line of trust that the skb->len
passed to the L3 layer is correct. IPv4/IPv6 blindly trust what it is
told for length.


> 
> As you said, user space sniffing packets now have to guess what is the
> intent, instead of headers carrying all the needed information
> that can be fully validated by parsers.

This is a solveable problem within the packet socket API, and the entire
thing is opt-in. If a user's tcpdump / packet capture program is out of
date and does not support the new API for large packets, then that user
does not have to enable large GRO/TSO.