netdev - Re: [PATCH net-next 09/10] netfilter: get ipv6 pktlen properly in length

Message-ID: <CANn89iJgMRDV_8jT6GSusJxgraLvXo-NCA=A-qfA7p3qZ8Os5Q@mail.gmail.com>
Date:   Thu, 19 Jan 2023 20:17:30 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Xin Long <lucien.xin@...il.com>
Cc:     David Ahern <dsahern@...il.com>,
        network dev <netdev@...r.kernel.org>, davem@...emloft.net,
        kuba@...nel.org, Paolo Abeni <pabeni@...hat.com>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Pravin B Shelar <pshelar@....org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Florian Westphal <fw@...len.de>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Ilya Maximets <i.maximets@....org>,
        Aaron Conole <aconole@...hat.com>,
        Roopa Prabhu <roopa@...dia.com>,
        Nikolay Aleksandrov <razor@...ckwall.org>,
        Mahesh Bandewar <maheshb@...gle.com>,
        Paul Moore <paul@...l-moore.com>,
        Guillaume Nault <gnault@...hat.com>
Subject: Re: [PATCH net-next 09/10] netfilter: get ipv6 pktlen properly in length_mt6

On Thu, Jan 19, 2023 at 7:59 PM Xin Long <lucien.xin@...il.com> wrote:
>
> On Thu, Jan 19, 2023 at 1:10 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Thu, Jan 19, 2023 at 5:51 PM Xin Long <lucien.xin@...il.com> wrote:
> > >
> > > On Thu, Jan 19, 2023 at 10:41 AM David Ahern <dsahern@...il.com> wrote:
> > > >
> > > > On 1/18/23 8:13 PM, Eric Dumazet wrote:
> > > > > On Thu, Jan 19, 2023 at 2:19 AM Xin Long <lucien.xin@...il.com> wrote:
> > > > >
> > > > >> I think that IPv6 BIG TCP has a similar problem, below is the tcpdump in
> > > > >> my env (RHEL-8), and it breaks too:
> > > > >>
> > > > >> 19:43:59.964272 IP6 2001:db8:1::1 > 2001:db8:2::1: [|HBH]
> > > > >> 19:43:59.964282 IP6 2001:db8:1::1 > 2001:db8:2::1: [|HBH]
> > > > >> 19:43:59.964292 IP6 2001:db8:1::1 > 2001:db8:2::1: [|HBH]
> > > > >> 19:43:59.964300 IP6 2001:db8:1::1 > 2001:db8:2::1: [|HBH]
> > > > >> 19:43:59.964308 IP6 2001:db8:1::1 > 2001:db8:2::1: [|HBH]
> > > > >>
> > > > >
> > > > > Please make sure to use a not too old tcpdump.
> > > > >
> > > > >> it doesn't show what we want from the TCP header either.
> > > > >>
> > > > >> For the latest tcpdump on upstream, it can display headers well for
> > > > >> IPv6 BIG TCP. But we can't expect all systems to use the tcpdump
> > > > >> that supports HBH parsing.
> > > > >
> > > > > > User error. If an admin wants to diagnose potential TCP issues, they should use
> > > > > > a correct version.
> > > >
> > > > Both of those just fall under "if you want a new feature, update your
> > > > tools."
> > > >
> > > >
> > > > >
> > > > >>
> > > > > >> For IPv4 BIG TCP, it's just a CFLAGS change to support it in 'tcpdump',
> > > > > >> and 'tshark' even supports it by default.
> > > > >
> > > > > Not with privacy _requirements_, where only the headers are captured.
> > > > >
> > > > > I am keeping a NACK, until you make sure you do not break this
> > > > > important feature.
> > > >
> > > > I think the request here is to keep the snaplen in place (e.g., to make
> > > > only headers visible to userspace) while also returning the >64kB packet
> > > > length as meta data.
> > > >
> > > > My last pass on the packet socket code suggests this is possible;
> > > > someone (Xin) needs to work through the details.
> > > >
> > > To be honest, I don't really like such a change in a packet socket,
> > > I tried, and the code doesn't look nice.
> > >
> > > > I'm thinking, since skb->len is trustworthy, why don't we use
> > > IP_MAX_MTU(0xFFFF) as iph->tot_len for IPv4 BIG TCP?
> > > namely, only change these 2 helpers to:
> > >
> > > > static inline unsigned int iph_totlen(const struct sk_buff *skb,
> > > >                                       const struct iphdr *iph)
> > > {
> > >         u16 len = ntohs(iph->tot_len);
> > >
> > >         return (len < IP_MAX_MTU || !skb_is_gso_tcp(skb)) ? len :
> > >                 skb->len - skb_network_offset(skb);
> > > }
> > >
> > > static inline void iph_set_totlen(struct iphdr *iph, unsigned int len)
> > > {
> > >         iph->tot_len = len < IP_MAX_MTU ? htons(len) : htons(IP_MAX_MTU);
> > > }
> > >
> > > What do you think?
> >
> > I think this is a no go for me.
> >
> > I think I stated clearly what the problem was.
> > If you care about TCP diagnostics, you want the truth, not truncated
> > sequence ranges that make it impossible to know if a packet was sent.
> Sorry Eric, if I didn't understand you correctly.
>
> With new helpers, the iph->tot_len will be set to IP_MAX_MTU(65535),
> all TCP headers will display well, no truncated sequence ranges:
>
> #  ip net exec router tcpdump -i link1
> 13:36:46.675522 IP 198.51.100.1.42289 > 203.0.113.1.45103: Flags [P.],
> seq 1532642515:1532707998, ack 1, win 504, options [nop,nop,TS val
> 2975547125 ecr 2379476018], length 65483
> 13:36:46.675534 IP 198.51.100.1.42289 > 203.0.113.1.45103: Flags [P.],
> seq 1532769005:1532834488, ack 1, win 504, options [nop,nop,TS val
> 2975547125 ecr 2379476018], length 65483

This is completely truncated, don't you see this?

According to tcpdump, we sent sequences 1532642515:1532707998 and
1532769005:1532834488.

And the payload was supposedly 65483 bytes per packet (this is not true).

What happened to 1532707998 -> 1532769005?

How will network engineers know "oh wait, data was sent/received after all",
and was not dropped somewhere in the network, in netfilter, or ... by a kernel bug?
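
To make the gap concrete, here is a minimal userspace sketch of the
arithmetic, using the sequence numbers from the capture quoted above. The
struct and function names are hypothetical (this is not kernel or tcpdump
code); the only assumption is that tcpdump prints ranges as
start:end with end being one past the last byte shown.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one "seq start:end" pair as printed by tcpdump. */
struct seq_range {
	uint32_t start;
	uint32_t end;	/* one past the last byte covered by the line */
};

/* Bytes covered by one printed range; u32 subtraction handles wraparound. */
static uint32_t seq_range_len(struct seq_range r)
{
	return r.end - r.start;
}

/* Bytes not accounted for between two consecutive printed ranges. */
static uint32_t seq_gap(struct seq_range prev, struct seq_range next)
{
	return next.start - prev.end;
}

/* Print what the capture shows versus what it leaves unexplained. */
static void report_gap(struct seq_range prev, struct seq_range next)
{
	printf("shown: %" PRIu32 " bytes, unexplained before next packet: %"
	       PRIu32 " bytes\n", seq_range_len(prev), seq_gap(prev, next));
}
```

For the first two lines of the capture, calling report_gap() with
{1532642515, 1532707998} and {1532769005, 1532834488} reports 65483 bytes
shown and 61007 bytes unexplained. Nothing in the capture tells the reader
whether those 61007 bytes were sent.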

> 13:36:46.675542 IP 198.51.100.1.42289 > 203.0.113.1.45103: Flags [P.],
> seq 1532895495:1532960978, ack 1, win 504, options [nop,nop,TS val
> 2975547125 ecr 2379476018], length 65483
> 13:36:46.675550 IP 198.51.100.1.42289 > 203.0.113.1.45103: Flags [P.],
> seq 1533021985:1533087468, ack 1, win 504, options [nop,nop,TS val
> 2975547125 ecr 2379476018], length 65483
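
The repeated "length 65483" is what a reader gets when it derives the TCP
payload size from a tot_len clamped to IP_MAX_MTU. The sketch below is a
userspace model only, with hypothetical function names. It assumes a
20-byte IPv4 header and a 32-byte TCP header (20 bytes plus the 12 bytes of
nop,nop,TS options shown in the capture), and it assumes the stream is
contiguous, so that each packet really carried 126490 bytes (the distance
between consecutive starting sequence numbers).

```c
#include <stdint.h>

#define IP_MAX_MTU 0xFFFFu

/* Model of the proposed clamping: any length >= IP_MAX_MTU becomes 0xFFFF. */
static uint16_t clamped_tot_len(uint32_t real_len)
{
	return real_len < IP_MAX_MTU ? (uint16_t)real_len
				     : (uint16_t)IP_MAX_MTU;
}

/* Payload length computed by a reader that trusts the tot_len field. */
static uint32_t derived_payload(uint16_t tot_len, uint32_t ip_hlen,
				uint32_t tcp_hlen)
{
	return (uint32_t)tot_len - ip_hlen - tcp_hlen;
}
```

Under these assumptions, a packet with 126490 bytes of payload has a real
IP length of 126542, clamped_tot_len() returns 65535, and
derived_payload(65535, 20, 32) gives 65483, matching the capture rather
than the amount of data actually carried.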