Message-ID: <661809892b2a_2f1f4c294a3@willemb.c.googlers.com.notmuch>
Date: Thu, 11 Apr 2024 12:02:17 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Richard Gobert <richardbgobert@...il.com>,
davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
willemdebruijn.kernel@...il.com,
shuah@...nel.org,
dsahern@...nel.org,
aduyck@...antis.com,
netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org
Cc: Richard Gobert <richardbgobert@...il.com>
Subject: Re: [PATCH net-next v6 2/6] net: gro: add p_off param in
*_gro_complete

Willem de Bruijn wrote:
> Willem de Bruijn wrote:
> > Richard Gobert wrote:
> > > Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
> > > additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the
> > > complete phase of gro. The functions always return skb->network_header,
> > > which in the case of encapsulated packets at the gro complete phase, is
> > > always set to the innermost L3 of the packet. That means that calling
> > > {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
> > > gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_
> > > L3/L4 may return an unexpected value.
> > >
> > > This incorrect usage leads to a bug in GRO's UDP socket lookup.
> > > udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These
> > > *_hdr functions return network_header which will point to the innermost L3,
> > > resulting in the wrong offset being used in __udp{4,6}_lib_lookup with
> > > encapsulated packets.
> > >
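(For context: with e.g. a UDP tunnel, the layout at complete time is roughly

    outer IP | outer UDP | tunnel hdr | inner eth | inner IP | inner L4
    ^ header being completed                       ^ skb->network_header

so ip_hdr(skb)/ipv6_hdr(skb) resolve to the inner L3 header even while the
outer UDP header is the one being completed.)
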
> > > To fix this issue, the p_off param is used in *_gro_complete to pass the
> > > offset of the previous layer.
> >
> > What exactly does this mean?
> >
> > This patch changes the definition of gro_complete to add a thoff
> > alongside the existing "nhoff"..
> >
> > > - int (*gro_complete)(struct sk_buff *skb, int nhoff);
> > > + int (*gro_complete)(struct sk_buff *skb, int nhoff,
> > > + int thoff);
> >
> > .. but also fixes up implementations to interpret the existing
> > argument as a thoff
> >
> > > -INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
> > > +INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int nhoff,
> > > + int thoff)
> > > {
> > > - const struct iphdr *iph = ip_hdr(skb);
> > > - struct tcphdr *th = tcp_hdr(skb);
> > > + const struct iphdr *iph = (const struct iphdr *)(skb->data + nhoff);
> > > + struct tcphdr *th = (struct tcphdr *)(skb->data + thoff);
> >
> > But in some cases the new argument is not nhoff but p_off, e.g.,
> >
> > > static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
> > > - int nhoff)
> > > + int p_off, int nhoff)
> >
> > Really, the argument is the start of the next header; each callback
> > just casts it to its expected header (ethhdr, tcphdr, etc.).
> >
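For instance, eth_gro_complete starts with (from memory, roughly):

    struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);
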
> > The only place where we need to pass an extra argument is in udp,
> > because that needs a pointer to the network header right before the
> > transport header pointed to by nhoff.
> >
> > And only due to possible IPv4 options or IPv6 extension headers, we
> > cannot just do
> >
> > + struct iphdr *iph = (struct iphdr *)(skb->data + nhoff - sizeof(*iph));
> > struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> >
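With the extra argument, the udp side can instead mirror the tcp4_gro_complete
hunk above, roughly (sketch, argument names as in the geneve hunk, not taken
from the patch):

    const struct iphdr *iph = (const struct iphdr *)(skb->data + p_off);
    struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);

and feed iph->saddr/iph->daddr to the socket lookup instead of going through
ip_hdr(skb), which may point at the inner network header.
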
> > I also do not immediately see a way to avoid all the boilerplate
> > of a new argument in every callback. Aside from a per_cpu var -- but
> > that is excessive.
> >
> > But it can just be left zero in all callsites, except for
> > inet_gro_complete/ipv6_gro_complete, which pass in nhoff.
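
At the network layer call site that would be something like (sketch from
memory of inet_gro_complete, not verified against the tree):

    err = INDIRECT_CALL_2(ops->callbacks.gro_complete,
                          tcp4_gro_complete, udp4_gro_complete,
                          skb, nhoff, nhoff + sizeof(*iph));

with every other caller passing 0 for the new offset.
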
>
> Actually, we can avoid the boilerplate changes that add an extra arg.
>
> By changing the contract between network layer callbacks
> (inet_gro_complete/ipv6_gro_complete) and transport layer callbacks
> (tcp4_gro_complete et al).

Actually, only when calling udp4_gro_complete or udp6_gro_complete.

> It's also a bit of a hack. But a lot smaller patch, probably.

Feel free to disagree with this approach. Just a suggestion.
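
One way that could look, as a rough, untested sketch (exact variable names
in inet_gro_complete may differ):

    /* in inet_gro_complete(), before the indirect call: the udp socket
     * lookup goes through ip_hdr(skb), so point network_header at the
     * header being completed instead of the innermost one.
     */
    if (proto == IPPROTO_UDP)
            skb_set_network_header(skb, nhoff);

    err = INDIRECT_CALL_2(ops->callbacks.gro_complete,
                          tcp4_gro_complete, udp4_gro_complete,
                          skb, nhoff + sizeof(*iph));

with the equivalent in ipv6_gro_complete for udp6_gro_complete.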