[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UfLECx9HZAKKK+Grv5oTiUTrwSkpscmJy_+EWax9AZU6w@mail.gmail.com>
Date: Mon, 16 Oct 2023 15:28:26 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yan Zhai <yan@...udflare.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Aya Levin <ayal@...dia.com>,
Tariq Toukan <tariqt@...dia.com>, linux-kernel@...r.kernel.org,
kernel-team@...udflare.com, Florian Westphal <fw@...len.de>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v2 net-next] ipv6: avoid atomic fragment on GSO packets
On Mon, Oct 16, 2023 at 2:51 PM Yan Zhai <yan@...udflare.com> wrote:
>
> On Mon, Oct 16, 2023 at 4:00 PM Alexander H Duyck
> <alexander.duyck@...il.com> wrote:
> >
> > On Mon, 2023-10-16 at 11:23 -0700, Yan Zhai wrote:
> > > GSO packets can contain a trailing segment that is smaller than
> > > gso_size. When examining the dst MTU for such packet, if its gso_size is
> > > too large, then all segments would be fragmented. However, there is a
> > > good chance the trailing segment has smaller actual size than both
> > > gso_size as well as the MTU, which leads to an "atomic fragment". It is
> > > considered harmful in RFC-8021. An Existing report from APNIC also shows
> > > that atomic fragments are more likely to be dropped even it is
> > > equivalent to a no-op [1].
> > >
> > > Refactor __ip6_finish_output code to separate GSO and non-GSO packet
> > > processing. It mirrors __ip_finish_output logic now. Add an extra check
> > > in GSO handling to avoid atomic fragments. Lastly, drop dst_allfrag
> > > check, which is no longer true since commit 9d289715eb5c ("ipv6: stop
> > > sending PTB packets for MTU < 1280").
> > >
> > > Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1]
> > > Fixes: b210de4f8c97 ("net: ipv6: Validate GSO SKB before finish IPv6 processing")
> > > Suggested-by: Florian Westphal <fw@...len.de>
> > > Reported-by: David Wragg <dwragg@...udflare.com>
> > > Signed-off-by: Yan Zhai <yan@...udflare.com>
> > > ---
> > > net/ipv6/ip6_output.c | 33 +++++++++++++++++++++++----------
> > > 1 file changed, 23 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > > index a471c7e91761..1de6f3c11655 100644
> > > --- a/net/ipv6/ip6_output.c
> > > +++ b/net/ipv6/ip6_output.c
> > > @@ -162,7 +162,14 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
> > > int err;
> > >
> > > skb_mark_not_on_list(segs);
> > > - err = ip6_fragment(net, sk, segs, ip6_finish_output2);
> > > + /* Last gso segment might be smaller than actual MTU. Adding
> > > + * a fragment header to it would produce an "atomic fragment",
> > > + * which is considered harmful (RFC-8021)
> > > + */
> > > + err = segs->len > mtu ?
> > > + ip6_fragment(net, sk, segs, ip6_finish_output2) :
> > > + ip6_finish_output2(net, sk, segs);
> > > +
> > > if (err && ret == 0)
> > > ret = err;
> > > }
> > > @@ -170,10 +177,19 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
> > > return ret;
> > > }
> > >
> > > +static int ip6_finish_output_gso(struct net *net, struct sock *sk,
> > > + struct sk_buff *skb, unsigned int mtu)
> > > +{
> > > + if (!(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
> > > + !skb_gso_validate_network_len(skb, mtu))
> > > + return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
> >
> > If we are sending fakejumbo or have a frame that doesn't pass the
> > muster it is just going immediately to ip6_finish_output. I think the
> > checks that you removed are needed to keep the socket from getting
> > stuck sending frames that will probably be discarded.
> >
>
> Hi Alexander,
>
> Thanks for the feedback! But I am not sure I follow the situation you
> mentioned here. If it is a fake jumbo but non GSO packet, it won't
> enter ip6_finish_output_gso. What I am really skipping are the
> dst_allfrag and frag_max_size checks on GSO packets, and dst_allfrag
> on non-GSO packets.
>
> As to dst_allfrag, I looked back at the case when this was added:
>
> https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html
>
> The actual feature was set only when a PMTU message carries a value
> smaller than 1280 byte. But the main line kernel just drops such
> messages now since the commit I pointed to in the change log (which
> makes sense because the feature was set based on old RFC-2460
> guidelines, and those have been deprecated in RFC-8200). Iproute2 also
> doesn't expose this option as well. Is there any case that I am not
> aware of here that still relies on it?
>
> For frag_max_size, I might be wrong but to my best knowledge it only
> applies when netfilter defrags packets. However, when dealing with
> fragments, both local output and GRO code won't produce GSO packets in
> the first place. Similarly, if we look at IPv4 implementation, it also
> does not consider frag_max_size in GSO handling. So I intentionally
> skip this for GSO packets in the change. WDYT?
I am not certain. Just looking at the code it seems like there were a
number of corner cases handled that this is getting rid of the code
for. Specifically my main concern is GSO being enabled for a path
where the MTU is incorrect due to something such as a tunnel being
between the system and the endpoint. In such a case it would normally
send back an ICMP message triggering a path MTU update which would
then have to ripple through.
I'm not an IPv6 expert though so perhaps I will leave that for
somebody else to provide feedback on.
Powered by blists - more mailing lists