[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAO3-PbpmNx+Ja2DkW+R6jwCVTxS6zKP4VxJ8+HXCuo4s-7xPiQ@mail.gmail.com>
Date: Wed, 12 Jul 2023 21:01:46 -0500
From: Yan Zhai <yan@...udflare.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Paolo Abeni <pabeni@...hat.com>, Marek Majkowski <marek@...udflare.com>,
network dev <netdev@...r.kernel.org>, kernel-team <kernel-team@...udflare.com>,
Andrew Melnychenko <andrew@...nix.com>, Jason Wang <jasowang@...hat.com>
Subject: Re: Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4
forwarding path
On Wed, Jul 12, 2023 at 7:51 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Wed, Jul 12, 2023 at 11:20 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > Adding Jason,
> > On Wed, 2023-07-12 at 09:58 -0500, Yan Zhai wrote:
> > > On Wed, Jul 12, 2023 at 9:00 AM Marek Majkowski <marek@...udflare.com> wrote:
> > > >
> > > > Dear netdev,
> > > >
> > > > I encountered a puzzling problem, please help.
> > > >
> > > > Rootless repro:
> > > > https://gist.github.com/majek/5e8fd12e7a1663cea63877920fe86f18
> > > >
> > > > To run:
> > > >
> > > > ```
> > > > $ unshare -Urn python3 udp-gro-forwarding-bug.py
> > > > tap0 In IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > > > lo P IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > > > lo P IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > > > lo P IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > > > '''
> > > >
> > > > The code is really quite simple. First I create and open a tap device.
> > > > Then I send a large (>MTU) packet with vnethdr over tap0. The
> > > > gso_type=GSO_UDP_L4, and gso_size=1400. I expect the packet to egress
> > > > from tap0, be forwarded somewhere, where it will eventually be
> > > > segmented by software or hardware.
> > > >
> > > > The egress tap0 packet looks perfectly fine:
> > > >
> > > > tap0 In IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > > >
> > > > To simplify routing I'm doing 'tc mirred' aka `bpf_redirect()` magic,
> > > > where I move egress tap0 packets to ingress lo, like this:
> > > >
> > > > > tc filter add dev tap0 ingress protocol ip u32 match ip src 10.0.0.2 action mirred egress redirect dev lo
>
> This is the opposite of stated, attach to tap0 at ingress and send to
> lo on egress?
>
> It might matter only in the sense that tc_mirred on egress acts in
> dev_queue_xmit before any segmentation would occur.
>
> Probably irrelevant, as your example clearly hits the segmentation
> logic, and it sounds like the root cause there is already well
> understood.
>
> > > >
> > > > On ingress lo I see something really weird:
> > > >
> > > > lo P IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > > > lo P IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > > > lo P IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > > >
> > > > This looks like IPv4 fragments without the IP fragmentation bits set.
> > > >
> > > > I think there are two independent problems here:
> > > >
> > > > (1) The packet is *fragmented* and that is plain wrong here. I'm
> > > > asking for USO not UFO in vnethdr.
> > > >
> > >
> > > To add some context our virtio header in hex format (12 bytes) is
> > > 01052a007805220006000000.
> > >
> > > Some digging shows that the issue seems to come from this patch:
> > > https://lore.kernel.org/netdev/20220907125048.396126-2-andrew@daynix.com/
> > > At this point, skb_shared_info->gso_type is SKB_GSO_UDP_L4 |
> > > SKB_GSO_DODGY, here the DODGY bit is set inside tun_get_user. So the
> > > skb_gso_ok check will return true here, then the skb will fall to the
> > > fragment code. Simple tracing confirms that __udp_gso_segment is never
> > > called in this scenario.
> > >
> > > So the question is really how to handle the DODGY bit. IMHO it is not
> > > right to fall to the fragment path when the actual packet request is
> > > segmentation.
> >
> > I do agree with the above.
> >
> > > Will it be sufficient to just recompute the gso_segs
> > > here and return the head skb instead?
> >
> > Something alike what TCP is currently doing:
> >
> > https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_offload.c#L85
> >
> > should be fine - constrained to the skb_shinfo(skb)->gso_type &
> > SKB_GSO_UDP_L4 case.
>
> Agreed. That's the canonical way to check dodgy segmentation offload
> packets. All USO packets should enter __udp_gso_segment.
>
> After validation and DODGY removal, a packet may fall through to
> device segmentation if the devices advertises the feature (by returning
> segs == NULL).
I sent a patch at
https://lore.kernel.org/netdev/ZK9ZiNMsJX8+1F3N@debian.debian/T/#u.
Tested on my local machine it is working correctly now.
--
Yan
Powered by blists - more mailing lists