Message-ID: <CADVnQym7Whnbc9xf_dew-ey1fGFBY1dSf6RJ=9qLNP=u+NYOEw@mail.gmail.com>
Date: Mon, 24 Nov 2025 11:38:03 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org
Subject: Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
On Mon, Nov 24, 2025 at 11:33 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Jakub Kicinski wrote:
> > Hi Willem!
> >
> > I migrated netdev CI to our own infra now, and the slightly faster,
> > Fedora-based system is failing tcp_zerocopy_maxfrags.pkt:
> >
> > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > # script packet: 1.000237 P. 36:37(1) ack 1
> > # actual packet: 1.000235 P. 36:37(1) ack 1 win 1050
> > # not ok 1 ipv4
> > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > # script packet: 1.000209 P. 36:37(1) ack 1
> > # actual packet: 1.000208 P. 36:37(1) ack 1 win 1050
> > # not ok 2 ipv6
> > # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
> >
> > https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill/results/399942/13-tcp-zerocopy-maxfrags-pkt/stdout
> >
> > This happens on both debug and non-debug kernel (tho on the former
> > the failure is masked due to MACHINE_SLOW).
>
> That's an odd error.
>
> The test sends a msg_iov of 18 one-byte fragments and verifies that
> only 17 fit in one packet, followed by a single one-byte packet. The
> test does not explicitly initialize the payload, but trusts packetdrill
> to handle that. Relevant snippet below.
>
> Packetdrill complains about payload contents. That error is only
> generated by the below check in run_packet.c. Pretty straightforward.
>
> Packetdrill agrees that the packet is one byte long. The win argument
> is optional on outgoing packets, not relevant to the failure.
>
> So somehow the data in that frag got overwritten in the short window
> between when it was injected into the kernel and when it was observed?
> Seems so unlikely.
>
> Sorry, I'm a bit at a loss at least initially as to the cause.
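
For readers following along, the send pattern described in the quoted text (one sendmsg() call whose msg_iov holds 18 one-byte fragments) can be sketched in C roughly as below. This is only an illustration, not the test itself: the helper names fill_one_byte_iov and send_frags are hypothetical, the payload bytes are an arbitrary choice, and the zerocopy setup (SO_ZEROCOPY on a TCP socket plus the MSG_ZEROCOPY flag) is left to the caller.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define NUM_FRAGS 18	/* fragment count named in the quoted description */

/*
 * Point each iovec entry at its own single byte of buf.
 * The payload is set explicitly here ('a', 'b', ...); that is an
 * illustrative assumption, the actual test leaves payload contents
 * to packetdrill.
 */
static void fill_one_byte_iov(struct iovec *iov, char *buf, int n)
{
	for (int i = 0; i < n; i++) {
		buf[i] = (char)('a' + i);
		iov[i].iov_base = &buf[i];
		iov[i].iov_len = 1;
	}
}

/*
 * Hand all n fragments to the kernel in one sendmsg() call.
 * Pass flags = MSG_ZEROCOPY on a TCP socket with SO_ZEROCOPY enabled
 * to approximate the zerocopy path; flags = 0 gives a plain copy.
 * Returns the number of bytes accepted, or -1 on error.
 */
static ssize_t send_frags(int fd, struct iovec *iov, int n, int flags)
{
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = n;
	return sendmsg(fd, &msg, flags);
}
```

Calling fill_one_byte_iov() on an 18-entry array and then send_frags() on a connected stream socket queues 18 bytes in a single call. Per the description above, on the zerocopy TCP path the expectation is a 17-byte packet followed by a 1-byte packet.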

I agree this is odd. It looks like either a very concerning kernel
bug, or a very concerning packetdrill bug. :-)

Could someone please run the test with tcpdump in the background to
capture the full packet contents, to verify that indeed the packet has
the wrong contents?

This would help make sure that this is a kernel bug and not a
packetdrill bug. :-)

thanks,
neal