Message-ID: <willemdebruijn.kernel.39fa9d8834471@gmail.com>
Date: Tue, 25 Nov 2025 14:49:00 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Neal Cardwell <ncardwell@...gle.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
netdev@...r.kernel.org
Subject: Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
Neal Cardwell wrote:
> On Mon, Nov 24, 2025 at 11:33 AM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > Jakub Kicinski wrote:
> > > Hi Willem!
> > >
> > > I migrated netdev CI to our own infra now, and the slightly faster,
> > > Fedora-based system is failing tcp_zerocopy_maxfrags.pkt:
> > >
> > > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > > # script packet: 1.000237 P. 36:37(1) ack 1
> > > # actual packet: 1.000235 P. 36:37(1) ack 1 win 1050
> > > # not ok 1 ipv4
> > > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > > # script packet: 1.000209 P. 36:37(1) ack 1
> > > # actual packet: 1.000208 P. 36:37(1) ack 1 win 1050
> > > # not ok 2 ipv6
> > > # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
> > >
> > > https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill/results/399942/13-tcp-zerocopy-maxfrags-pkt/stdout
> > >
> > > This happens on both debug and non-debug kernels (though on the former
> > > the failure is masked by MACHINE_SLOW).
> >
> > That's an odd error.
> >
> > The test sends a msg_iov of 18 1-byte fragments and verifies that
> > only 17 fit in one packet, followed by a single 1-byte packet. The
> > test does not explicitly initialize the payload, but trusts packetdrill
> > to handle that. Relevant snippet below.
> >
> > Packetdrill complains about the payload contents. That error is only
> > generated by the check below in run_packet.c. Pretty straightforward.
> >
> > Packetdrill agrees that the packet is one byte long. The win argument
> > is optional on outgoing packets, not relevant to the failure.
> >
> > So somehow the data in that frag got overwritten in the short window
> > between when it was injected into the kernel and when it was observed?
> > Seems so unlikely.
> >
> > Sorry, at least initially, I'm a bit at a loss as to the cause.
>
> I agree this is odd. It looks like either a very concerning kernel
> bug, or very concerning packetdrill bug. :-)
>
> Could someone please run the test with tcpdump in the background to
> capture the full packet contents, to verify that indeed the packet has
> the wrong contents?
>
> This would help make sure that this is a kernel bug and not a
> packetdrill bug. :-)

I cannot reproduce this on my own machine with the latest nn,
but I could reproduce it on the netdev machine.
I assume all payload is supposed to be zeroed, and indeed the captured
packet carries a single non-zero payload byte: 0x60.

Is there any chance this only happens on a kernel with unsubmitted
patches, and that netdev-nn/main would not show it on this machine
either?
----
tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
script packet: 1.000169 P. 36:37(1) ack 1
actual packet: 1.000167 P. 36:37(1) ack 1 win 1050
14:42:01.330694 tun0 Out IP6 fd3d:a0b:17d6::1.webcache > fd3d:fa7b:d17d::1.50901: Flags [P.], seq 19:36, ack 1, win 1050, length 17: HTTP
0x0000: 6000 842c 0025 0640 fd3d 0a0b 17d6 0000
0x0010: 0000 0000 0000 0001 fd3d fa7b d17d 0000
0x0020: 0000 0000 0000 0001 1f90 c6d5 f7fe 05e9
0x0030: 0000 0001 5018 041a e883 0000 0000 0000
0x0040: 0000 0000 0000 0000 0000 0000 00
14:42:01.330723 tun0 In IP6 fd3d:fa7b:d17d::1.50901 > fd3d:a0b:17d6::1.webcache: Flags [.], ack 36, win 257, length 0
0x0000: 6000 0000 0014 06ff fd3d fa7b d17d 0000
0x0010: 0000 0000 0000 0001 fd3d 0a0b 17d6 0000
0x0020: 0000 0000 0000 0001 c6d5 1f90 0000 0001
0x0030: f7fe 05fa 5010 0101 e21b 0000
14:42:01.330727 tun0 Out IP6 fd3d:a0b:17d6::1.webcache > fd3d:fa7b:d17d::1.50901: Flags [P.], seq 36:37, ack 1, win 1050, length 1: HTTP
0x0000: 6000 842c 0015 0640 fd3d 0a0b 17d6 0000
0x0010: 0000 0000 0000 0001 fd3d fa7b d17d 0000
0x0020: 0000 0000 0000 0001 1f90 c6d5 f7fe 05fa
0x0030: 0000 0001 5018 041a e873 0000 60