netdev - Re: [TEST] tcp_zerocopy


Message-ID: <willemdebruijn.kernel.1fe4306a89d08@gmail.com>
Date: Mon, 24 Nov 2025 11:29:31 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, 
 Willem de Bruijn <willemb@...gle.com>
Cc: netdev@...r.kernel.org
Subject: Re: [TEST] tcp_zerocopy_maxfrags.pkt fails

Jakub Kicinski wrote:
> Hi Willem!
> 
> I migrated netdev CI to our own infra now, and the slightly faster,
> Fedora-based system is failing tcp_zerocopy_maxfrags.pkt:
> 
> # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> # script packet:  1.000237 P. 36:37(1) ack 1 
> # actual packet:  1.000235 P. 36:37(1) ack 1 win 1050 
> # not ok 1 ipv4
> # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> # script packet:  1.000209 P. 36:37(1) ack 1 
> # actual packet:  1.000208 P. 36:37(1) ack 1 win 1050 
> # not ok 2 ipv6
> # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
> 
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill/results/399942/13-tcp-zerocopy-maxfrags-pkt/stdout
> 
> This happens on both debug and non-debug kernel (tho on the former 
> the failure is masked due to MACHINE_SLOW).

That's an odd error.

The test sends a msg_iov of 18 one-byte fragments and verifies that
only 17 fit in one packet, followed by a single one-byte packet. The
test does not explicitly initialize the payload, but trusts packetdrill
to handle that. Relevant snippets below.

Packetdrill complains about payload contents. That error is only
generated by the below check in run_packet.c. Pretty straightforward.

Packetdrill agrees that the packet is one byte long. The win argument
is optional on outgoing packets, not relevant to the failure.

So somehow the data in that frag got overwritten in the short window
between when it was injected into the kernel and when it was observed?
Seems so unlikely.

Sorry, I'm a bit at a loss at least initially as to the cause.

----

   // send a zerocopy iov of 18 elements:
   +1 sendmsg(4, {msg_name(...)=...,
                  msg_iov(18)=[{..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}],
                  msg_flags=0}, MSG_ZEROCOPY) = 18

   // verify that it is split in one skb of 17 frags + 1 of 1 frag
   // verify that both have the PSH bit set
   +0 > P. 19:36(17) ack 1
   +0 < . 1:1(0) ack 36 win 257

   +0 > P. 36:37(1) ack 1
   +0 < . 1:1(0) ack 37 win 257
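
For illustration only, the expected split can be modeled in a few lines of C. This is a simplified sketch, not kernel or packetdrill code: it assumes each iov element occupies exactly one fragment and that at most 17 fragments fit in one packet, which is the limit the test expects. SKETCH_MAX_FRAGS, fill_one_byte_iov() and expected_packet_len() are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Assumed per-packet fragment limit, taken from the test's expectation.
 * Hypothetical name for this sketch, not a kernel symbol.
 */
#define SKETCH_MAX_FRAGS 17

/* Point each of the n iov elements at one byte of buf. */
static void fill_one_byte_iov(struct iovec *iov, char *buf, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                iov[i].iov_base = &buf[i];
                iov[i].iov_len = 1;
        }
}

/* Payload length of packet number pkt_idx (0-based), under the
 * simplifying assumption that every iov element takes one fragment
 * and at most max_frags fragments fit in a packet.
 */
static size_t expected_packet_len(const struct iovec *iov, size_t n,
                                  size_t max_frags, size_t pkt_idx)
{
        size_t start, len = 0;

        assert(max_frags > 0);
        start = pkt_idx * max_frags;
        for (size_t i = start; i < n && i < start + max_frags; i++)
                len += iov[i].iov_len;
        return len;
}
```

With 18 one-byte elements, this model gives 17 bytes for packet 0 and 1 byte for packet 1, matching the 19:36(17) and 36:37(1) lines in the script.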

----

/* Verify TCP/UDP payload matches expected value. */
static int verify_outbound_live_payload(
        struct packet *actual_packet,
        struct packet *script_packet, char **error)
{
        /* Diff the TCP/UDP data payloads. We've already implicitly
         * checked their length by checking the IP and TCP/UDP headers.
         */
        assert(packet_payload_len(actual_packet) ==
               packet_payload_len(script_packet));
        if (memcmp(packet_payload(script_packet),
                   packet_payload(actual_packet),
                   packet_payload_len(script_packet)) != 0) {
                asprintf(error, "incorrect outbound data payload");
                return STATUS_ERR;
        }
        return STATUS_OK;
}
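
The check above only reports that the payloads differ. When chasing a failure like this one, it might help to know which byte differs and what it contains. A minimal sketch of such a helper follows; first_payload_mismatch() is a hypothetical function, not part of packetdrill, and it takes plain byte buffers rather than struct packet.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first byte at which the two buffers differ,
 * or -1 if the first len bytes are identical.
 * Hypothetical debugging helper, not a packetdrill function.
 */
static long first_payload_mismatch(const unsigned char *expected,
                                   const unsigned char *actual,
                                   size_t len)
{
        assert(expected != NULL && actual != NULL);
        for (size_t i = 0; i < len; i++) {
                if (expected[i] != actual[i])
                        return (long)i;
        }
        return -1;
}
```

For the one-byte packet in this report the offset could only be 0, so the interesting part would be printing expected[0] and actual[0] next to the error.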