netdev - Re: [TEST] txtimestamp.sh pains after netdev foundation migration

Message-ID: <willemdebruijn.kernel.276cd2b2b0063@gmail.com>
Date: Wed, 07 Jan 2026 19:19:53 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, 
 Willem de Bruijn <willemb@...gle.com>, 
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [TEST] txtimestamp.sh pains after netdev foundation migration

Jakub Kicinski wrote:
> Hi Willem!
> 
> We discussed instability of txtimestamp.sh in the past but it has
> gotten even worse after we migrated from AWS to netdev foundation
> machines. Possibly because it's different HW. Possibly because we
> now run much newer kernels (AWS Linux vs Fedora).
> 
> The test flakes a lot (we're talking about non-debug builds):
> https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh
> 
> I tried a few things. The VM threads (vCPU, not IO) are now all pinned
> to dedicated CPUs. I added this patch to avoid long idle periods:
> https://github.com/linux-netdev/testing/commit/d468f582c617adece2a576788746a09d91e91574
> 
> These both help a little bit, but we still get 10+ flakes a week.
> I believe you have access to netdev foundation machines so feel
> free to poke if you have cycles..

From a first look at the most recent 20 flakes
(ignoring two unrelated sockaddr failures):

17 out of 20 happen in the first SND-USR calculation.
One representative example:

    # 7.11 [+0.00] test SND
    # 7.11 [+0.00]     USR: 1767443466 s 155019 us (seq=0, len=0)
    # 7.19 [+0.08] ERROR: 18600 us expected between 10000 and 18000
    # 7.19 [+0.00]     SND: 1767443466 s 173619 us (seq=0, len=10)  (USR +18599 us)
    # 7.20 [+0.00]     USR: 1767443466 s 243683 us (seq=0, len=0)
    # 7.27 [+0.07]     SND: 1767443466 s 253690 us (seq=1, len=10)  (USR +10006 us)
    # 7.27 [+0.00]     USR: 1767443466 s 323746 us (seq=0, len=0)
    # 7.35 [+0.08]     SND: 1767443466 s 333752 us (seq=2, len=10)  (USR +10006 us)
    # 7.35 [+0.00]     USR: 1767443466 s 403811 us (seq=0, len=0)
    # 7.43 [+0.08]     SND: 1767443466 s 413817 us (seq=3, len=10)  (USR +10006 us)
    # 7.43 [+0.00]     USR-SND: count=4, avg=12154 us, min=10006 us, max=18599 us

These are just outside the upper bound of 18000 us. So increasing the
tolerance in txtimestamp.sh will probably mitigate them. All 17
would have passed with the following change.

-        local -r args="$@ -v 10000 -V 60000 -t 8000 -S 80000"
+        local -r args="$@ -v 10000 -V 60000 -t 8000 -S 100000"

Admittedly a hacky workaround that will only reduce the rate.

It's interesting that

- every time it is the first of the four measurements that fails.
- it never seems to occur for TCP sockets.
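
As a sanity check on the first observation, the USR to SND deltas in the
example above can be recomputed by hand. A minimal Python sketch, assuming
the accepted window is [-v, -v + -t] in microseconds, as the "expected
between 10000 and 18000" message suggests. The timestamps are copied from
the log; the helper names are made up for illustration. It works on the
whole-microsecond values as printed, so a delta can differ by 1 us from
the one the test itself reports.

```python
# (USR us, SND us) pairs copied from the log excerpt, all within
# second 1767443466, so only the microsecond parts are needed.
SAMPLES_US = [
    (155019, 173619),
    (243683, 253690),
    (323746, 333752),
    (403811, 413817),
]

# Assumption: window is [delay, delay + tolerance], from "-v 10000 -t 8000".
DELAY_US = 10000
TOLERANCE_US = 8000


def snd_deltas(samples):
    """Return the SND - USR delta in microseconds for each sample."""
    return [snd - usr for usr, snd in samples]


def out_of_window(deltas, delay=DELAY_US, tolerance=TOLERANCE_US):
    """Return indices of deltas outside [delay, delay + tolerance]."""
    return [i for i, d in enumerate(deltas)
            if not delay <= d <= delay + tolerance]


deltas = snd_deltas(SAMPLES_US)
print("deltas (us):", deltas)
print("out of window at index:", out_of_window(deltas))
```

Only index 0, the first measurement, falls outside the window for this
run; the other three sit right at the 10000 us lower edge.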