[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <willemdebruijn.kernel.58a32e438c@gmail.com>
Date: Thu, 08 Jan 2026 14:02:15 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Willem de Bruijn <willemb@...gle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [TEST] txtimestamp.sh pains after netdev foundation migration
Jakub Kicinski wrote:
> On Wed, 7 Jan 2026 19:25:11 -0800 Jakub Kicinski wrote:
> > On Wed, 07 Jan 2026 19:19:53 -0500 Willem de Bruijn wrote:
> > > 17 out of 20 happen in the first SND-USR calculation.
> > > One representative example:
> > >
> > > # 7.11 [+0.00] test SND
> > > # 7.11 [+0.00] USR: 1767443466 s 155019 us (seq=0, len=0)
> > > # 7.19 [+0.08] ERROR: 18600 us expected between 10000 and 18000
> > > # 7.19 [+0.00] SND: 1767443466 s 173619 us (seq=0, len=10) (USR +18599 us)
> > > # 7.20 [+0.00] USR: 1767443466 s 243683 us (seq=0, len=0)
> > > # 7.27 [+0.07] SND: 1767443466 s 253690 us (seq=1, len=10) (USR +10006 us)
> > > # 7.27 [+0.00] USR: 1767443466 s 323746 us (seq=0, len=0)
> > > # 7.35 [+0.08] SND: 1767443466 s 333752 us (seq=2, len=10) (USR +10006 us)
> > > # 7.35 [+0.00] USR: 1767443466 s 403811 us (seq=0, len=0)
> > > # 7.43 [+0.08] SND: 1767443466 s 413817 us (seq=3, len=10) (USR +10006 us)
> > > # 7.43 [+0.00] USR-SND: count=4, avg=12154 us, min=10006 us, max=18599 us
> >
> > Hm, that's the first kernel timestamp vs the timestamp in user space?
> > I wonder if we could catch this by re-taking the user stamp after
> > sendmsg() returns, if >1msec elapsed something is probably wrong
> > (we got scheduled out before having a chance to complete the send?)
>
> How about:
>
> diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c
> index 4b4bbc2ce5c9..abcec47ec2e6 100644
> --- a/tools/testing/selftests/net/txtimestamp.c
> +++ b/tools/testing/selftests/net/txtimestamp.c
> @@ -215,6 +215,24 @@ static void print_timestamp_usr(void)
> __print_timestamp(" USR", &ts_usr, 0, 0);
> }
>
> +static void check_timestamp_usr(void)
> +{
> + long long unsigned ts_delta_usec;
> + struct timespec now;
> +
> + if (clock_gettime(CLOCK_REALTIME, &now))
> + error(1, errno, "clock_gettime");
> +
> + ts_delta_usec = timespec_to_ns64(&now) - timespec_to_ns64(&ts_usr);
> + ts_delta_usec /= 1000;
> + if (ts_delta_usec > cfg_delay_tolerance_usec / 2) {
> + cfg_delay_tolerance_usec =
> + ts_delta_usec + cfg_delay_tolerance_usec / 2;
> + fprintf(stderr, "WARN: sendmsg() took %llu us, increasing delay tolerance to %d us\n",
> + ts_delta_usec, cfg_delay_tolerance_usec);
> + }
> +}
> +
> static void print_timestamp(struct scm_timestamping *tss, int tstype,
> int tskey, int payload_len)
> {
> @@ -678,6 +696,8 @@ static void do_test(int family, unsigned int report_opt)
> if (val != total_len)
> error(1, errno, "send");
>
> + check_timestamp_usr();
> +
> /* wait for all errors to be queued, else ACKs arrive OOO */
> if (cfg_sleep_usec)
> usleep(cfg_sleep_usec);
> --
> 2.52.0
Increasing tolerance should work.
The current values are pragmatic choices to be so low as to minimize
total test runtime, but high enough to avoid flakes. Well..
If increasing tolerance, we also need to increase the time the test
waits for all notifications to arrive, cfg_sleep_usec.
Since the majority of errors happen on the first measurement, I was
thinking of a different approach of just taking that as a warm up
measurement.
But I'd also like to poke some more to understand what makes that
run stand out.
Powered by blists - more mailing lists