Message-ID: <65bd894de7b93_38e921294a9@willemb.c.googlers.com.notmuch>
Date: Fri, 02 Feb 2024 19:31:09 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org,
davem@...emloft.net,
edumazet@...gle.com,
pabeni@...hat.com,
linux-kselftest@...r.kernel.org,
Willem de Bruijn <willemb@...gle.com>,
Matthieu Baerts <matttbe@...nel.org>
Subject: Re: [PATCH net-next] selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW
Jakub Kicinski wrote:
> On Thu, 1 Feb 2024 11:21:19 -0500 Willem de Bruijn wrote:
> > This test is time sensitive. It may fail on virtual machines and for
> > debug builds.
> >
> > Continue to run in these environments to get code coverage. But
> > optionally suppress failure for timing errors (only). This is
> > controlled with environment variable KSFT_MACHINE_SLOW.
> >
> > The test continues to return 0 (KSFT_PASS), rather than KSFT_XFAIL
> > as previously discussed. Making so_txtime.c return KSFT_XFAIL, and
> > then making so_txtime.sh distinguish that result from KSFT_FAIL and
> > pass it on, added a bunch of (fragile bash) boilerplate, while the
> > result is interpreted the same as KSFT_PASS anyway.
>
> FWIW another idea that came up when talking to Matthieu -
> isolate the VMs which run time-sensitive tests to dedicated
> CPUs. Right now we kick off around 70 4-CPU VMs and let them
> battle for 72 cores. The machines don't look overloaded but
> there can be some latency spikes (CPU use diagram attached).
>
> So the idea would be to have a handful of special VMs running
> on dedicated CPUs without any CPU time competition. That could help
> with latency spikes. But we'd probably need to annotate the tests
> which need some special treatment.
>
> Probably too much work both to annotate tests and set up env,
> but I thought I'd bring it up here in case you had an opinion.

I'm not sure that CPU affinity is the cause of the timing issues in VMs.
Variance may just come from expensive hypercalls, even with a
dedicated CPU. Testing would tell, though.

There are still the debug builds to consider, as well.
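
For readers following the thread, the behavior the patch describes (keep
running the test, but do not count timing errors as failures when
KSFT_MACHINE_SLOW is set) can be sketched roughly as below. This is only
an illustration and not the code from so_txtime.c: the helper names
machine_is_slow() and check_timing(), their parameters, and the
microsecond units are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: true if KSFT_MACHINE_SLOW is set in the environment. */
static bool machine_is_slow(void)
{
	return getenv("KSFT_MACHINE_SLOW") != NULL;
}

/*
 * Hypothetical check: compare the observed delivery time against the
 * expected one. Returns 0 if the difference is within tolerance, or if
 * it is not but timing errors are suppressed; returns 1 otherwise.
 */
static int check_timing(int64_t expected_us, int64_t observed_us,
			int64_t tolerance_us)
{
	int64_t delta = observed_us - expected_us;

	if (delta < 0)
		delta = -delta;
	if (delta <= tolerance_us)
		return 0;

	/* Always report the deviation, so slow runs remain visible in logs. */
	fprintf(stderr, "timing error: off by %lld us\n", (long long)delta);

	/* Only timing errors are suppressed; other failures are unaffected. */
	return machine_is_slow() ? 0 : 1;
}
```

With the variable unset, check_timing(1000, 9000, 2000) counts as an
error; with KSFT_MACHINE_SLOW=1 in the environment the same call logs the
deviation but returns 0, so the test still exercises the code path.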