Message-ID: <q4ezq2eipk27fo5e33fqsmqqpluj35qquihw6tgcfpndzgggah@apfg6ncuvwix>
Date: Mon, 5 Jan 2026 21:16:19 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Andrew Jones <andrew.jones@...ux.dev>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [kvm-unit-tests PATCH] x86: Increase the timeout for
vmx_pf_{vpid/no_vpid/invvpid}_test
On Mon, Jan 05, 2026 at 07:42:36PM +0000, Yosry Ahmed wrote:
> On Mon, Jan 05, 2026 at 11:19:21AM -0800, Sean Christopherson wrote:
> > On Mon, Jan 05, 2026, Yosry Ahmed wrote:
> > > On Mon, Jan 05, 2026 at 09:54:13AM -0800, Sean Christopherson wrote:
> > > > On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> > > > > When running the tests on some older CPUs (e.g. Skylake) on a kernel
> > > > > with some debug config options enabled (e.g. CONFIG_DEBUG_VM,
> > > > > CONFIG_PROVE_LOCKING, ...), the tests time out. In this specific setup,
> > > > > the tests take between 4 and 5 minutes, so bump the timeout from 4 to 6
> > > > > minutes.
> > > >
> > > > Ugh. Can anyone think of a not-insane way to skip these tests when running in
> > > > an environment that is going to be sloooooow? Because (a) a 6-minute timeout
> > > > could very well hide _real_ KVM bugs, e.g. if KVM is being too aggressive with
> > > > TLB flushes (speaking from experience), and (b) running a 5+ minute test is
> > > > likely a waste of time/resources.
> > >
> > > The definition of a slow environment is also very dynamic; I don't think
> > > we want to play whack-a-mole with config options or runtime knobs that
> > > would make the tests slow.
> > >
> > > I don't like just increasing the timeout either, but the tests are slow
> > > even without these specific config options. They only make them a little
> > > bit slower, enough to consistently reproduce the timeout.
> >
> > Heh, "little bit" is also subjective. The tests _can_ run in less than 10
> > seconds:
> >
> > $ time qemu --no-reboot -nodefaults -global kvm-pit.lost_tick_policy=discard \
> >     -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none \
> >     -serial stdio -device pci-testdev -machine accel=kvm,kernel_irqchip=split \
> >     -kernel x86/vmx.flat -smp 1 -append vmx_pf_invvpid_test -cpu max,+vmx
> >
> > 933897 tests, 0 failures
> > PASS: 4-level paging tests
> > filter = vmx_pf_invvpid_test, test = vmx_pf_vpid_test
> > filter = vmx_pf_invvpid_test, test = vmx_exception_test
> > filter = vmx_pf_invvpid_test, test = vmx_canonical_test
> > filter = vmx_pf_invvpid_test, test = vmx_cet_test
> > SUMMARY: 1867887 tests
> > Command exited with non-zero status 1
> > 3.69user 3.19system 0:06.90elapsed 99%CPU
> >
> > > This is also acknowledged by commit ca785dae0dd3 ("vmx: separate VPID
> > > tests"), which introduced the separate targets to increase the timeout.
> > > It mentions the 3 tests taking 12m (so roughly 4m each).
> >
> > Because of debug kernels. With a fully capable host+KVM and non-debug kernel,
> > the tests take ~50 seconds each.
> >
> > Looking at why the tests can run in ~7 seconds, the key difference is that the
> > above run was done with ept=0, which culls the Protection Keys tests (KVM doesn't
> > support PKU when using shadow paging because it'd be insane to emulate correctly).
> > The PKU testcases increase the total number of testcases by 10x, which leads to
> > timeouts with debug kernels.
> >
> > Rather than running with an absurd timeout, what if we disable PKU in the guest
> > for the tests? Running all four tests completes in <20 seconds:
>
> This looks good. On the Icelake machine they took around 1m24s, and I
> suspect they will take a bit longer with all the debug options, so we'll
> still need a longer timeout than the default 90s (maybe 120s or 180s).
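
(The runs below were with PKU disabled in the guest, per Sean's suggestion
above. In case anyone wants to reproduce independently of the eventual
patch, I *think* masking the feature off on the QEMU command line has the
same effect, i.e. something like:

  -cpu max,+vmx,-pku

but treat the exact feature spelling as from memory.)
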
I tried with the debug kernel (including CONFIG_DEBUG_VM and others) on
both Skylake and Icelake. It timed out on both with the default 90s
timeout.
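
For reference, I was invoking them through the normal runner, roughly
(invocation sketched from memory):

  $ time ./run_tests.sh vmx_pf_vpid_test vmx_pf_no_vpid_test vmx_pf_invvpid_test

with the per-test timeout coming from the corresponding entries in
x86/unittests.cfg.
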
With a 180s timeout, it took 1m40s and 1m37s on Icelake and Skylake
respectively. So I think if we keep them combined, we should use at least
120s for the timeout.
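
Concretely, if we do merge them, I'm imagining a single x86/unittests.cfg
entry along these lines (the entry name and exact extra_params are
illustrative, not copied from the tree):

  [vmx_pf_tests]
  file = vmx.flat
  extra_params = -cpu max,+vmx,-pku -append "vmx_pf_vpid_test vmx_pf_no_vpid_test vmx_pf_invvpid_test"
  arch = x86_64
  groups = vmx
  timeout = 120
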
or..
>
> Alternatively, we can keep the targets separate if we want to keep the
> default timeout.
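
i.e. keep three entries like today's and just drop the timeout override
once PKU is off, e.g. (again a sketch, not the literal current entry):

  [vmx_pf_vpid_test]
  file = vmx.flat
  extra_params = -cpu max,+vmx,-pku -append vmx_pf_vpid_test
  arch = x86_64
  groups = vmx
  # no timeout line, so the default 90s applies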