Message-ID: <q4ezq2eipk27fo5e33fqsmqqpluj35qquihw6tgcfpndzgggah@apfg6ncuvwix>
Date: Mon, 5 Jan 2026 21:16:19 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, 
	Andrew Jones <andrew.jones@...ux.dev>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [kvm-unit-tests PATCH] x86: Increase the timeout for
 vmx_pf_{vpid/no_vpid/invvpid}_test

On Mon, Jan 05, 2026 at 07:42:36PM +0000, Yosry Ahmed wrote:
> On Mon, Jan 05, 2026 at 11:19:21AM -0800, Sean Christopherson wrote:
> > On Mon, Jan 05, 2026, Yosry Ahmed wrote:
> > > On Mon, Jan 05, 2026 at 09:54:13AM -0800, Sean Christopherson wrote:
> > > > On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> > > > > When running the tests on some older CPUs (e.g. Skylake) on a kernel
> > > > > with some debug config options enabled (e.g. CONFIG_DEBUG_VM,
> > > > > CONFIG_PROVE_LOCKING, ...), the tests time out. In this specific setup,
> > > > > the tests take between 4 and 5 minutes, so bump the timeout from 4 to 6
> > > > > minutes.
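> > > > > 
> > > > > (Concretely, this is the timeout key in the tests' x86/unittests.cfg
> > > > > stanzas; the entry below is an illustrative sketch from memory, not
> > > > > the literal patch:
> > > > > 
> > > > >   [vmx_pf_vpid]
> > > > >   file = vmx.flat
> > > > >   extra_params = -cpu max,+vmx -append "vmx_pf_vpid_test"
> > > > >   arch = x86_64
> > > > >   groups = vmx
> > > > >   # bumped from 240 (4 minutes) to 360 (6 minutes)
> > > > >   timeout = 360
> > > > > )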
> > > > 
> > > > Ugh.  Can anyone think of a not-insane way to skip these tests when running in
> > > > an environment that is going to be sloooooow?  Because (a) a 6 minute timeout
> > > > could very well hide _real_ KVM bugs, e.g. if KVM is being too aggressive with
> > > > TLB flushes (speaking from experience), and (b) running a 5+ minute test is
> > > > likely a waste of time/resources.
> > > 
> > > The definition of a slow environment is also very dynamic; I don't think
> > > we want to play whack-a-mole with config options or runtime knobs that
> > > would make the tests slow.
> > > 
> > > I don't like just increasing the timeout either, but the tests are slow
> > > even without these specific config options. They only make them a little
> > > bit slower, enough to consistently reproduce the timeout.
> > 
> > Heh, "little bit" is also subjective.  The tests _can_ run in less than 10
> > seconds:
> > 
> > $ time qemu --no-reboot -nodefaults -global kvm-pit.lost_tick_policy=discard \
> >   -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none \
> >   -serial stdio -device pci-testdev -machine accel=kvm,kernel_irqchip=split \
> >   -kernel x86/vmx.flat -smp 1 -append vmx_pf_invvpid_test -cpu max,+vmx
> > 
> > 933897 tests, 0 failures
> > PASS: 4-level paging tests
> > filter = vmx_pf_invvpid_test, test = vmx_pf_vpid_test
> > filter = vmx_pf_invvpid_test, test = vmx_exception_test
> > filter = vmx_pf_invvpid_test, test = vmx_canonical_test
> > filter = vmx_pf_invvpid_test, test = vmx_cet_test
> > SUMMARY: 1867887 tests
> > Command exited with non-zero status 1
> > 3.69user 3.19system 0:06.90elapsed 99%CPU
> > 
> > > This is also acknowledged by commit ca785dae0dd3 ("vmx: separate VPID
> > > tests"), which introduced the separate targets to increase the timeout.
> > > It mentions the 3 tests taking 12m (so roughly 4m each). 
> > 
> > Because of debug kernels.  With a fully capable host+KVM and non-debug kernel,
> > the tests take ~50 seconds each.
> > 
> > Looking at why the tests can run in ~7 seconds, the key difference is that the
> > above run was done with ept=0, which culls the Protection Keys tests (KVM doesn't
> > support PKU when using shadow paging because it'd be insane to emulate correctly).
> > The PKU testcases increase the total number of testcases by 10x, which leads to
> > timeouts with debug kernels.
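> > 
> > (For anyone reproducing this: ept=0 means kvm_intel was loaded with EPT
> > disabled, forcing shadow paging, e.g.
> > 
> >   modprobe kvm_intel ept=0
> > )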
> > 
> > Rather than run with a rather absurd timeout, what if we disable PKU in the guest
> > for the tests?  Running all four tests completes in <20 seconds:
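> > 
> > (A minimal sketch of one way to do that, assuming QEMU's feature-flag
> > syntax rather than whatever the eventual patch does: mask PKU in the
> > -cpu string of the tests' x86/unittests.cfg entries,
> > 
> >   extra_params = -cpu max,+vmx,-pku -append "vmx_pf_vpid_test"
> > 
> > so the guest never sees Protection Keys and the PKU testcases are
> > culled, same as in the ept=0 run above.)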
> 
> This looks good. On the Icelake machine they took around 1m 24s, and I
> suspect they will take a bit longer with all the debug options, so we'll
> still need a longer timeout than the default 90s (maybe 120s or 180s).

I tried with the debug kernel (including CONFIG_DEBUG_VM and others) on
both Skylake and Icelake. It timed out on both with the default 90s
timeout.

With a 180s timeout, they took 1m40s and 1m37s on Icelake and Skylake
respectively. So if we keep them combined, I think we should use at
least 120s for the timeout.
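
(In the combined entry's x86/unittests.cfg stanza, that would be
something like

  timeout = 120

give or take the exact value.)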

or..

> 
> Alternatively, we can keep the targets separate if we want to keep the
> default timeout.
