Message-ID: <ZT6narvE+LxX+7Be@windriver.com>
Date: Sun, 29 Oct 2023 14:41:46 -0400
From: Paul Gortmaker <paul.gortmaker@...driver.com>
To: Peter Zijlstra <peterz@...radead.org>,
Borislav Petkov <bp@...en8.de>
Cc: Richard Purdie <richard.purdie@...uxfoundation.org>,
Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: 32 bit qemu regression from v6.5 tip pull [6c480f222128
x86/alternative: Rewrite optimize_nops() some]
The TL;DR is that the Yocto folks encountered a regression in their
automated QA tests (after a move from v6.4 --> v6.5) where non-KVM
enabled boot tests on 32 bit x86 would (with ~2% frequency) splat with:
[0.326235] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.326556] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[0.326965] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[0.331789] __common_interrupt: 0.167 No irq handler for vector
[0.331789] __common_interrupt: 0.112 No irq handler for vector
[0.331789] iret exception: 0000 [#1] PREEMPT SMP
[0.331789] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.5.7-yocto-standard #1
[0.331789] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[0.331789] EIP: 0x60
[0.331789] Code: Unable to access opcode bytes at 0x36.
...or similar. The common theme is that the splat follows right after
FPU init and involves __common_interrupt.
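For sorting a large pile of boot logs into pass/fail, a small scanner along these lines may help. It is only a sketch: the two patterns are lifted from the splat quoted above, the helper name boot_log_has_splat is hypothetical, and other variants of the splat may need additional patterns.

```python
import re

# Signatures taken from the splat quoted above. These are assumptions about
# what a "bad" boot looks like; variants may need more patterns.
SPLAT_PATTERNS = [
    re.compile(r"__common_interrupt: \S+ No irq handler for vector"),
    re.compile(r"iret exception: "),
]


def boot_log_has_splat(log_text: str) -> bool:
    """Return True if any line of the boot log matches a known splat signature."""
    return any(
        pattern.search(line)
        for line in log_text.splitlines()
        for pattern in SPLAT_PATTERNS
    )
```

Feeding each captured serial console log through boot_log_has_splat() and counting the True results gives the fail count for a given commit.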
The 2% reproducibility was a problem, so the Yocto folks asked me to
take a look, and keeping with the TL;DR I managed to bisect it to the
tip merge of x86 alternatives, and then in turn to the commit within:
6c480f222128 x86/alternative: Rewrite optimize_nops() some
That failed six times in 381 qemu boots. I've run the commit below it,
14e4ec9c3e91, close to 1500 times (still going) without a fail. As we
all know, at ~2% a single bad boot proves bad, but good is only ever
statistically proven.
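To put a rough number on "statistically proven", the arithmetic can be sketched as below. It assumes each boot is an independent trial with a constant failure probability equal to the observed 6/381, which may well not hold for a timing-dependent bug. The function names are hypothetical.

```python
import math


def prob_all_pass(fail_rate: float, boots: int) -> float:
    """Chance of seeing zero failures in `boots` independent boots
    if the true per-boot failure probability is `fail_rate`."""
    return (1.0 - fail_rate) ** boots


def boots_needed(fail_rate: float, confidence: float) -> int:
    """Smallest number of clean boots after which a commit with the given
    failure rate would have shown at least one failure with probability
    `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


# Failure rate observed on 6c480f222128: six splats in 381 boots.
observed_rate = 6 / 381

# Probability that a commit with the same rate survives 1500 boots cleanly.
p_false_good = prob_all_pass(observed_rate, 1500)

# Clean boots needed before calling a bisect step "good" at 99% confidence.
n_for_99 = boots_needed(observed_rate, 0.99)
```

Under those assumptions, roughly 290 clean boots per bisect step already give 99% confidence, and 1500 clean boots leave a chance well below one in a billion that 14e4ec9c3e91 carries the same failure rate.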
I'm not quite sure where to go next. It has been nearly 20 years since
I've had to juggle NOP counts for some IMHO broken MIPS pipeline. So I
figured I'd best report it at this point.
I've put a bunch of details in the bugzilla of the Yocto folks here:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=15230
Skip ahead to comment 11 and you'll avoid me chasing FPU changes like
tglx's FPU init relocation commits, only to go nowhere.
I've kept kernel build dirs, boot logs, etc for all the commits I've
touched down into for testing, so I can revisit and re-test easily.
Paul.