Message-ID: <20240426122402.GA36092@kernel.org>
Date: Fri, 26 Apr 2024 08:24:02 -0400
From: Paul Gortmaker <paulg@...nel.org>
To: Borislav Petkov <bp@...en8.de>
Cc: Thomas Gleixner <tglx@...utronix.de>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
linux-kernel@...r.kernel.org,
Richard Purdie <richard.purdie@...uxfoundation.org>
Subject: Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes
[Apologies for repeated info; last mail didn't make it to the list]
[Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes] On 24/04/2024 (Wed 21:51) Borislav Petkov wrote:
> On Wed, Apr 24, 2024 at 02:58:06PM -0400, Paul Gortmaker wrote:
> ...
> > pci 0000:00:1d.0: [8086:2934] type 00 class 0x0c0300 conventional PCI endpoint
> > pci 0000:00:1d.0: BAR 4 [io 0xc080-0xc09f]
> > pci 0000:00:1d.1: [8086:2935] type 00 class 0x0c0300 conventional PCI endpoint
> > pci 0000:00:1d.1: BAR 4 [io 0xc0a0-0xc0bf]
> > pci 0000:00:1d.2: [8086:2936] type 00 class 0x0c0300 conventional PCI endpoint
> > <hang - not always exactly here, but always in this block of PCI printk>
>
> How would those commits have anything to do with such an early hang?!
>
> Nothing that early is issuing INT80 32-bit syscalls, is it?
>
> Btw, can you checkout the Linus tree at...
>
> f35e46631b28 Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> f4116bfc4462 x86/tdx: Allow 32-bit emulation by default
>
>
> <-- here and test that commit as the top one?
>
> 55617fb991df x86/entry: Do not allow external 0x80 interrupts
They both show the issue, but that really doesn't matter now. When you
guys pointed out it really didn't make sense, I did what I should have
done before - tested the crap out of ^1, the trunk just before the
INT80 merge:
commit f35e46631b28a63ca3887d7afef1a65a5544da52
Merge: 55b224d90d44 f4116bfc4462
       ^^^^^^^^^^^^
Author: Linus Torvalds <torvalds@...ux-foundation.org>
Date:   Thu Dec 7 11:56:34 2023 -0800

    Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
..which would be 55b224d90d44 (parisc merge). So I left that running
for nearly 24h (almost 2000 runs), and got 8 PCI-hang instances. :(
Which means the hang happens on a tree where INT80 isn't even there yet.
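
[A minimal sketch of a run-until-timeout loop of the kind described above,
assuming a hang can be detected simply as "the boot command did not exit
within a time limit". The function name count_hangs, the timeout, and the
placeholder command are hypothetical; this is not the actual test setup.]

```python
import subprocess
import sys


def count_hangs(cmd, runs, timeout_s):
    """Run `cmd` `runs` times; count runs that exceed `timeout_s` seconds.

    Assumption: a guest that boots and shuts down cleanly makes the command
    exit by itself, so a timeout is treated as a hang.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # child is killed if this expires
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs


if __name__ == "__main__":
    # Placeholder command that exits immediately; substitute the real
    # boot command line here.
    boot_cmd = [sys.executable, "-c", "pass"]
    print(count_hangs(boot_cmd, runs=3, timeout_s=60))
```

[With a real boot command, the returned count divided by the number of
runs gives a rough per-boot hang rate for the tree under test.]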
So I owe you guys an apology for pointing the finger at INT80. I still
don't understand how the pseudo bisect on v6.6-stable seemed so
"concrete". v6.6.6 worked "fine" (it seemed) and v6.6.7 died fairly
quickly. The revert of INT80 on v6.6.7 seemed to "fix" it - but if so,
it was only because it perturbed something else.
I already knew my "good" bisect points were not "proven" good, but only
statistically "good". Seems I need to revisit some of those "good" data
points (both on v6.6-stable and on mainline) and test longer.
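
[To put a number on "test longer": a rough sketch, assuming each boot hangs
independently with a fixed probability and taking 8 hangs in about 2000
runs as the estimate of that probability. Both are simplifying
assumptions, and the function names are made up for illustration.]

```python
import math


def runs_for_confidence(p_hang, confidence=0.95):
    """Clean runs needed before calling a commit "good".

    Smallest n with (1 - p_hang)**n <= 1 - confidence, i.e. if the bug is
    present, the chance of seeing zero hangs in n runs is at most
    1 - confidence.
    """
    if not 0.0 < p_hang < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_hang and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


def miss_probability(p_hang, runs):
    """Chance that `runs` boots all pass although the bug is present."""
    return (1.0 - p_hang) ** runs


if __name__ == "__main__":
    p = 8 / 2000  # estimated per-boot hang rate from the run above
    print(runs_for_confidence(p))    # clean runs for 95% confidence
    print(miss_probability(p, 100))  # odds that 100 clean runs mislead
```

[At that rate it takes roughly 750 clean boots to be 95% sure a commit is
good, which would explain how a bisect with much shorter "good" runs can
land on the wrong commit.]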
>
> which reminds me - that hang could be actually that guest kernel
> panicking but the panic not coming out to the console.
>
> When it hangs, can you connect with gdb to qemu and dump stack and
> registers?
>
> Make sure you have DEBUG_INFO enabled in the guest kernel.
I want to try some of these things, but I also don't want to
accidentally lose the reproducer I have. Maybe I'll see if I can
reproduce it at home, since I'll lose use of the current box in a week
anyway...
Again, sorry for the false positive. I let the v6.6-stable testing bias
my mainline conclusions to where I didn't test underneath INT80. I'll
follow up with more details once (if?) I manage to properly sort this.
Paul.
--
>
> Is this even a guest?
>
> I know you had guests last time you reported the alternatives issue.
>
> Right, and then test the tree checked out at this commit:
>
> be5341eb0d43 x86/entry: Convert INT 0x80 emulation to IDTENTRY
>
> The others should be unrelated...
>
> b82a8dbd3d2f x86/coco: Disable 32-bit emulation by default on TDX and SEV
>
> Hmm.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette