Message-ID: <93bcabebe678b532cd8ee75fa2f48f32ceeb64b2.camel@intel.com>
Date: Tue, 25 Jun 2024 05:26:37 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "mingo@...nel.org" <mingo@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "bp@...en8.de"
<bp@...en8.de>, "peterz@...radead.org" <peterz@...radead.org>,
"hpa@...or.com" <hpa@...or.com>, "luto@...capital.net" <luto@...capital.net>,
"tglx@...utronix.de" <tglx@...utronix.de>, "torvalds@...ux-foundation.org"
<torvalds@...ux-foundation.org>, "dave@...1.net" <dave@...1.net>,
"oleg@...hat.com" <oleg@...hat.com>, "ubizjak@...il.com" <ubizjak@...il.com>
Subject: Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
On Wed, 2024-06-05 at 10:35 +0200, Ingo Molnar wrote:
> As suggested by Oleg, remove the thread::fpu pointer, as we can
> calculate it via x86_task_fpu() at compile-time.
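The quoted change drops a stored pointer and computes the FPU area's address from the task itself. Below is a minimal sketch of that general pattern. It assumes the FPU area sits directly after the task object in the same allocation. The struct names, the `task_fpu()` macro and the `alloc_task()` helper are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct fpu {
	unsigned long state[4];
};

struct task {
	long pid;
	unsigned long flags;
	/* No 'struct fpu *fpu' member: the address is derived instead. */
};

/* The FPU area must start suitably aligned right after the task. */
static_assert(sizeof(struct task) % _Alignof(struct fpu) == 0,
	      "FPU area after struct task would be misaligned");

/* Derive the FPU area from the task address instead of storing a pointer. */
#define task_fpu(t) ((struct fpu *)((char *)(t) + sizeof(*(t))))

/* One zeroed allocation holds the task immediately followed by its FPU area. */
static struct task *alloc_task(long pid)
{
	struct task *t = calloc(1, sizeof(struct task) + sizeof(struct fpu));

	if (t)
		t->pid = pid;
	return t;
}
```

Nothing is stored, so this layout only works if every task object reserves the trailing space. That includes any task not created through a helper like `alloc_task()`, such as a statically defined one. Otherwise `task_fpu()` points at whatever happens to follow the object in memory.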
I'm seeing boot failures in a TDX VM that bisect to this commit in tip
(807333522953). The host runs a pile of out-of-tree KVM patches, but the
nature of the change makes me suspect it isn't TDX-related. On the first bad
commit the failure looks like the log below. Some of the later commits failed
with a stack trace more clearly associated with the FPU. It also only
reproduces with lock debugging enabled:
#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)
If the cause isn't obvious, I can investigate further tomorrow on a more
typical VM configuration.
[ 8.830714] ------------[ cut here ]------------
[ 8.830714] DEBUG_LOCKS_WARN_ON(1)
[ 8.830714] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:232 __lock_acquire+0xa5c/0x2120
[ 8.830714] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S 6.10.0-rc3-00004-g807333522953 #117
[ 8.830714] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[ 8.830714] RIP: 0010:__lock_acquire+0xa5c/0x2120
[ 8.830714] Code: b8 85 c0 0f 84 39 fd ff ff 8b 05 13 4d 92 01 85 c0 0f 85 2b fd ff ff 48 c7 c6 c5 93 3e 82 48 c7 c7 66 aa 39 82 e8 24 34 f8 ff <0f> 0b 31 c0 44 8b 5d b8 e9 60 f7 ff ff 88 55 b0 44 89 5d b8 e8 4b
[ 8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010086
[ 8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[ 8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[ 8.830714] R10: ff11000437ffe000 R11: ff11000464000d78 R12: ffffffff826b57a8
[ 8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[ 8.830714] FS: 0000000000000000(0000) GS:ff11000427a00000(0000) knlGS:0000000000000000
[ 8.830714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.830714] CR2: ff1100047ffff000 CR3: 0000000002eda001 CR4: 0000000000771ef0
[ 8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8.830714] PKRU: 55555554
[ 8.830714] Call Trace:
[ 8.830714] <TASK>
[ 8.830714] ? show_regs+0x60/0x70
[ 8.830714] ? __warn+0x84/0x180
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? report_bug+0x1f3/0x200
[ 8.830714] ? handle_bug+0x40/0x70
[ 8.830714] ? exc_invalid_op+0x19/0x70
[ 8.830714] ? asm_exc_invalid_op+0x1b/0x20
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? common_startup_64+0x13e/0x148
[ 8.830714] lock_acquire+0xc8/0x2e0
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] ? __this_cpu_preempt_check+0x13/0x20
[ 8.830714] ? _raw_spin_lock_irq+0x4b/0x50
[ 8.830714] _raw_spin_lock_irq+0x37/0x50
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] copy_process+0x107/0x2b80
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] ? lock_release+0x130/0x290
[ 8.830714] ? _raw_spin_unlock_irqrestore+0x2c/0x60
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] kernel_clone+0x97/0x3b0
[ 8.830714] ? __mutex_unlock_slowpath+0x3c/0x2a0
[ 8.830714] user_mode_thread+0x59/0x70
[ 8.830714] ? rest_init+0x190/0x190
[ 8.830714] rest_init+0x1e/0x190
[ 8.830714] start_kernel+0x672/0x790
[ 8.830714] x86_64_start_reservations+0x18/0x30
[ 8.830714] x86_64_start_kernel+0xd0/0xe0
[ 8.830714] common_startup_64+0x13e/0x148
[ 8.830714] </TASK>
[ 8.830714] irq event stamp: 27428
[ 8.830714] hardirqs last enabled at (27427): [<ffffffff81dc236c>] _raw_spin_unlock_irqrestore+0x2c/0x60
[ 8.830714] hardirqs last disabled at (27428): [<ffffffff81dc208b>] _raw_spin_lock_irq+0x4b/0x50
[ 8.830714] softirqs last enabled at (27360): [<ffffffff8116d93c>] cgroup_idr_alloc.constprop.0+0x5c/0x100
[ 8.830714] softirqs last disabled at (27358): [<ffffffff8116d916>] cgroup_idr_alloc.constprop.0+0x36/0x100
[ 8.830714] ---[ end trace 0000000000000000 ]---
[ 8.830714] BUG: kernel NULL pointer dereference, address: 00000000000000c4
[ 8.830714] #PF: supervisor read access in kernel mode
[ 8.830714] #PF: error_code(0x0000) - not-present page
[ 8.830714] PGD 0
[ 8.830714] Oops: Oops: 0000 [#1] PREEMPT SMP
[ 8.830714] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S W 6.10.0-rc3-00004-g807333522953 #117
[ 8.830714] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[ 8.830714] RIP: 0010:__lock_acquire+0x1c9/0x2120
[ 8.830714] Code: 45 28 41 89 44 24 24 4c 89 f8 25 ff 1f 00 00 48 0f a3 05 2a 03 dc 01 0f 83 aa 05 00 00 48 69 c0 c8 00 00 00 48 05 a0 bc eb 82 <0f> b6 90 c4 00 00 00 41 0f b7 44 24 20 66 25 ff 1f 0f b7 c0 48 0f
[ 8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010046
[ 8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[ 8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[ 8.830714] R10: ff11000437ffe000 R11: 0000000000000001 R12: ffffffff826b57a8
[ 8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[ 8.830714] FS: 0000000000000000(0000) GS:ff11000427a00000(0000) knlGS:0000000000000000
[ 8.830714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.830714] CR2: 00000000000000c4 CR3: 0000000002eda001 CR4: 0000000000771ef0
[ 8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8.830714] PKRU: 55555554
[ 8.830714] Call Trace:
[ 8.830714] <TASK>
[ 8.830714] ? show_regs+0x60/0x70
[ 8.830714] ? __die+0x20/0x60
[ 8.830714] ? page_fault_oops+0x15a/0x480
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? exc_page_fault+0x437/0x910
[ 8.830714] ? asm_exc_page_fault+0x27/0x30
[ 8.830714] ? __lock_acquire+0x1c9/0x2120
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? common_startup_64+0x13e/0x148
[ 8.830714] lock_acquire+0xc8/0x2e0
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] ? __this_cpu_preempt_check+0x13/0x20
[ 8.830714] ? _raw_spin_lock_irq+0x4b/0x50
[ 8.830714] _raw_spin_lock_irq+0x37/0x50
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] copy_process+0x107/0x2b80
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] ? lock_release+0x130/0x290
[ 8.830714] ? _raw_spin_unlock_irqrestore+0x2c/0x60
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] kernel_clone+0x97/0x3b0
[ 8.830714] ? __mutex_unlock_slowpath+0x3c/0x2a0
[ 8.830714] user_mode_thread+0x59/0x70
[ 8.830714] ? rest_init+0x190/0x190
[ 8.830714] rest_init+0x1e/0x190
[ 8.830714] start_kernel+0x672/0x790
[ 8.830714] x86_64_start_reservations+0x18/0x30
[ 8.830714] x86_64_start_kernel+0xd0/0xe0
[ 8.830714] common_startup_64+0x13e/0x148
[ 8.830714] </TASK>
[ 8.830714] CR2: 00000000000000c4
[ 8.830714] ---[ end trace 0000000000000000 ]---
[ 8.830714] RIP: 0010:__lock_acquire+0x1c9/0x2120
[ 8.830714] Code: 45 28 41 89 44 24 24 4c 89 f8 25 ff 1f 00 00 48 0f a3 05 2a
03 dc 01 0f 83 aa 05 00 00 48 69 c0 c8 00 00 00 48 05 a0 bc eb 82 <0f> b6 90 c4
00 00 00 41 0f b7 44 24 20 66 25 ff 1f 0f b7 c0 48 0f
[ 8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010046
[ 8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[ 8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[ 8.830714] R10: ff11000437ffe000 R11: 0000000000000001 R12: ffffffff826b57a8
[ 8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[ 8.830714] FS: 0000000000000000(0000) GS:ff11000427a00000(0000) knlGS:0000000000000000
[ 8.830714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.830714] CR2: 00000000000000c4 CR3: 0000000002eda001 CR4: 0000000000771ef0
[ 8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8.830714] PKRU: 55555554
[ 8.830714] Kernel panic - not syncing: Attempted to kill the idle task!
[ 8.830714] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
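The two "#PF:" lines in the oops above decode the hardware page-fault error code, which is 0x0000 here. Below is a small sketch of that decoding, assuming the x86 error-code bit layout: bit 0 is protection violation vs. not present, bit 1 is write, bit 2 is user mode and bit 4 is instruction fetch. The macro names and the `describe_pf()` helper are illustrative, not the kernel's own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Page-fault error-code bits (illustrative names, x86 bit positions). */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_USER  (1u << 2) /* 0: supervisor-mode access, 1: user-mode access */
#define PF_INSTR (1u << 4) /* 1: fault caused by an instruction fetch */

/* Render the part of a "#PF:"-style description derived from the error code. */
static void describe_pf(unsigned long code, char *buf, size_t len)
{
	snprintf(buf, len, "%s %s, %s",
		 (code & PF_USER) ? "user" : "supervisor",
		 (code & PF_INSTR) ? "instruction fetch" :
		 (code & PF_WRITE) ? "write access" : "read access",
		 (code & PF_PROT) ? "protection violation" : "not-present page");
}
```

With an error code of 0x0000 every bit is clear, so the helper yields "supervisor read access, not-present page", matching the log. The "in kernel mode" part of the first "#PF:" line does not come from the error code. The kernel derives it from the interrupted register state.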