Message-ID: <aYW5CbUvZrLogsWF@yzhao56-desk.sh.intel.com>
Date: Fri, 6 Feb 2026 17:48:57 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
<x86@...nel.org>, Kiryl Shutsemau <kas@...nel.org>, Paolo Bonzini
<pbonzini@...hat.com>, <linux-kernel@...r.kernel.org>,
<linux-coco@...ts.linux.dev>, <kvm@...r.kernel.org>, Kai Huang
<kai.huang@...el.com>, Rick Edgecombe <rick.p.edgecombe@...el.com>, "Vishal
Annapurve" <vannapurve@...gle.com>, Ackerley Tng <ackerleytng@...gle.com>,
Sagi Shahar <sagis@...gle.com>, Binbin Wu <binbin.wu@...ux.intel.com>,
Xiaoyao Li <xiaoyao.li@...el.com>, Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [RFC PATCH v5 20/45] KVM: x86/mmu: Allocate/free S-EPT pages
using tdx_{alloc,free}_control_page()
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 18764dbc97ea..01e3e4f4baa5 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -55,7 +55,8 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>
> static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> {
> - free_page((unsigned long)sp->external_spt);
> + if (sp->external_spt)
> + kvm_x86_call(free_external_sp)((unsigned long)sp->external_spt);
> free_page((unsigned long)sp->spt);
> kmem_cache_free(mmu_page_header_cache, sp);
> }
Strictly speaking, external_spt is not a control page. Its alloc/free paths
differ from those of the normal control pages managed by the TDX code:

(1) alloc
    tdx_alloc_control_page
      __tdx_alloc_control_page
        __tdx_pamt_get
          spin_lock(&pamt_lock)    ==> in process context
          spin_unlock(&pamt_lock)

(2) free
    tdp_mmu_free_sp_rcu_callback
      tdp_mmu_free_sp
        kvm_x86_call(free_external_sp)
          tdx_free_control_page
            __tdx_free_control_page
              __tdx_pamt_put
                spin_lock(&pamt_lock)    ==> in softirq context
                spin_unlock(&pamt_lock)

So, invoking __tdx_pamt_put() from the RCU callback triggers a lockdep
"inconsistent lock state" deadlock warning (see the bottom for details).
> + /*
> + * TDX uses the external_spt cache to allocate S-EPT page table pages,
> + * which (a) don't need to be initialized by KVM as the TDX-Module will
> + * initialize the page (using the guest's encryption key), and (b) need
> + * to use a custom allocator to be compatible with Dynamic PAMT.
> + */
> + vt_x86_ops.alloc_external_sp = tdx_alloc_control_page;
> + vt_x86_ops.free_external_sp = tdx_free_control_page;
> +
> vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
> vt_x86_ops.reclaim_external_sp = tdx_sept_reclaim_private_sp;
> vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
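
Alternatively (just a sketch with hypothetical names, not a concrete
proposal), the free_external_sp hook could defer the pamt_lock work back to
process context, e.g. via a workqueue, so the RCU callback itself never
takes the lock. Assuming the (unsigned long) calling convention from the
hunk above:

#include <linux/workqueue.h>
#include <linux/slab.h>

/* Hypothetical container describing one deferred S-EPT page free. */
struct external_sp_free_work {
	struct work_struct work;
	unsigned long external_spt;
};

static void external_sp_free_workfn(struct work_struct *work)
{
	struct external_sp_free_work *w =
		container_of(work, struct external_sp_free_work, work);

	/*
	 * Process context now, so taking pamt_lock inside
	 * tdx_free_control_page() is consistent with the alloc path.
	 */
	tdx_free_control_page(w->external_spt);
	kfree(w);
}

/* Hypothetical replacement for calling free_external_sp directly. */
static void tdx_free_external_sp_deferred(unsigned long external_spt)
{
	struct external_sp_free_work *w;

	w = kzalloc(sizeof(*w), GFP_ATOMIC);	/* called from the RCU callback */
	if (!w)
		return;		/* error handling elided in this sketch */

	w->external_spt = external_spt;
	INIT_WORK(&w->work, external_sp_free_workfn);
	schedule_work(&w->work);
}

Whether deferring the PAMT refcount drop like this is acceptable depends on
the rest of the series; it's only meant to illustrate one way to keep
pamt_lock out of softirq context.
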
================================
WARNING: inconsistent lock state
6.19.0-rc6-upstream+ #1078 Tainted: G S U
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffffffff9067b6f8 (pamt_lock){+.?.}-{3:3}, at: __tdx_pamt_put+0x80/0xf0
{SOFTIRQ-ON-W} state was registered at:
__lock_acquire+0x405/0xc10
lock_acquire.part.0+0x9c/0x210
lock_acquire+0x5e/0x100
_raw_spin_lock+0x37/0x80
__tdx_pamt_get+0xb8/0x150
__tdx_alloc_control_page+0x2e/0x60
__tdx_td_init+0x65/0x740 [kvm_intel]
tdx_td_init+0x147/0x240 [kvm_intel]
tdx_vm_ioctl+0x125/0x260 [kvm_intel]
vt_mem_enc_ioctl+0x17/0x30 [kvm_intel]
kvm_arch_vm_ioctl+0x4e0/0xb40 [kvm]
kvm_vm_ioctl+0x4f4/0xaf0 [kvm]
__x64_sys_ioctl+0x9d/0xf0
x64_sys_call+0xf38/0x1da0
do_syscall_64+0xc5/0xfc0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
irq event stamp: 252814
hardirqs last enabled at (252814): [<ffffffff8fa6f41a>] _raw_spin_unlock_irqrestore+0x5a/0x80
hardirqs last disabled at (252813): [<ffffffff8fa6f096>] _raw_spin_lock_irqsave+0x76/0x90
softirqs last enabled at (252798): [<ffffffff8e60f139>] handle_softirqs+0x309/0x460
softirqs last disabled at (252805): [<ffffffff8e60f401>] __irq_exit_rcu+0xe1/0x160
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(pamt_lock);
<Interrupt>
lock(pamt_lock);
*** DEADLOCK ***
1 lock held by swapper/7/0:
#0: ffffffff9077d660 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x153/0x620
stack backtrace:
CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Tainted: G S U 6.19.0-rc6-upstream+ #1078 PREEMPT(voluntary) b8f4b38003dc2ca73352cf9d3d544aa826c4f5a9
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0101.D29.2303301937 03/30/2023
Call Trace:
<IRQ>
show_stack+0x49/0x60
dump_stack_lvl+0x6f/0xb0
dump_stack+0x10/0x16
print_usage_bug.part.0+0x264/0x350
mark_lock_irq+0x4d6/0x9e0
? stack_trace_save+0x4a/0x70
? save_trace+0x66/0x2b0
mark_lock+0x1cf/0x6a0
mark_usage+0x4c/0x130
__lock_acquire+0x405/0xc10
? __this_cpu_preempt_check+0x13/0x20
lock_acquire.part.0+0x9c/0x210
? __tdx_pamt_put+0x80/0xf0
lock_acquire+0x5e/0x100
? __tdx_pamt_put+0x80/0xf0
_raw_spin_lock+0x37/0x80
? __tdx_pamt_put+0x80/0xf0
__tdx_pamt_put+0x80/0xf0
? __this_cpu_preempt_check+0x13/0x20
? sched_clock_noinstr+0x9/0x10
__tdx_free_control_page+0x22/0x40
tdx_free_control_page+0x38/0x50 [kvm_intel c135d3571385e160f086f9f6195fc72e4b6aa2b1]
tdp_mmu_free_sp_rcu_callback+0x24/0x50 [kvm 3932b137c28c130169e7e3615041bcec6cefc090]
? rcu_do_batch+0x1dc/0x620
rcu_do_batch+0x1e1/0x620
? rcu_do_batch+0x153/0x620
rcu_core+0x37d/0x4d0
rcu_core_si+0xe/0x20
handle_softirqs+0xdc/0x460
? hrtimer_interrupt+0x154/0x290
__irq_exit_rcu+0xe1/0x160
irq_exit_rcu+0xe/0x30
sysvec_apic_timer_interrupt+0xc0/0xf0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1b/0x20
RIP: 0010:cpuidle_enter_state+0x122/0x7a0