Message-ID: <e16b3bf2-ad1e-4614-a134-cee44a3b1b8b@oracle.com>
Date: Wed, 4 Jun 2025 12:15:10 +0100
From: Liam Merwick <liam.merwick@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Alexander Potapenko <glider@...gle.com>,
James Houghton <jthoughton@...gle.com>,
Peter Gonda <pgonda@...gle.com>,
Tom Lendacky <thomas.lendacky@....com>
Subject: Re: [PATCH 1/2] KVM: SVM: Reject SEV{-ES} intra host migration if
vCPU creation is in-flight
On 02/06/2025 23:44, Sean Christopherson wrote:
> Reject migration of SEV{-ES} state if either the source or destination VM
> is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the
> section between incrementing created_vcpus and online_vcpus. The bulk of
> vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs
> in parallel, and so sev_info.es_active can get toggled from false=>true in
> the destination VM after (or during) svm_vcpu_create(), resulting in an
> SEV{-ES} VM effectively having a non-SEV{-ES} vCPU.
>
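[Illustrative aside: a heavily condensed C model of the window described
above. The counter names mirror the upstream created_vcpus/online_vcpus,
but this is not the real kvm_vm_ioctl_create_vcpu() or
sev_check_source_vcpus() code, just a sketch of why
"created_vcpus != online_vcpus" identifies an in-flight creation.]

  #include <assert.h>
  #include <stdbool.h>

  struct vm {
          int created_vcpus;   /* bumped early, under the VM lock */
          int online_vcpus;    /* bumped only once the vCPU is fully built */
  };

  /* Roughly models kvm_vm_ioctl_create_vcpu(): the bulk of creation runs
   * with the VM lock dropped, so the two counters disagree for a while. */
  void create_vcpu(struct vm *vm)
  {
          vm->created_vcpus++; /* lock held here, then dropped */
          /* ... long, unlocked vCPU setup (svm_vcpu_create() etc.);
           * migration arriving in this window sees a half-built vCPU. */
          vm->online_vcpus++;  /* creation complete and visible */
  }

  /* The new check: refuse migration while either VM has a creation in flight. */
  bool migration_allowed(const struct vm *src, const struct vm *dst)
  {
          return src->created_vcpus == src->online_vcpus &&
                 dst->created_vcpus == dst->online_vcpus;
  }

  int main(void)
  {
          struct vm src = {0}, dst = {0};

          create_vcpu(&src);            /* src vCPU fully created */
          dst.created_vcpus++;          /* dst creation started ... */
          assert(!migration_allowed(&src, &dst)); /* ... so reject */
          dst.online_vcpus++;           /* dst creation finished */
          assert(migration_allowed(&src, &dst));
          return 0;
  }
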
> The issue manifests most visibly as a crash when trying to free a vCPU's
> NULL VMSA page in an SEV-ES VM, but any number of things can go wrong.
>
> BUG: unable to handle page fault for address: ffffebde00000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP KASAN NOPTI
> CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE
> Tainted: [U]=USER, [O]=OOT_MODULE
> Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
> RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline]
> RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline]
> RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline]
> RIP: 0010:PageHead include/linux/page-flags.h:866 [inline]
> RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067
> Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0
> RSP: 0018:ffff8984551978d0 EFLAGS: 00010246
> RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000
> RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000
> R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000
> R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000
> FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0
> DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169
> svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515
> kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396
> kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline]
> kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490
> kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895
> kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310
> kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369
> __fput+0x3e4/0x9e0 fs/file_table.c:465
> task_work_run+0x1a9/0x220 kernel/task_work.c:227
> exit_task_work include/linux/task_work.h:40 [inline]
> do_exit+0x7f0/0x25b0 kernel/exit.c:953
> do_group_exit+0x203/0x2d0 kernel/exit.c:1102
> get_signal+0x1357/0x1480 kernel/signal.c:3034
> arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337
> exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218
> do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f87a898e969
> </TASK>
> Modules linked in: gq(O)
> gsmi: Log Shutdown Reason 0x03
> CR2: ffffebde00000000
> ---[ end trace 0000000000000000 ]---
>
> Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing
> the host is likely desirable due to the VMSA being consumed by hardware.
> E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a
> bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered
> away thanks to L1TF, but panicking in this scenario is preferable to
> potentially running with corrupted state.
>
> Reported-by: Alexander Potapenko <glider@...gle.com>
> Tested-by: Alexander Potapenko <glider@...gle.com>
> Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration")
> Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
> Cc: stable@...r.kernel.org
> Cc: James Houghton <jthoughton@...gle.com>
> Cc: Peter Gonda <pgonda@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
Reviewed-by: Liam Merwick <liam.merwick@...cle.com>
Tested-by: Liam Merwick <liam.merwick@...cle.com>
> ---
> arch/x86/kvm/svm/sev.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a7a7dc507336..93d899454535 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2032,6 +2032,10 @@ static int sev_check_source_vcpus(struct kvm *dst, struct kvm *src)
> struct kvm_vcpu *src_vcpu;
> unsigned long i;
>
> + if (src->created_vcpus != atomic_read(&src->online_vcpus) ||
> + dst->created_vcpus != atomic_read(&dst->online_vcpus))
> + return -EINVAL;
> +
> if (!sev_es_guest(src))
> return 0;
>
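[For anyone poking at this from userspace: the path this gates is the
intra-host migration triggered by enabling KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
on the destination VM. A rough sketch follows; all VM/vCPU and SEV setup is
omitted, so treat it as an outline rather than a ready-made reproducer. With
the patch applied, the ioctl should fail with EINVAL if either VM still has
a KVM_CREATE_VCPU in flight.]

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* dst_vm_fd/src_vm_fd are assumed to be VM fds from KVM_CREATE_VM with
   * SEV/SEV-ES already initialized; that setup is not shown here. */
  int move_enc_context(int dst_vm_fd, int src_vm_fd)
  {
          struct kvm_enable_cap cap;

          memset(&cap, 0, sizeof(cap));
          cap.cap = KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM;
          cap.args[0] = src_vm_fd;

          /* Fails with EINVAL if created_vcpus != online_vcpus in either VM. */
          return ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap);
  }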