Message-ID: <20250813192313.132431-1-mlevitsk@redhat.com>
Date: Wed, 13 Aug 2025 15:23:10 -0400
From: Maxim Levitsky <mlevitsk@...hat.com>
To: kvm@...r.kernel.org
Cc: Sean Christopherson <seanjc@...gle.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Paolo Bonzini <pbonzini@...hat.com>,
x86@...nel.org,
Borislav Petkov <bp@...en8.de>,
linux-kernel@...r.kernel.org,
Maxim Levitsky <mlevitsk@...hat.com>
Subject: [PATCH 0/3] Fix a lost async pagefault notification when the guest is using SMM
Recently we debugged a customer case in which the guest VM was showing
tasks permanently stuck in kvm_async_pf_task_wait_schedule.
This was traced to incorrect flushing of the async pagefault queue,
which kvm_post_set_cr0 performs on entry to real mode.
The function it calls, kvm_clear_async_pf_completion_queue, does wait for
all #APF tasks to complete, but then it proceeds to wipe the 'done' queue
without notifying the guest.
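A toy model in C may make the failure concrete. Everything below is hypothetical (struct toy_vcpu and the toy_* helpers are not KVM code); it only shows why wiping the 'done' queue without reporting its entries leaves the waiting guest task stuck, while reporting them wakes it.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TOKENS 8

/* Hypothetical toy vCPU, not the KVM structure. */
struct toy_vcpu {
	unsigned int done[TOY_MAX_TOKENS]; /* completed, not yet reported */
	size_t ndone;
	bool woken[TOY_MAX_TOKENS];        /* guest task for token resumed? */
};

/* Host side: the page for @token is now present. */
static void toy_complete(struct toy_vcpu *v, unsigned int token)
{
	if (v->ndone < TOY_MAX_TOKENS && token < TOY_MAX_TOKENS)
		v->done[v->ndone++] = token;
}

/* Correct path: report every completion, which wakes the waiting task. */
static void toy_deliver_all(struct toy_vcpu *v)
{
	for (size_t i = 0; i < v->ndone; i++)
		v->woken[v->done[i]] = true;
	v->ndone = 0;
}

/* Behaviour described above: drop the 'done' entries, tell nobody. */
static void toy_flush_without_notify(struct toy_vcpu *v)
{
	v->ndone = 0;
}
```

After toy_complete(&v, 3) followed by toy_flush_without_notify(&v), no later toy_deliver_all(&v) can ever set v.woken[3]: the guest task for token 3 waits forever.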
Such an approach is acceptable if the guest is being rebooted or if
it has decided to disable APF, but it leads to failures if the entry to real
mode was caused by SMM, because in this case the guest intends to continue
using APF after returning from the SMM handler.
Amusingly, and on top of this, the SMM entry code doesn't call
kvm_set_cr0 (and consequently doesn't call kvm_post_set_cr0 either);
only the SMM exit code does.
During SMM entry, the SMM code calls .set_cr0 instead, with the intention
of bypassing various architectural checks that could otherwise fail.
One example of such a check is the #GP raised on an attempt to disable
paging while in 64-bit mode.
To disable paging, software must first switch to compatibility mode and
only then clear CR0.PG.
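The check mentioned above can be sketched as a hypothetical stand-alone helper (toy_cr0_write_faults is not a KVM function) that covers only two of the conditions under which a MOV to CR0 raises #GP. On SMM entry the CPU clears CR0.PE and CR0.PG, so for a guest interrupted in 64-bit mode the equivalent CR0 write would trip the second condition, which is why SMM entry needs a path that skips these checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1ULL << 0)  /* protection enable */
#define CR0_PG (1ULL << 31) /* paging */

/*
 * Hypothetical helper: returns true if writing new_cr0 would raise #GP.
 * in_64bit_mode means EFER.LMA = 1 and CS.L = 1 (not compatibility mode).
 * Only two of the architectural checks are modelled here.
 */
static bool toy_cr0_write_faults(uint64_t new_cr0, bool in_64bit_mode)
{
	/* Paging cannot be enabled without protected mode. */
	if ((new_cr0 & CR0_PG) && !(new_cr0 & CR0_PE))
		return true;

	/* Paging cannot be turned off from 64-bit mode. */
	if (!(new_cr0 & CR0_PG) && in_64bit_mode)
		return true;

	return false;
}
```

For example, toy_cr0_write_faults(0, true) returns true, while clearing only CR0.PG from compatibility mode, toy_cr0_write_faults(CR0_PE, false), returns false.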
Whether this bypass can be eliminated is a side topic that is probably
worth discussing separately.
Back to the topic: kvm_set_cr0 is still called during SMM handling,
more precisely on exit from SMM, by emulator_leave_smm.
It is called once with CR0.PE cleared, to set up a baseline real-mode
environment, and then a second time with the original CR0 value.
Even more amusingly, both of these calls usually result in the APF queue
being flushed, because kvm_post_set_cr0 doesn't distinguish between entering
and leaving protected mode, and the SMM handler usually enables protection
and paging and then exits SMM without first switching back to real mode.
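The double flush can be reproduced with a small toy model. It assumes, for illustration, that the flush in kvm_post_set_cr0 is triggered by any change of CR0.PG in either direction; toy_set_cr0 and toy_leave_smm are hypothetical stand-ins for kvm_set_cr0 and emulator_leave_smm and model only the two CR0 writes described above.

```c
#include <stdint.h>

#define CR0_PE (1ULL << 0)
#define CR0_PG (1ULL << 31)

/* Hypothetical toy vCPU: only the fields needed for this illustration. */
struct toy_vcpu {
	uint64_t cr0;
	unsigned int apf_flushes; /* times the APF queue was wiped */
};

/* Assumed old rule: flush on any CR0.PG change, whatever the direction. */
static void toy_set_cr0(struct toy_vcpu *v, uint64_t cr0)
{
	uint64_t old = v->cr0;

	v->cr0 = cr0;
	if ((cr0 ^ old) & CR0_PG)
		v->apf_flushes++;
}

/*
 * Exit from SMM as described above: first a baseline real-mode CR0,
 * then the CR0 saved at SMM entry. cr0_in_smm is whatever the SMM
 * handler left in CR0 (SMM entry itself bypasses toy_set_cr0).
 */
static void toy_leave_smm(struct toy_vcpu *v, uint64_t cr0_in_smm,
			  uint64_t saved_cr0)
{
	v->cr0 = cr0_in_smm;
	toy_set_cr0(v, v->cr0 & ~(CR0_PE | CR0_PG));
	toy_set_cr0(v, saved_cr0);
}
```

With an SMM handler that turned on protection and paging, toy_leave_smm(&v, CR0_PE | CR0_PG, CR0_PE | CR0_PG) counts two flushes; with a handler that stayed in real mode it counts one.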
To fix this problem, I think the best solution is to drop the call to
kvm_clear_async_pf_completion_queue from kvm_post_set_cr0 altogether,
and instead raise KVM_REQ_APF_READY when protected mode
is re-established.
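The proposed rule can be sketched in the same toy style. All names are hypothetical: apf_ready_requested stands in for KVM_REQ_APF_READY, and the sketch keys the request on paging being turned back on, which is an assumption about where "protected mode is re-established" is detected. Completed #APFs survive the real-mode round trip and are reported once the request is processed.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1ULL << 0)
#define CR0_PG (1ULL << 31)

/* Hypothetical toy vCPU, not the KVM structure. */
struct toy_vcpu {
	uint64_t cr0;
	unsigned int pending_done; /* completed #APFs not yet reported */
	bool apf_ready_requested;  /* stands in for KVM_REQ_APF_READY */
};

/* Proposed rule: never wipe completions; re-check them once paging is back. */
static void toy_set_cr0(struct toy_vcpu *v, uint64_t cr0)
{
	uint64_t old = v->cr0;

	v->cr0 = cr0;
	if (((cr0 ^ old) & CR0_PG) && (cr0 & CR0_PG))
		v->apf_ready_requested = true;
}

/* Processing the request reports every pending completion to the guest. */
static unsigned int toy_process_requests(struct toy_vcpu *v)
{
	unsigned int delivered = 0;

	if (v->apf_ready_requested) {
		delivered = v->pending_done;
		v->pending_done = 0;
		v->apf_ready_requested = false;
	}
	return delivered;
}
```

Replaying the two CR0 writes of an SMM exit (paging off, then back on) leaves pending_done untouched after the first write and raises the request on the second.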
Existing APF requests should have no problem completing while the guest is
in SMM, and injection of the APF completion event should work too,
because the SMM handler *ought* not to enable interrupts (otherwise
things would go south very quickly), so the event simply stays pending
until the guest leaves SMM.
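This reasoning can be illustrated with another toy model (hypothetical names, not KVM code). It relies on two architectural facts: SMM entry clears RFLAGS.IF, and RSM restores the saved RFLAGS. It further assumes the completion is delivered as an interrupt, as with KVM's interrupt-based page-ready notification. While the handler keeps IF clear, the completion just waits.

```c
#include <stdbool.h>

/* Hypothetical toy vCPU, not the KVM structure. */
struct toy_vcpu {
	bool guest_if;         /* RFLAGS.IF as seen by the guest */
	unsigned int pending;  /* page-ready completions not yet injected */
	unsigned int injected; /* completions delivered to the guest */
};

/* Deliver at most one completion, and only if the guest accepts interrupts. */
static bool toy_try_inject(struct toy_vcpu *v)
{
	if (v->pending == 0 || !v->guest_if)
		return false;
	v->pending--;
	v->injected++;
	return true;
}

/* SMM entry clears IF; the returned value models the saved RFLAGS.IF. */
static bool toy_enter_smm(struct toy_vcpu *v)
{
	bool saved_if = v->guest_if;

	v->guest_if = false;
	return saved_if;
}

/* RSM restores the IF value saved at SMM entry. */
static void toy_leave_smm(struct toy_vcpu *v, bool saved_if)
{
	v->guest_if = saved_if;
}
```

A completion that arrives between toy_enter_smm and toy_leave_smm is refused by toy_try_inject and is then delivered on the first attempt after RSM.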
This change also brings the logic in line with the logic KVM follows
when the guest disables the APIC: there, too, KVM raises
KVM_REQ_APF_READY when the APIC is re-enabled.
In addition, I included a few fixes for some semi-theoretical
bugs I found while debugging this.
Best regards,
Maxim Levitsky
Maxim Levitsky (3):
  KVM: x86: Warn if KVM tries to deliver an #APF completion when APF is
    not enabled
  KVM: x86: Fix a semi theoretical bug in
    kvm_arch_async_page_present_queued
  KVM: x86: Fix the interaction between SMM and the asynchronous
    pagefault

 arch/x86/kvm/x86.c | 22 +++++++++++++++-------
 arch/x86/kvm/x86.h |  1 +
 2 files changed, 16 insertions(+), 7 deletions(-)
--
2.49.0