Message-ID: <c80e3380-d484-1b01-a638-0ee130dea11a@redhat.com>
Date: Fri, 28 Feb 2020 20:00:57 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Andy Lutomirski <luto@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
kvm list <kvm@...r.kernel.org>,
Radim Krcmar <rkrcmar@...hat.com>
Subject: Re: [PATCH] x86/kvm: Handle async page faults directly through
do_page_fault()

On 28/02/20 19:42, Andy Lutomirski wrote:
> KVM overloads #PF to indicate two types of not-actually-page-fault
> events. Right now, the KVM guest code intercepts them by modifying
> the IDT and hooking the #PF vector. This makes the already fragile
> fault code even harder to understand, and it also pollutes call
> traces with async_page_fault and do_async_page_fault for normal page
> faults.
>
> Clean it up by moving the logic into do_page_fault() using a static
> branch. This gets rid of the platform trap_init override mechanism
> completely.
>
> Signed-off-by: Andy Lutomirski <luto@...nel.org>
Acked-by: Paolo Bonzini <pbonzini@...hat.com>
Just one thing:
> @@ -1505,6 +1506,25 @@ do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
> unsigned long address)
> {
> prefetchw(&current->mm->mmap_sem);
> + /*
> + * KVM has two types of events that are, logically, interrupts, but
> + * are unfortunately delivered using the #PF vector.
At least the not-present case isn't entirely an interrupt, because it
must be delivered precisely. Regarding the page-ready case you're
right, it could be an interrupt. However, generally speaking, delivery
through #PF is not the problem. The mistake was using something in
memory rather than overloading the error code.
> + * These events are
> + * "you just accessed valid memory, but the host doesn't have it right
> + * now, so I'll put you to sleep if you continue" and "that memory
> + * you tried to access earlier is available now."
> + *
> + * We are relying on the interrupted context being sane (valid
> + * RSP, relevant locks not held, etc.), which is fine as long as
> + * the interrupted context had IF=1.
This is not about IF=0/IF=1; the KVM code is careful about taking
spinlocks only with IRQs disabled, and async PF is not delivered if the
interrupted context had IF=0. The problem is that the memory location
is not reentrant if an NMI is delivered in the wrong window, as you hint
below.
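
To make the window concrete, here is a minimal userspace sketch of the
hazard. It is only a model under stated assumptions, not the KVM guest
code: the names apf_slot, read_and_reset_reason and simulate are
hypothetical, and the single "reason" slot stands in for the shared
memory location that the host fills in before injecting #PF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this models the hazard, not the KVM code. */
enum apf_reason { APF_NONE = 0, APF_NOT_PRESENT = 1, APF_READY = 2 };

/* Stand-in for the single per-CPU slot the host writes before injecting #PF. */
struct apf_slot {
    uint32_t reason;
};

/* Read the pending reason and clear the slot, as a fault handler does on entry. */
static uint32_t read_and_reset_reason(struct apf_slot *slot)
{
    uint32_t reason = slot->reason;

    slot->reason = APF_NONE;
    return reason;
}

struct outcome {
    uint32_t seen_by_nested; /* what a genuine #PF taken inside the NMI reads */
    uint32_t seen_by_outer;  /* what the intended async PF handler reads */
};

/*
 * The host posts "not present" and injects #PF. If nmi_in_window is set,
 * an NMI arrives before the outer handler has read the slot, and the NMI
 * handler itself takes a genuine page fault.
 */
static struct outcome simulate(bool nmi_in_window)
{
    struct apf_slot slot = { .reason = APF_NOT_PRESENT };
    struct outcome out = { .seen_by_nested = APF_NONE, .seen_by_outer = APF_NONE };

    if (nmi_in_window)
        out.seen_by_nested = read_and_reset_reason(&slot); /* steals the event */

    out.seen_by_outer = read_and_reset_reason(&slot);

    /* Whoever read first, the slot is drained afterwards. */
    assert(slot.reason == APF_NONE);
    return out;
}
```

With nmi_in_window set, the nested (genuine) fault reads the
"not present" reason and the outer handler reads nothing, so both
faults would be classified wrongly. Without the NMI, the outer handler
sees the event as intended.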
Paolo
> We are also relying on
> + * the KVM async pf type field and CR2 being read consistently
> + * instead of getting values from real and async page faults
> + * mixed up.
> + *
> + * Fingers crossed.
> + */
> + if (kvm_handle_async_pf(regs, hw_error_code, address))
> + return;
> +
> trace_page_fault_entries(regs, hw_error_code, address);
>
> if (unlikely(kmmio_fault(regs, address)))
>