linux-kernel - Re: [PATCH] x86/kvm: Handle async page faults directly through do_page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrUG0B2QLYYp8h+5KiZ4LpVDZ00XEPsgh4DbbDX9Mx5-EQ@mail.gmail.com>
Date:   Fri, 28 Feb 2020 11:04:41 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        kvm list <kvm@...r.kernel.org>,
        Radim Krcmar <rkrcmar@...hat.com>
Subject: Re: [PATCH] x86/kvm: Handle async page faults directly through do_page_fault()

On Fri, Feb 28, 2020 at 11:01 AM Paolo Bonzini <pbonzini@...hat.com> wrote:
>
> On 28/02/20 19:42, Andy Lutomirski wrote:
> > KVM overloads #PF to indicate two types of not-actually-page-fault
> > events.  Right now, the KVM guest code intercepts them by modifying
> > the IDT and hooking the #PF vector.  This makes the already fragile
> > fault code even harder to understand, and it also pollutes call
> > traces with async_page_fault and do_async_page_fault for normal page
> > faults.
> >
> > Clean it up by moving the logic into do_page_fault() using a static
> > branch.  This gets rid of the platform trap_init override mechanism
> > completely.
> >
> > Signed-off-by: Andy Lutomirski <luto@...nel.org>
>
> Acked-by: Paolo Bonzini <pbonzini@...hat.com>
>
> Just one thing:
>
> > @@ -1505,6 +1506,25 @@ do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
> >               unsigned long address)
> >  {
> >       prefetchw(&current->mm->mmap_sem);
> > +     /*
> > +      * KVM has two types of events that are, logically, interrupts, but
> > +      * are unfortunately delivered using the #PF vector.
>
> At least the not-present case isn't entirely an interrupt because it
> must be delivered precisely.  Regarding the page-ready case you're
> right, it could be an interrupt. However, generally speaking this is not
> a problem.  Using something in memory rather than overloading the error
> code was the mistake.



>
> > +      * These events are
> > +      * "you just accessed valid memory, but the host doesn't have it right
> > +      * not, so I'll put you to sleep if you continue" and "that memory
> > +      * you tried to access earlier is available now."
> > +      *
> > +      * We are relying on the interrupted context being sane (valid
> > +      * RSP, relevant locks not held, etc.), which is fine as long as
> > +      * the the interrupted context had IF=1.
>
> This is not about IF=0/IF=1; the KVM code is careful about taking
> spinlocks only with IRQs disabled, and async PF is not delivered if the
> interrupted context had IF=0.  The problem is that the memory location
> is not reentrant if an NMI is delivered in the wrong window, as you hint
> below.

If an async PF is delivered with IF=0, then, unless something else
clever happens to make it safe, we are toast.  The x86 entry code
cannot handle #PF (or most other entries) at arbitrary places.  I'll
improve the comment in v2.