linux-kernel - Re: [RFC PATCH] kvm,x86: Exit to user space in case of page fault error

Message-ID: <20200630152529.GC322149@redhat.com>
Date:   Tue, 30 Jun 2020 11:25:29 -0400
From:   Vivek Goyal <vgoyal@...hat.com>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>
Cc:     kvm@...r.kernel.org, virtio-fs@...hat.com, pbonzini@...hat.com,
        sean.j.christopherson@...el.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] kvm,x86: Exit to user space in case of page fault
 error

On Tue, Jun 30, 2020 at 05:13:54PM +0200, Vitaly Kuznetsov wrote:
> Vivek Goyal <vgoyal@...hat.com> writes:
> 
> > On Tue, Jun 30, 2020 at 03:24:43PM +0200, Vitaly Kuznetsov wrote:
> 
> >> 
> >> It's probably me who's missing something important here :-) but I think
> >> you describe how it *should* work as I'm not seeing how we can leave the
> >> loop in kvm_async_pf_task_wait_schedule() other than by 
> >> "if (hlist_unhashed(&n.link)) break;" and this only happens when APF
> >> completes.
> >
> > We don't leave loop in kvm_async_pf_task_wait_schedule(). It will happen
> > before you return to user space.
> >
> > I have not looked too closely but I think the following code path might be
> > taken after async PF has completed.
> >
> > __kvm_handle_async_pf()
> >   idtentry_exit_cond_rcu()
> >     prepare_exit_to_usermode()
> >       __prepare_exit_to_usermode()
> >         exit_to_usermode_loop()
> >           do_signal()
> >
> > So once you have been woken up (because APF completed),
> 
> Ah, OK so we still need to complete APF and we can't kill the process
> before this happens, that's what I was missing.
> 
> >  you will
> > return to user space and before that you will check if there are
> > pending signals and handle that signal first before user space
> > gets a chance to run again and retry faulting instruction.
> 
> ...
> 
> >
> >> 
> >> When guest receives the 'page ready' event with an error it (like for
> >> every other 'page ready' event) tries to wake up the corresponding
> >> process but if the process is dead already it can do in-kernel probing
> >> of the GFN, this way we guarantee that the error is always injected. I'm
> >> not sure if it is needed though but in case it is, this can be a
> >> solution. We can add a new feature bit and only deliver errors when the
> >> guest indicates that it knows what to do with them.
> >
> > - The process will be delivered the signal after async PF completion,
> >   while returning to user space. You have lost control by then.
> >
> 
> So actually there's no way for kernel to know if the userspace process
> managed to re-try the instruction and get the error injected or if it
> was killed prior to that.

Yes. 

> 
> > - If you retry in kernel, we completely change the context of who
> >   was trying to access the gfn in question. We want to retain the
> >   real context and the information about who was trying to access
> >   that gfn.
> 
> (Just so I understand the idea better) does the guest context matter to
> the host? Or, more specifically, are we going to do anything besides
> get_user_pages() which will actually analyze who triggered the access
> *in the guest*?

When we exit to user space, qemu prints a bunch of register state. I am
wondering what that state represents. Does some of it trace back to
the process which was trying to access that hva? I don't know.

I think keeping a cache of error gfns might not be too bad from an
implementation point of view. I will give it a try and see how bad
it looks.
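
To make the idea concrete, here is a rough user-space sketch of what a
cache of error gfns could look like. Everything in it is an assumption
for illustration: the names (err_gfn_cache, err_gfn_cache_add,
err_gfn_cache_test_and_clear), the slot count and the direct-mapped
layout are hypothetical and not taken from the patch. The point is only
that a gfn whose async PF completed with an error is remembered, so that
when the same context retries the access the error can be reported in
that context instead of starting another async PF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and sizes; nothing here comes from the actual patch. */
#define ERR_GFN_SLOTS 64            /* assumed size, must be a power of two */
#define ERR_GFN_EMPTY UINT64_MAX    /* sentinel: slot holds no gfn */

typedef uint64_t gfn_t;

struct err_gfn_cache {
	gfn_t slots[ERR_GFN_SLOTS];
};

/* Mark every slot as empty. */
static void err_gfn_cache_init(struct err_gfn_cache *c)
{
	for (int i = 0; i < ERR_GFN_SLOTS; i++)
		c->slots[i] = ERR_GFN_EMPTY;
}

/* Direct-mapped: the low bits of the gfn select the slot. */
static unsigned int err_gfn_slot(gfn_t gfn)
{
	return (unsigned int)(gfn & (ERR_GFN_SLOTS - 1));
}

/* Record that an async page fault on this gfn completed with an error. */
static void err_gfn_cache_add(struct err_gfn_cache *c, gfn_t gfn)
{
	/* A colliding older entry is simply overwritten. */
	c->slots[err_gfn_slot(gfn)] = gfn;
}

/*
 * Called when the access is retried: returns true, and drops the entry,
 * if this gfn was recorded as having failed earlier.
 */
static bool err_gfn_cache_test_and_clear(struct err_gfn_cache *c, gfn_t gfn)
{
	unsigned int i = err_gfn_slot(gfn);

	if (c->slots[i] != gfn)
		return false;
	c->slots[i] = ERR_GFN_EMPTY;
	return true;
}
```

A direct-mapped table keeps lookups trivial, at the cost that a collision
evicts an older entry. In that case the retry would just go through the
normal fault path again, so a lost entry costs a repeated fault rather
than a wrong result. Whether that trade-off is acceptable is an open
question in this sketch.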

Vivek