linux-kernel - Re: [PATCH 4/5] KVM: add __kvm_request_needs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170223153917.GB3856@potion>
Date:   Thu, 23 Feb 2017 16:39:18 +0100
From:   Radim Krčmář <rkrcmar@...hat.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Christian Borntraeger <borntraeger@...ibm.com>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Andrew Jones <drjones@...hat.com>,
        Marc Zyngier <marc.zyngier@....com>,
        Cornelia Huck <cornelia.huck@...ibm.com>,
        James Hogan <james.hogan@...tec.com>,
        Paul Mackerras <paulus@...abs.org>,
        Christoffer Dall <christoffer.dall@...aro.org>
Subject: Re: [PATCH 4/5] KVM: add __kvm_request_needs_mb

2017-02-23 11:20+0100, David Hildenbrand:
> Am 22.02.2017 um 20:57 schrieb Christian Borntraeger:
>> On 02/22/2017 04:17 PM, Radim Krčmář wrote:
>>>
>> [...] 
>>>    	while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE)
>>>   		cpu_relax();
>>>    }
>> 
>>> And out of curiosity -- how many cycles does this loop usually take?
>> 
>> A quick hack indicates something between 3 and 700ns.
>> 
>>>> 2. Remote requests that don't need a sync
>>>>
>>>> E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while
>>>> KVM_REQ_DISABLE_IBS does.
>>>
>>> A usual KVM request would kick the VCPU out of nested virt as well.
>>> Shouldn't it be done for these as well?
>> 
>> A common code function probably should. For some of the cases (again
>> prefix page handling) we do not need it. For example if we unmap
>> the guest prefix page, but guest^2 is running this causes no trouble
>> as long as we handle the request before reentering guest^1. So
>> not an easy answer.
> 
> The problem is, that doing synchronous requests on two sie control
> blocks (vsie and "ordinary") is really really hard to implement. I had a
> prototype, but it was just ugly. And there was no reason to do it,
> because all current requests can live with nested guest being executed
> (as it is really all just a matter of coordinating SIE control block
> changes for our guest, not involving nested guests).
> 
> Also note, that VSIE uses these special request bits in the SIE control
> block for its own purposes (to catch unmappings of the prefix page, but
> this time on the nested address space). We don't want to replace this by
> an ordinary request bit (because then we have to exit the VSIE loop much
> more often).
> 
> I think the VSIE code should not care too much about request bits until
> there is really a need for it (meaning: VCPU requests that cannot
> tolerate the VSIE running).

Sure, request-based changes to L1 can be delayed if they cannot affect
L2.

> Kicking the VSIE from time to time cannot harm. But there are no
> guarantees about requests.
> 
>> 
>>>
>>>> 3. local requests
>>>>
>>>> E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix()
>>>>
>>>>
>>>> Of course, having a unified interface would be better.
>>>>
>>>> /* set the request and kick the CPU out of guest mode */
>>>> kvm_set_request(req, vcpu);
>>>>
>>>> /* set the request, kick the CPU out of guest mode, wait until guest
>>>> mode has been left and make sure the request will be handled before
>>>> reentering guest mode */
>>>> kvm_set_sync_request(req, vcpu);
>>>
>>> Sounds good, I'll also add
>>>
>>>   kvm_set_self_request(req, vcpu);
> 
> or introduce some lightweight check (e.g. against (vcpu->cpu))
> internally, that simply skips kicking/other stuff in case the vcpu is
> currently loaded.

Sounds interesting, I'll benchmark it on x86.  I was planning on adding
a check inside CONFIG_DEBUG WARN_ON_ONCE, but it might be cheap enough
and definitely simplifies the API.

The check is already performed in kvm_vcpu_kick() and I hope that it
won't need a big refactoring to avoid copy-paste.

>>>> Same maybe even for multiple VCPUs (as there are then ways to speed it
>>>> up, e.g. first kick all, then wait for all)
>>>>
>>>> This would require arch specific callbacks to
>>>> 1. pre announce the request (e.g. set PROG_REQUEST on s390x)
>>>> 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later
>>>> kvm_s390_vsie_kick(vcpu) on s390x)
>>>> 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x)
>>>>
>>>> This would only make sense if there are other use cases for sync
>>>> requests. At least I remember that Power also has a faster way for
>>>> kicking VCPUs, not involving SMP rescheds. I can't judge if this is a
>>>> s390x only thing and is better be left as is :)
>>>>
>>>> At least vcpu_kick() could be quite easily made to work on s390x.
>>>>
>>>> Radim, are there also other users that need something like sync requests?
>>>
>>> I think that ARM has a similar need when updating vgic, but relies on an
>>> asumption that VCPUs are going to be out after kicking them with
>>> kvm_make_all_cpus_request().
>>> (vgic_change_active_prepare in virt/kvm/arm/vgic/vgic-mmio.c)
>>>
>>> Having synchronous requests in a common API should probably wait for the
>>> completion of the request, not just for the kick, which would make race
>>> handling simpler.
>> 
>> This would be problematic for our prefix page handling due to locking.
>> 
> 
> And if I am not wrong, waiting for a request to be handled might take
> forever (thinking about VCPUs in user space - maybe even stopped ones
> that won't run again).

I agree.

> I think the general notion of synchronous VCPU requests should be: Make
> sure that VCPU does not go into guest mode before handling the request.
> This includes waiting for it to exit guest mode.

I'll keep synchronous requests out for the moment.  There is nothing
similar in other architectures and the idea is very specific even in
s390.