linux-kernel - Re: [PATCH 4/5] KVM: add __kvm_request_needs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170222151704.GA3856@potion>
Date:   Wed, 22 Feb 2017 16:17:05 +0100
From:   Radim Krčmář <rkrcmar@...hat.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Christian Borntraeger <borntraeger@...ibm.com>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Andrew Jones <drjones@...hat.com>,
        Marc Zyngier <marc.zyngier@....com>,
        Cornelia Huck <cornelia.huck@...ibm.com>,
        James Hogan <james.hogan@...tec.com>,
        Paul Mackerras <paulus@...abs.org>,
        Christoffer Dall <christoffer.dall@...aro.org>
Subject: Re: [PATCH 4/5] KVM: add __kvm_request_needs_mb

[Oops, the end of this thread got dragged into a mark-as-read spree ...]

2017-02-17 11:13+0100, David Hildenbrand:
>>> This is really complicated stuff, and the basic reason for it (if I
>>> remember correctly) is that s390x does reenable all interrupts when
>>> entering the sie (see kvm-s390.c:__vcpu_run()). So the fancy smp-based
>>> kicks don't work (as it is otherwise just racy), and if I remember
>>> correctly, SMP reschedule signals (s390x external calls) would be
>>> slower. (Christian, please correct me if I'm wrong)
>> 
>> No the reason was that there are some requests that need to be handled
>> outside run SIE. For example one reason was the guest prefix page.
>> This must be mapped read/write ALL THE TIME when a guest is running,
>> otherwise the host might crash. So we have to exit SIE and make sure that
>> it does not reenter, therefore we use the RELOAD_MMU request from a notifier
>> that is called from page table functions, whenever memory management decides
>> to unmap/write protect (dirty pages tracking, reference tracking, page migration
>> or compaction...)
>> 
>> SMP-based request wills kick out the guest, but for some thing like the
>> one above it will be too late.
> 
> While what you said is 100% correct, I had something else in mind that
> hindered using vcpu_kick() and especially kvm_make_all_cpus_request().
> And I remember that being related to how preemption and
> OUTSIDE_GUEST_MODE is handled. I think this boils down to what would
> have to be implemented in kvm_arch_vcpu_should_kick().
> 
> x86 can track the guest state using vcpu->mode, because they can be sure
> that the guest can't reschedule while in the critical guest entry/exit
> section. This is not true for s390x, as preemption is enabled. That's
> why vcpu->mode cannot be used in its current form to track if a VCPU is
> in/oustide/exiting guest mode. And kvm_make_all_cpus_request() currently
> relies on this setting.
> 
> For now, calling vcpu_kick() on s390x will result in a BUG().
> 
> 
> On s390x, there are 3 use cases I see for requests:
> 
> 1. Remote requests that need a sync
> 
> Make a request, wait until SIE has been left and make sure the request
> will be processed before re-entering the SIE. e.g. KVM_REQ_RELOAD_MMU
> notifier in mmu notifier you mentioned. Also KVM_REQ_DISABLE_IBS is a
> candidate.

Btw. aren't those requests racy?

    void exit_sie(struct kvm_vcpu *vcpu)
    {
    	atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);

If you get stalled here and the target VCPU handles the request and
reenters SIE in the meantime, then you'll wait until its next exit.
(And miss an unbounded amount of exits in the worst case.)

    	while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE)
    		cpu_relax();
    }

And out of curiosity -- how many cycles does this loop usually take?

> 2. Remote requests that don't need a sync
> 
> E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while
> KVM_REQ_DISABLE_IBS does.

A usual KVM request would kick the VCPU out of nested virt as well.
Shouldn't it be done for these as well?

> 3. local requests
> 
> E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix()
> 
> 
> Of course, having a unified interface would be better.
> 
> /* set the request and kick the CPU out of guest mode */
> kvm_set_request(req, vcpu);
> 
> /* set the request, kick the CPU out of guest mode, wait until guest
> mode has been left and make sure the request will be handled before
> reentering guest mode */
> kvm_set_sync_request(req, vcpu);

Sounds good, I'll also add

  kvm_set_self_request(req, vcpu);

> Same maybe even for multiple VCPUs (as there are then ways to speed it
> up, e.g. first kick all, then wait for all)
> 
> This would require arch specific callbacks to
> 1. pre announce the request (e.g. set PROG_REQUEST on s390x)
> 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later
> kvm_s390_vsie_kick(vcpu) on s390x)
> 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x)
> 
> This would only make sense if there are other use cases for sync
> requests. At least I remember that Power also has a faster way for
> kicking VCPUs, not involving SMP rescheds. I can't judge if this is a
> s390x only thing and is better be left as is :)
> 
> At least vcpu_kick() could be quite easily made to work on s390x.
> 
> Radim, are there also other users that need something like sync requests?

I think that ARM has a similar need when updating vgic, but relies on an
asumption that VCPUs are going to be out after kicking them with
kvm_make_all_cpus_request().
(vgic_change_active_prepare in virt/kvm/arm/vgic/vgic-mmio.c)

Having synchronous requests in a common API should probably wait for the
completion of the request, not just for the kick, which would make race
handling simpler.

I'm not going to worry about them in this pass, though.

Thanks.