linux-kernel - Re: [PATCH 0/4] KVM: API to block and resume all running vcpus in a vm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2701ce67-bfff-8c0c-4450-7c4a281419de@redhat.com>
Date:   Mon, 24 Oct 2022 10:33:18 +0200
From:   Emanuele Giuseppe Esposito <eesposit@...hat.com>
To:     Christian Borntraeger <borntraeger@...ux.ibm.com>,
        kvm@...r.kernel.org
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Maxim Levitsky <mlevitsk@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] KVM: API to block and resume all running vcpus in a
 vm



Am 24/10/2022 um 09:56 schrieb Christian Borntraeger:
> Am 22.10.22 um 17:48 schrieb Emanuele Giuseppe Esposito:
>> This new API allows the userspace to stop all running
>> vcpus using KVM_KICK_ALL_RUNNING_VCPUS ioctl, and resume them with
>> KVM_RESUME_ALL_KICKED_VCPUS.
>> A "running" vcpu is a vcpu that is executing the KVM_RUN ioctl.
>>
>> This serie is especially helpful to userspace hypervisors like
>> QEMU when they need to perform operations on memslots without the
>> risk of having a vcpu reading them in the meanwhile.
>> With "memslots operations" we mean grow, shrink, merge and split
>> memslots, which are not "atomic" because there is a time window
>> between the DELETE memslot operation and the CREATE one.
>> Currently, each memslot operation is performed with one or more
>> ioctls.
>> For example, merging two memslots into one would imply:
>> DELETE(m1)
>> DELETE(m2)
>> CREATE(m1+m2)
>>
>> And a vcpu could attempt to read m2 right after it is deleted, but
>> before the new one is created.
>>
>> Therefore the simplest solution is to pause all vcpus in the kvm
>> side, so that:
>> - userspace just needs to call the new API before making memslots
>> changes, keeping modifications to the minimum
>> - dirty page updates are also performed when vcpus are blocked, so
>> there is no time window between the dirty page ioctl and memslots
>> modifications, since vcpus are all stopped.
>> - no need to modify the existing memslots API
> Isnt QEMU able to achieve the same goal today by forcing all vCPUs
> into userspace with a signal? Can you provide some rationale why this
> is better in the cover letter or patch description?
> 
David Hildenbrand tried to propose something similar here:
https://github.com/davidhildenbrand/qemu/commit/86b1bf546a8d00908e33f7362b0b61e2be8dbb7a

While it is not optimized, I think it's more complex that the current
serie, since qemu should also make sure all running ioctls finish and
prevent the new ones from getting executed.

Also we can't use pause_all_vcpus()/resume_all_vcpus() because they drop
the BQL.

Would that be ok as rationale?

Thank you,
Emanuele