[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e36e8baa-7c8b-ca95-95a7-7411599fa0b0@kernel.org>
Date: Mon, 5 Aug 2019 14:26:07 +0100
From: Marc Zyngier <maz@...nel.org>
To: Steven Price <steven.price@....com>
Cc: kvm@...r.kernel.org,
Radim Krčmář <rkrcmar@...hat.com>,
Catalin Marinas <catalin.marinas@....com>,
Suzuki K Pouloze <suzuki.poulose@....com>,
linux-doc@...r.kernel.org, Russell King <linux@...linux.org.uk>,
linux-kernel@...r.kernel.org, James Morse <james.morse@....com>,
linux-arm-kernel@...ts.infradead.org,
Paolo Bonzini <pbonzini@...hat.com>,
Will Deacon <will@...nel.org>, kvmarm@...ts.cs.columbia.edu,
Julien Thierry <julien.thierry.kdev@...il.com>
Subject: Re: [PATCH 0/9] arm64: Stolen time support
On 05/08/2019 14:06, Steven Price wrote:
> On 03/08/2019 19:05, Marc Zyngier wrote:
>> On Fri, 2 Aug 2019 15:50:08 +0100
>> Steven Price <steven.price@....com> wrote:
>>
>> Hi Steven,
>>
>>> This series add support for paravirtualized time for arm64 guests and
>>> KVM hosts following the specification in Arm's document DEN 0057A:
>>>
>>> https://developer.arm.com/docs/den0057/a
>>>
>>> It implements support for stolen time, allowing the guest to
>>> identify time when it is forcibly not executing.
>>>
>>> It doesn't implement support for Live Physical Time (LPT) as there are
>>> some concerns about the overheads and approach in the above
>>> specification, and I expect an updated version of the specification to
>>> be released soon with just the stolen time parts.
>>
>> Thanks for posting this.
>>
>> My current concern with this series is around the fact that we allocate
>> memory from the kernel on behalf of the guest. It is the first example
>> of such thing in the ARM port, and I can't really say I'm fond of it.
>>
>> x86 seems to get away with it by having the memory allocated from
>> userspace, why I tend to like more. Yes, put_user is more
>> expensive than a straight store, but this isn't done too often either.
>>
>> What is the rational for your current approach?
>
> As I see it there are 3 approaches that can be taken here:
>
> 1. Hypervisor allocates memory and adds it to the virtual machine. This
> means that everything to do with the 'device' is encapsulated behind the
> KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
> stolen time structure to be fast it cannot be a trapping region and has
> to be backed by real memory - in this case allocated by the host kernel.
>
> 2. Host user space allocates memory. Similar to above, but this time
> user space needs to manage the memory region as well as the usual
> KVM_CREATE_DEVICE dance. I've no objection to this, but it means
> kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
> to size the memory region).
>
> 3. Guest kernel "donates" the memory to the hypervisor for the
> structure. As far as I'm aware this is what x86 does. The problems I see
> this approach are:
>
> a) kexec becomes much more tricky - there needs to be a disabling
> mechanism for the guest to stop the hypervisor scribbling on memory
> before starting the new kernel.
>
> b) If there is more than one entity that is interested in the
> information (e.g. firmware and kernel) then this requires some form of
> arbitration in the guest because the hypervisor doesn't want to have to
> track an arbitrary number of regions to update.
>
> c) Performance can suffer if the host kernel doesn't have a suitably
> aligned/sized area to use. As you say - put_user() is more expensive.
> The structure is updated on every return to the VM.
>
>
> Of course x86 does prove the third approach can work, but I'm not sure
> which is actually better. Avoid the kexec cancellation requirements was
> the main driver of the current approach. Although many of the
> conversations about this were also tied up with Live Physical Time which
> adds its own complications.
My current train of thoughts is around (2):
- We don't need a new mechanism to track pages or deal with overlapping
IPA ranges
- We can get rid of the save/restore interface
The drawback is that the amount of memory required per vcpu becomes ABI.
I don't think that's a huge deal, as the hypervisor has the same
contract with the guest.
We also take a small hit with put_user(), but this is only done as a
consequence of vcpu_load() (and not on every entry as you suggest
above). It'd be worth quantifying this overhead before making any
decision one way or another.
Thanks,
M.
--
Jazz is not dead, it just smells funny...
Powered by blists - more mailing lists