[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8ca5c106-7c12-4c6e-6d81-a90f281a9894@amazon.com>
Date: Wed, 14 Aug 2019 15:02:25 +0200
From: Alexander Graf <graf@...zon.com>
To: Steven Price <steven.price@....com>, Marc Zyngier <maz@...nel.org>
CC: <kvm@...r.kernel.org>, Catalin Marinas <catalin.marinas@....com>,
<linux-doc@...r.kernel.org>, Russell King <linux@...linux.org.uk>,
<linux-kernel@...r.kernel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
"Will Deacon" <will@...nel.org>, <kvmarm@...ts.cs.columbia.edu>,
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 0/9] arm64: Stolen time support
On 05.08.19 15:06, Steven Price wrote:
> On 03/08/2019 19:05, Marc Zyngier wrote:
>> On Fri, 2 Aug 2019 15:50:08 +0100
>> Steven Price <steven.price@....com> wrote:
>>
>> Hi Steven,
>>
>>> This series add support for paravirtualized time for arm64 guests and
>>> KVM hosts following the specification in Arm's document DEN 0057A:
>>>
>>> https://developer.arm.com/docs/den0057/a
>>>
>>> It implements support for stolen time, allowing the guest to
>>> identify time when it is forcibly not executing.
>>>
>>> It doesn't implement support for Live Physical Time (LPT) as there are
>>> some concerns about the overheads and approach in the above
>>> specification, and I expect an updated version of the specification to
>>> be released soon with just the stolen time parts.
>>
>> Thanks for posting this.
>>
>> My current concern with this series is around the fact that we allocate
>> memory from the kernel on behalf of the guest. It is the first example
>> of such thing in the ARM port, and I can't really say I'm fond of it.
>>
>> x86 seems to get away with it by having the memory allocated from
>> userspace, why I tend to like more. Yes, put_user is more
>> expensive than a straight store, but this isn't done too often either.
>>
>> What is the rational for your current approach?
>
> As I see it there are 3 approaches that can be taken here:
>
> 1. Hypervisor allocates memory and adds it to the virtual machine. This
> means that everything to do with the 'device' is encapsulated behind the
> KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
> stolen time structure to be fast it cannot be a trapping region and has
> to be backed by real memory - in this case allocated by the host kernel.
>
> 2. Host user space allocates memory. Similar to above, but this time
> user space needs to manage the memory region as well as the usual
> KVM_CREATE_DEVICE dance. I've no objection to this, but it means
> kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
> to size the memory region).
You ideally want to get the host overhead for a VM to as little as you
can. I'm not terribly fond of the idea of reserving a full page just
because we're too afraid of having the guest donate memory.
>
> 3. Guest kernel "donates" the memory to the hypervisor for the
> structure. As far as I'm aware this is what x86 does. The problems I see
> this approach are:
>
> a) kexec becomes much more tricky - there needs to be a disabling
> mechanism for the guest to stop the hypervisor scribbling on memory
> before starting the new kernel.
I wouldn't call "quiesce a device" much more tricky. We have to do that
for other devices as well today.
> b) If there is more than one entity that is interested in the
> information (e.g. firmware and kernel) then this requires some form of
> arbitration in the guest because the hypervisor doesn't want to have to
> track an arbitrary number of regions to update.
Why would FW care?
> c) Performance can suffer if the host kernel doesn't have a suitably
> aligned/sized area to use. As you say - put_user() is more expensive.
Just define the interface to always require natural alignment when
donating a memory location?
> The structure is updated on every return to the VM.
If you really do suffer from put_user(), there are alternatives. You
could just map the page on the registration hcall and then leave it
pinned until the vcpu gets destroyed again.
> Of course x86 does prove the third approach can work, but I'm not sure
> which is actually better. Avoid the kexec cancellation requirements was
> the main driver of the current approach. Although many of the
I really don't understand the problem with kexec cancellation. Worst
case, let guest FW set it up for you and propagate only the address down
via ACPI/DT. That way you can mark the respective memory as reserved too.
But even with a Linux only mechanism, just take a look at
arch/x86/kernel/kvmclock.c. All they do to remove the map is to hook
into machine_crash_shutdown() and machine_shutdown().
Alex
> conversations about this were also tied up with Live Physical Time which
> adds its own complications.
>
> Steve
> _______________________________________________
> kvmarm mailing list
> kvmarm@...ts.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>
Powered by blists - more mailing lists