linux-kernel - Re: [PATCH 0/9] arm64: Stolen time support

Open Source and information security mailing list archives

Message-ID: <8ca5c106-7c12-4c6e-6d81-a90f281a9894@amazon.com>
Date:   Wed, 14 Aug 2019 15:02:25 +0200
From:   Alexander Graf <graf@...zon.com>
To:     Steven Price <steven.price@....com>, Marc Zyngier <maz@...nel.org>
CC:     <kvm@...r.kernel.org>, Catalin Marinas <catalin.marinas@....com>,
        <linux-doc@...r.kernel.org>, Russell King <linux@...linux.org.uk>,
        <linux-kernel@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "Will Deacon" <will@...nel.org>, <kvmarm@...ts.cs.columbia.edu>,
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 0/9] arm64: Stolen time support

On 05.08.19 15:06, Steven Price wrote:
> On 03/08/2019 19:05, Marc Zyngier wrote:
>> On Fri,  2 Aug 2019 15:50:08 +0100
>> Steven Price <steven.price@....com> wrote:
>>
>> Hi Steven,
>>
>>> This series adds support for paravirtualized time for arm64 guests and
>>> KVM hosts following the specification in Arm's document DEN 0057A:
>>>
>>> https://developer.arm.com/docs/den0057/a
>>>
>>> It implements support for stolen time, allowing the guest to
>>> identify time when it is forcibly not executing.
>>>
>>> It doesn't implement support for Live Physical Time (LPT) as there are
>>> some concerns about the overheads and approach in the above
>>> specification, and I expect an updated version of the specification to
>>> be released soon with just the stolen time parts.
>>
>> Thanks for posting this.
>>
>> My current concern with this series is around the fact that we allocate
>> memory from the kernel on behalf of the guest. It is the first example
>> of such thing in the ARM port, and I can't really say I'm fond of it.
>>
>> x86 seems to get away with it by having the memory allocated from
>> userspace, which I tend to like more. Yes, put_user is more
>> expensive than a straight store, but this isn't done too often either.
>>
>> What is the rationale for your current approach?
> 
> As I see it there are 3 approaches that can be taken here:
> 
> 1. Hypervisor allocates memory and adds it to the virtual machine. This
> means that everything to do with the 'device' is encapsulated behind the
> KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
> stolen time structure to be fast it cannot be a trapping region and has
> to be backed by real memory - in this case allocated by the host kernel.
> 
> 2. Host user space allocates memory. Similar to above, but this time
> user space needs to manage the memory region as well as the usual
> KVM_CREATE_DEVICE dance. I've no objection to this, but it means
> kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
> to size the memory region).

Ideally you want to keep the host overhead for a VM as small as you 
can. I'm not terribly fond of the idea of reserving a full page just 
because we're too afraid of having the guest donate memory.

> 
> 3. Guest kernel "donates" the memory to the hypervisor for the
> structure. As far as I'm aware this is what x86 does. The problems I see
> with this approach are:
> 
>   a) kexec becomes much more tricky - there needs to be a disabling
> mechanism for the guest to stop the hypervisor scribbling on memory
> before starting the new kernel.

I wouldn't call "quiesce a device" much more tricky. We have to do that 
for other devices as well today.

>   b) If there is more than one entity that is interested in the
> information (e.g. firmware and kernel) then this requires some form of
> arbitration in the guest because the hypervisor doesn't want to have to
> track an arbitrary number of regions to update.

Why would FW care?

>   c) Performance can suffer if the host kernel doesn't have a suitably
> aligned/sized area to use. As you say - put_user() is more expensive.

Just define the interface to always require natural alignment when 
donating a memory location?
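
For illustration only, a minimal sketch of what such a registration-time 
check could look like. The 64-byte size and the names ST_STRUCT_SIZE, 
naturally_aligned() and stolen_time_gpa_valid() are assumptions made up 
for this example; they are not taken from the series or the spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size of the shared structure; must be a power of two. */
#define ST_STRUCT_SIZE 64u

/*
 * True if gpa is naturally aligned for an object of the given size,
 * i.e. size is a non-zero power of two and gpa is a multiple of it.
 */
static bool naturally_aligned(uint64_t gpa, uint64_t size)
{
	return size != 0 && (size & (size - 1)) == 0 && (gpa & (size - 1)) == 0;
}

/* Hypothetical check the host would run when the guest donates memory. */
static bool stolen_time_gpa_valid(uint64_t gpa)
{
	return naturally_aligned(gpa, ST_STRUCT_SIZE);
}
```

As long as the size is a power of two no larger than the page size, a 
naturally aligned structure never straddles a page boundary, so a single 
page mapping is enough to reach all of it.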

> The structure is updated on every return to the VM.

If you really do suffer from put_user(), there are alternatives. You 
could just map the page on the registration hcall and then leave it 
pinned until the vcpu gets destroyed again.

> Of course x86 does prove the third approach can work, but I'm not sure
> which is actually better. Avoiding the kexec cancellation requirements was
> the main driver of the current approach. Although many of the

I really don't understand the problem with kexec cancellation. Worst 
case, let guest FW set it up for you and propagate only the address down 
via ACPI/DT. That way you can mark the respective memory as reserved too.

But even with a Linux-only mechanism, just take a look at 
arch/x86/kernel/kvmclock.c. All they do to remove the map is to hook 
into machine_crash_shutdown() and machine_shutdown().
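
To make the pattern concrete, here is a small user-space model of it, 
not the actual kvmclock code. Every name in it (st_record, hyp_register, 
hyp_unregister, hyp_account, machine_hooks, pv_time_init) is hypothetical, 
and the "hypervisor" is just a pointer that gets cleared on unregister.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared stolen-time record. */
struct st_record { uint64_t stolen_ns; };

/* Model of hypervisor state: where it writes, NULL when unregistered. */
static struct st_record *hyp_target;

/* Hypothetical "hypercalls" to donate and withdraw the memory. */
static void hyp_register(struct st_record *r) { hyp_target = r; }
static void hyp_unregister(void) { hyp_target = NULL; }

/* The hypervisor only touches guest memory while a record is registered. */
static void hyp_account(uint64_t ns)
{
	if (hyp_target)
		hyp_target->stolen_ns += ns;
}

/* Model of a table of shutdown hooks that a driver can override. */
struct machine_hooks {
	void (*shutdown)(void);
	void (*crash_shutdown)(void);
};

static void default_shutdown(void) { }
static struct machine_hooks hooks = { default_shutdown, default_shutdown };

/* Saved originals so the wrappers can chain to them. */
static void (*orig_shutdown)(void);
static void (*orig_crash_shutdown)(void);

/* Quiesce first, then continue with the normal shutdown path. */
static void pv_shutdown(void)
{
	hyp_unregister();
	orig_shutdown();
}

static void pv_crash_shutdown(void)
{
	hyp_unregister();
	orig_crash_shutdown();
}

/* Register the record and hook both shutdown paths. */
static void pv_time_init(struct st_record *r)
{
	hyp_register(r);
	orig_shutdown = hooks.shutdown;
	orig_crash_shutdown = hooks.crash_shutdown;
	hooks.shutdown = pv_shutdown;
	hooks.crash_shutdown = pv_crash_shutdown;
}
```

Once either hook has run, hyp_account() no longer writes to the record, 
so the memory is safe to hand to the next kernel.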

Alex

> conversations about this were also tied up with Live Physical Time which
> adds its own complications.
> 
> Steve
