linux-kernel - Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <33d315e5-6c17-02ff-abcc-17f11c2ce883@arm.com>
Date:   Thu, 29 Aug 2019 16:21:28 +0100
From:   Steven Price <steven.price@....com>
To:     Christoffer Dall <christoffer.dall@....com>
Cc:     kvm@...r.kernel.org, linux-doc@...r.kernel.org,
        Marc Zyngier <maz@...nel.org>, linux-kernel@...r.kernel.org,
        Russell King <linux@...linux.org.uk>,
        Catalin Marinas <catalin.marinas@....com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Will Deacon <will@...nel.org>, kvmarm@...ts.cs.columbia.edu,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface

On 28/08/2019 14:49, Christoffer Dall wrote:
> On Tue, Aug 27, 2019 at 10:57:06AM +0200, Christoffer Dall wrote:
>> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>>> Introduce a paravirtualization interface for KVM/arm64 based on the
>>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>>
>>> This only adds the details about "Stolen Time" as the details of "Live
>>> Physical Time" have not been fully agreed.
>>>
>>> User space can specify a reserved area of memory for the guest and
>>> inform KVM to populate the memory with information on time that the host
>>> kernel has stolen from the guest.
>>>
>>> A hypercall interface is provided for the guest to interrogate the
>>> hypervisor's support for this interface and the location of the shared
>>> memory structures.
>>>
>>> Signed-off-by: Steven Price <steven.price@....com>
>>> ---
>>>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>>>  1 file changed, 100 insertions(+)
>>>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>>
>>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>>> new file mode 100644
>>> index 000000000000..1ceb118694e7
>>> --- /dev/null
>>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>>> @@ -0,0 +1,100 @@
>>> +Paravirtualized time support for arm64
>>> +======================================
>>> +
>>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>>> +support for AArch64 guests:
>>> +
>>> +https://developer.arm.com/docs/den0057/a
>>> +
>>> +KVM/arm64 implements the stolen time part of this specification by providing
>>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>>> +view of the amount of time stolen from its execution.
>>> +
>>> +Two new SMCCC compatible hypercalls are defined:
>>> +
>>> +PV_FEATURES 0xC5000020
>>> +PV_TIME_ST  0xC5000022
>>> +
>>> +These are only available in the SMC64/HVC64 calling convention as
>>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>>> +mechanism before calling it.
>>> +
>>> +PV_FEATURES
>>> +    Function ID:  (uint32)  : 0xC5000020
>>> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
>>> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>>> +                              PV-time feature is supported by the hypervisor.
>>> +
>>> +PV_TIME_ST
>>> +    Function ID:  (uint32)  : 0xC5000022
>>> +    Return value: (int64)   : IPA of the stolen time data structure for this
>>> +                              (V)CPU. On failure:
>>> +                              NOT_SUPPORTED (-1)
>>> +
>>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>>> +with inner and outer write back caching attributes, in the inner shareable
>>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>>> +meaningfully filled by the hypervisor (see structure below).
>>> +
>>> +PV_TIME_ST returns the structure for the calling VCPU.
>>> +
>>> +Stolen Time
>>> +-----------
>>> +
>>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>>> +
>>> +  Field       | Byte Length | Byte Offset | Description
>>> +  ----------- | ----------- | ----------- | --------------------------
>>> +  Revision    |      4      |      0      | Must be 0 for version 0.1
>>> +  Attributes  |      4      |      4      | Must be 0
>>> +  Stolen time |      8      |      8      | Stolen time in unsigned
>>> +              |             |             | nanoseconds indicating how
>>> +              |             |             | much time this VCPU thread
>>> +              |             |             | was involuntarily not
>>> +              |             |             | running on a physical CPU.
>>> +
>>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>>> +will be present within a reserved region of the normal memory given to the
>>> +guest. The guest should not attempt to write into this memory. There is a
>>> +structure per VCPU of the guest.
>>> +
>>> +User space interface
>>> +====================
>>> +
>>> +User space can request that KVM provide the paravirtualized time interface to
>>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>>> +
>>> +    struct kvm_create_device pvtime_device = {
>>> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
>>> +            .attr = 0,
>>> +            .flags = 0,
>>> +    };
>>> +
>>> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>>> +
>>> +Creation of the device should be done after creating the vCPUs of the virtual
>>> +machine.
>>> +
>>> +The IPA of the structures must be given to KVM. This is the base address
>>> +of an array of stolen time structures (one for each VCPU). The base address
>>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>>> +multiple of PAGE_SIZE.
>>> +
>>> +The memory for these structures should be added to the guest in the usual
>>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>>> +
>>> +For example:
>>> +
>>> +    struct kvm_dev_arm_st_region region = {
>>> +            .gpa = <IPA of guest base address>,
>>> +            .size = <size in bytes>
>>> +    };
>>
>> This feel fragile; how are you handling userspace creating VCPUs after
>> setting this up, the GPA overlapping guest memory, etc.  Is the
>> philosophy here that the VMM can mess up the VM if it wants, but that
>> this should never lead attacks on the host (we better hope not) and so
>> we don't care?
>>
>> It seems to me setting the IPA per vcpu throught the VCPU device would
>> avoid a lot of these issues.  See
>> Documentation/virt/kvm/devices/vcpu.txt.
>>
>>
> I discussed this with Marc the other day, and we realized that if we
> make the configuration of the IPA per-PE, then a VMM can construct a VM
> where these data structures are distributed within the IPA space of a
> VM, which could lead to a lower TLB pressure for some
> configurations/workloads.

Ok, I'm dubious it will make much difference in terms of TLB pressure,
but I've done the refactoring and I think it actually simplifies the
code. So I'll post a new version where the base address is set via the
VCPU device.

Thanks for the review,

Steve