linux-kernel - Re: [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

Message-ID: <20180711103819.GK9486@e103592.cambridge.arm.com>
Date:   Wed, 11 Jul 2018 11:38:21 +0100
From:   Dave Martin <Dave.Martin@....com>
To:     Suzuki K Poulose <Suzuki.Poulose@....com>
Cc:     cdall@...nel.org, kvm@...r.kernel.org,
        Marc Zyngier <marc.zyngier@....com>, catalin.marinas@....com,
        punit.agrawal@....com, Will Deacon <will.deacon@....com>,
        linux-kernel@...r.kernel.org, qemu-devel@...gnu.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        kvmarm@...ts.cs.columbia.edu, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical
 address size for VM

On Wed, Jul 11, 2018 at 10:05:50AM +0100, Suzuki K Poulose wrote:
> On 10/07/18 18:03, Dave Martin wrote:
> >On Tue, Jul 10, 2018 at 05:38:39PM +0100, Suzuki K Poulose wrote:
> >>On 09/07/18 14:37, Dave Martin wrote:
> >>>On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:
> >>>>On 09/07/18 12:23, Dave Martin wrote:
> >
> >[...]
> >
> >>>>>Wedging arguments into a few bits in the type argument feels awkward,
> >>>>>and may be regretted later if we run out of bits, or something can't be
> >>>>>represented in the chosen encoding.
> >>>>
> >>>>I think that's a pretty convincing argument for a "better" CREATE_VM,
> >>>>one that would have a clearly defined, structured (and potentially
> >>>>extensible) argument.
> >>>>
> >>>>I've quickly hacked the following:
> >>>>
> >>>>diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>>>index b6270a3b38e9..3e76214034c2 100644
> >>>>--- a/include/uapi/linux/kvm.h
> >>>>+++ b/include/uapi/linux/kvm.h
> >>>>@@ -735,6 +735,20 @@ struct kvm_ppc_resize_hpt {
> >>>>  	__u32 pad;
> >>>>  };
> >>>>
> >>>>+struct kvm_create_vm2 {
> >>>>+	__u64	version;	/* Or maybe not */
> >>>>+	union {
> >>>>+		struct {
> >>>>+#define KVM_ARM_SVE_CAPABLE	(1 << 0)
> >>>>+#define KVM_ARM_SELECT_IPA	(1 << 1)
> >>>>+			__u64	capabilities;
> >>>>+			__u16	sve_vlen;
> >>>>+			__u8	ipa_size;
> >>>>+		} arm64;
> >>>>+		__u64	dummy[15];
> >>>>+	};
> >>>>+};
> >>>>+
> >>>>  #define KVMIO 0xAE
> >>>>
> >>>>  /* machine type bits, to be used as argument to KVM_CREATE_VM */
> >>>>
> >>>>Other architectures could fill in their own bits if they need to.
> >>>>
> >>>>Thoughts?
> >>>
> >>>This kind of thing should work, but it may still get messy when we
> >>>add additional fields.
> >>
> >>
> >>Marc, Dave,
> >>
> >>I like Dave's approach. Some comments below.
> >>
> >>>
> >>>If we want this to work cross-arch, would it make sense to go
> >>>for a more generic approach, say
> >>>
> >>>struct kvm_create_vm_attr_any {
> >>>         __u32   type;
> >>>};
> >>>
> >>>#define KVM_CREATE_VM_ATTR_ARCH_CAPABILITIES 1
> >>>struct kvm_create_vm_attr_arch_capabilities {
> >>>         __u32   type;
> >>>         __u16   size; /* support future expansion of capabilities[] */
> >>>         __u16   reserved;
> >>>         __u64   capabilities[1];
> >>>};
> >>
> >>We also need to advertise which attributes are supported by the host,
> >>so that the user can tune the available ones. That would make a bit mask
> >>like the above trickier, unless we return the supported values back
> >>in the argument ptr for the "probe" call. And this scheme in general
> >>can be useful for passing back a non-boolean result specific to the
> >>attribute, without having a per-attribute ioctl. (e.g., maximum limit
> >>for IPA).
> >
> >Maybe, but this could quickly become bloated.  (My approach already
> >feels a bit bloated...)
> >
> >I'm not sure that arbitrarily complex negotiation will really be
> >needed, but userspace might want to change its mind if setting a
> >particular property fails.
> >
> >An alternative might be to have a bunch of per-VM ioctls to configure
> >different things, like x86 has.  There's at least precedent for that.
> >For arm, we currently only have a few.  That allows for easy extension,
> >at the cost of adding ioctls.
> 
> As you know, one of the major problems with the per-VM ioctls is
> the ordering of different operations and tracking to make sure that
> the userspace follows the expected order. e.g., the first approach for
> IPA series was based on this and it made things complex enough to drop
> it.

I'm aware of that, but if we are adding a new KVM_CREATE_VM, we could
perhaps give it different semantics: i.e., we create a half-created VM
that only accepts configuration ioctls and a "finish creation" ioctl
that finalises everything before you're allowed to create devices,
vcpus etc.
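
To make the idea concrete, here is a rough userspace-style model of the
lifecycle I have in mind. It is only a sketch: every name below
(vm_create, vm_set_ipa_size, vm_finalize, vm_create_vcpu) and the
32..52 bit range are made up for illustration and are not existing or
proposed KVM interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical lifecycle states; nothing here is a real KVM interface. */
enum vm_state { VM_CONFIGURING, VM_FINALIZED };

#define VM_MIN_IPA_SIZE		32	/* illustrative bounds only */
#define VM_MAX_IPA_SIZE		52
#define VM_DEFAULT_IPA_SIZE	40

struct vm {
	enum vm_state state;
	uint8_t ipa_size;		/* IPA size in bits */
	unsigned int nr_vcpus;
};

/* "Create" yields a half-created VM that only accepts configuration. */
static void vm_create(struct vm *vm)
{
	vm->state = VM_CONFIGURING;
	vm->ipa_size = VM_DEFAULT_IPA_SIZE;
	vm->nr_vcpus = 0;
}

/* Configuration call: legal only before finalisation. */
static int vm_set_ipa_size(struct vm *vm, uint8_t bits)
{
	if (vm->state != VM_CONFIGURING)
		return -EBUSY;
	if (bits < VM_MIN_IPA_SIZE || bits > VM_MAX_IPA_SIZE)
		return -EINVAL;
	vm->ipa_size = bits;
	return 0;
}

/* "Finish creation": freezes the configuration. */
static int vm_finalize(struct vm *vm)
{
	if (vm->state != VM_CONFIGURING)
		return -EBUSY;
	vm->state = VM_FINALIZED;
	return 0;
}

/* vcpus (and devices) may only be created once the VM is finalised. */
static int vm_create_vcpu(struct vm *vm)
{
	if (vm->state != VM_FINALIZED)
		return -EINVAL;
	/* The configuration was validated before it was frozen. */
	assert(vm->ipa_size >= VM_MIN_IPA_SIZE &&
	       vm->ipa_size <= VM_MAX_IPA_SIZE);
	return (int)vm->nr_vcpus++;
}
```

The point is that the ordering problem collapses to a single state
check per call: configuration calls may arrive in any order before
vm_finalize(), and everything else is refused until after it. A failed
configuration call leaves the VM untouched, so userspace can retry with
a different value.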

This is the sort of thing I was moving towards for SVE (but for
vcpus there).

I'm not saying we should drop the existing KVM_CREATE_VM2 ideas,
but that we should take a step back if it starts to accrue complexity.

> >
> >There may be some ioctls we can reuse, like KVM_ENABLE_CAP for per-
> >vm capability flags.
> 
> Maybe we could switch to KVM_VM_CAPS and pass a list of capabilities
> to be enabled at creation time? The kvm_enable_cap can pass in additional
> arguments for each cap. That way we don't have to rely on a new set of
> attributes and probing becomes straightforward.

That's a possibility.  I guess we'd need to understand how exactly x86
uses this.
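
For the sake of discussion, a list-of-capabilities interface might look
roughly like the sketch below. struct cap_request is a simplified
stand-in modelled on struct kvm_enable_cap (cap, flags, args[4]); the
capability numbers, struct vm_config and both functions are
hypothetical and exist only to illustrate the shape.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in modelled on struct kvm_enable_cap. */
struct cap_request {
	uint32_t cap;
	uint32_t flags;		/* must be zero in this sketch */
	uint64_t args[4];	/* per-capability arguments */
};
static_assert(sizeof(struct cap_request) == 40, "no hidden padding");

/* Hypothetical capability numbers, not real KVM_CAP_* values. */
enum { CAP_IPA_SIZE = 1, CAP_SVE_VLEN = 2 };

struct vm_config {
	uint8_t ipa_size;	/* IPA size in bits */
	uint16_t sve_vlen;	/* SVE vector length */
};

/* Probing: userspace can ask about each capability individually. */
static int cap_supported(uint32_t cap)
{
	return cap == CAP_IPA_SIZE || cap == CAP_SVE_VLEN;
}

/*
 * Apply a whole list at creation time.  The list is staged in a
 * temporary copy, so a failure leaves *cfg unmodified.
 */
static int apply_caps(struct vm_config *cfg,
		      const struct cap_request *reqs, size_t n)
{
	struct vm_config tmp = *cfg;
	size_t i;

	for (i = 0; i < n; i++) {
		if (reqs[i].flags != 0 || !cap_supported(reqs[i].cap))
			return -EINVAL;
		switch (reqs[i].cap) {
		case CAP_IPA_SIZE:
			tmp.ipa_size = (uint8_t)reqs[i].args[0];
			break;
		case CAP_SVE_VLEN:
			tmp.sve_vlen = (uint16_t)reqs[i].args[0];
			break;
		}
	}
	*cfg = tmp;
	return 0;
}
```

Range checking of the arguments is deliberately left out. Applying the
list all-or-nothing is one possible choice; it keeps the "userspace
changes its mind after a failure" case simple, since nothing needs to
be undone.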

Cheers
---Dave