[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<SN6PR02MB41571FC666D406397858A377D4C5A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Wed, 5 Nov 2025 17:41:08 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Nuno Das Neves <nunodasneves@...ux.microsoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"wei.liu@...nel.org" <wei.liu@...nel.org>, "muislam@...rosoft.com"
<muislam@...rosoft.com>, "easwar.hariharan@...ux.microsoft.com"
<easwar.hariharan@...ux.microsoft.com>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
<haiyangz@...rosoft.com>, "decui@...rosoft.com" <decui@...rosoft.com>,
"longli@...rosoft.com" <longli@...rosoft.com>,
"skinsburskii@...ux.microsoft.com" <skinsburskii@...ux.microsoft.com>,
"romank@...ux.microsoft.com" <romank@...ux.microsoft.com>, Jinank Jain
<jinankjain@...rosoft.com>
Subject: RE: [PATCH v3] mshv: Extend create partition ioctl to support cpu
features
From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Tuesday, November 4, 2025 1:47 PM
>
> From: Muminul Islam <muislam@...rosoft.com>
>
> The existing mshv create partition ioctl does not provide a way to
> specify which cpu features are enabled in the guest. Instead, it
> attempts to enable all features and those that are not supported are
> silently disabled by the hypervisor.
>
> This was done to reduce unnecessary complexity and is sufficient for
> many cases. However, new scenarios require fine-grained control over
> these features.
>
> Define a new mshv_create_partition_v2 structure which supports
> passing the disabled processor and xsave feature bits through to the
> create partition hypercall directly.
>
> The kernel does not introspect the bits in these new fields as they
> are part of the hypervisor ABI. Require the caller to provide the
> number of cpu feature banks passed, to support extending the number
> of banks in future. Disable all banks that are not specified to ensure
> the behavior is predictable with newer hypervisors.
>
> Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables
> the new structure. If unset, the original mshv_create_partition struct
> is used, with the old behavior of enabling all features.
>
> Co-developed-by: Jinank Jain <jinankjain@...rosoft.com>
> Signed-off-by: Jinank Jain <jinankjain@...rosoft.com>
> Signed-off-by: Muminul Islam <muislam@...rosoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@...ux.microsoft.com>
> ---
> Changes in v3:
> - Remove the new cpu features definitions in hvhdk.h, and retain the
> old behavior of enabling all features for the old struct. For the v2
> struct, still disable unspecified feature banks, since that makes it
> robust to future extensions.
> - Amend comments and commit message to reflect the above
> - Fix unused variable on arm64 [kernel test robot]
>
> Changes in v2:
> - Fix exposure of CONFIG_X86_64 to uapi [kernel test robot]
> - Fix compilation issue on arm64 [kernel test robot]
>
> ---
> drivers/hv/mshv_root_main.c | 94 ++++++++++++++++++++++++++++++-------
> include/uapi/linux/mshv.h | 34 ++++++++++++++
> 2 files changed, 110 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index d542a0143bb8..814465a0912d 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1900,43 +1900,101 @@ add_partition(struct mshv_partition *partition)
> return 0;
> }
>
> -static long
> -mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
> +static_assert(MSHV_NUM_CPU_FEATURES_BANKS <=
> + HV_PARTITION_PROCESSOR_FEATURES_BANKS);
> +
> +static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 *pt_flags,
> + struct hv_partition_creation_properties *cr_props,
> + union hv_partition_isolation_properties *isol_props)
> {
> - struct mshv_create_partition args;
> - u64 creation_flags;
> - struct hv_partition_creation_properties creation_properties = {};
> - union hv_partition_isolation_properties isolation_properties = {};
> - struct mshv_partition *partition;
> - struct file *file;
> - int fd;
> - long ret;
> + int i;
> + struct mshv_create_partition_v2 args;
> + union hv_partition_processor_features *disabled_procs;
> + union hv_partition_processor_xsave_features *disabled_xsave;
>
> - if (copy_from_user(&args, user_arg, sizeof(args)))
> + /* First, copy orig struct in case user is on previous versions */
> + if (copy_from_user(&args, user_arg,
> + sizeof(struct mshv_create_partition)))
> return -EFAULT;
>
> if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) ||
> args.pt_isolation >= MSHV_PT_ISOLATION_COUNT)
> return -EINVAL;
>
> + disabled_procs = &cr_props->disabled_processor_features;
> + disabled_xsave = &cr_props->disabled_processor_xsave_features;
> +
> + /* Check if user provided newer struct with feature fields */
> + if (args.pt_flags & BIT(MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES)) {
Should probably be "BIT_ULL" instead of "BIT" since args.pt_flags is a long long field,
though it really doesn't matter as long as the number of flags is <= 32.
> + if (copy_from_user(&args, user_arg, sizeof(args)))
> + return -EFAULT;
> +
> + if (args.pt_num_cpu_fbanks > MSHV_NUM_CPU_FEATURES_BANKS ||
> + mshv_field_nonzero(args, pt_rsvd) ||
> + mshv_field_nonzero(args, pt_rsvd1))
> + return -EINVAL;
> +
> +
> + for (i = 0; i < args.pt_num_cpu_fbanks; i++)
> + disabled_procs->as_uint64[i] = args.pt_cpu_fbanks[i];
> +
> + /* Disable any features left unspecified */
> + for (; i < HV_PARTITION_PROCESSOR_FEATURES_BANKS; i++)
> + disabled_procs->as_uint64[i] = -1;
I'm trying to convince myself that disabling unspecified features is the right
thing to do. In the current hypervisor scenario with 2 banks, if the VMM caller
specifies only one bank of disable flags, then all the features in the 2nd bank
are disabled. That's certainly the reverse from the current code which
always enables all features, and from the hypervisor itself which in the
hypercall ABI defines the flags as "disable" flags rather than "enable" flags.
Then in a scenario where a new version of the hypervisor shows up with
support for 3 banks, the old VMM code that only knows about 2 banks
will cause all features in that 3rd bank to be disabled. Again, that's the
reverse of the current code.
I guess it depends on how the hypervisor defines any such new features.
Are they typically defined to be benign if they are enabled by default? Or
is the polarity the opposite, where the VMM must know about new
features before they are enabled? The hypercall interface seems to imply
the former but maybe I'm reading too much into it.
A code comment about the thinking here would be useful for future readers.
> +
> +#if IS_ENABLED(CONFIG_X86_64)
> + disabled_xsave->as_uint64 = args.pt_disabled_xsave;
> +#else
> + if (mshv_field_nonzero(args, pt_rsvd2))
> + return -EINVAL;
> +
> + disabled_xsave->as_uint64 = -1;
Does the arm64 version of the hypercall do anything with this field?
Or does it just ignore it? I would be more inclined to set ignored
fields to zero unless there's an identifiable reason otherwise.
(especially since the VMM is required to pass in zero on arm64).
> +#endif
> + } else {
> + /* v1 behavior: try to enable everything */
> + for (i = 0; i < HV_PARTITION_PROCESSOR_FEATURES_BANKS; i++)
> + disabled_procs->as_uint64[i] = 0;
> +
> + disabled_xsave->as_uint64 = 0;
> + }
> +
> /* Only support EXO partitions */
> - creation_flags = HV_PARTITION_CREATION_FLAG_EXO_PARTITION HV_PARTITION_CREATION_FLAG_INTERCEPT_MESSAGE_PAGE_ENABLED;
> + *pt_flags = HV_PARTITION_CREATION_FLAG_EXO_PARTITION |
> + HV_PARTITION_CREATION_FLAG_INTERCEPT_MESSAGE_PAGE_ENABLED;
>
> if (args.pt_flags & BIT(MSHV_PT_BIT_LAPIC))
> - creation_flags |= HV_PARTITION_CREATION_FLAG_LAPIC_ENABLED;
> + *pt_flags |= HV_PARTITION_CREATION_FLAG_LAPIC_ENABLED;
> if (args.pt_flags & BIT(MSHV_PT_BIT_X2APIC))
> - creation_flags |= HV_PARTITION_CREATION_FLAG_X2APIC_CAPABLE;
> + *pt_flags |= HV_PARTITION_CREATION_FLAG_X2APIC_CAPABLE;
> if (args.pt_flags & BIT(MSHV_PT_BIT_GPA_SUPER_PAGES))
> - creation_flags |= HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED;
> + *pt_flags |= HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED;
>
> switch (args.pt_isolation) {
> case MSHV_PT_ISOLATION_NONE:
> - isolation_properties.isolation_type =
> - HV_PARTITION_ISOLATION_TYPE_NONE;
> + isol_props->isolation_type = HV_PARTITION_ISOLATION_TYPE_NONE;
> break;
> }
>
> + return 0;
> +}
> +
> +static long
> +mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
> +{
> + u64 creation_flags;
> + struct hv_partition_creation_properties creation_properties = {};
> + union hv_partition_isolation_properties isolation_properties = {};
> + struct mshv_partition *partition;
> + struct file *file;
> + int fd;
> + long ret;
> +
> + ret = mshv_ioctl_process_pt_flags(user_arg, &creation_flags,
> + &creation_properties,
> + &isolation_properties);
> + if (ret)
> + return ret;
> +
> partition = kzalloc(sizeof(*partition), GFP_KERNEL);
> if (!partition)
> return -ENOMEM;
> diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h
> index 876bfe4e4227..9091946cba23 100644
> --- a/include/uapi/linux/mshv.h
> +++ b/include/uapi/linux/mshv.h
> @@ -26,6 +26,7 @@ enum {
> MSHV_PT_BIT_LAPIC,
> MSHV_PT_BIT_X2APIC,
> MSHV_PT_BIT_GPA_SUPER_PAGES,
> + MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES,
> MSHV_PT_BIT_COUNT,
> };
>
> @@ -41,6 +42,8 @@ enum {
> * @pt_flags: Bitmask of 1 << MSHV_PT_BIT_*
> * @pt_isolation: MSHV_PT_ISOLATION_*
> *
> + * This is the initial/v0 version for backward compatibility.
> + *
> * Returns a file descriptor to act as a handle to a guest partition.
> * At this point the partition is not yet initialized in the hypervisor.
> * Some operations must be done with the partition in this state, e.g. setting
> @@ -52,6 +55,37 @@ struct mshv_create_partition {
> __u64 pt_isolation;
> };
>
> +#define MSHV_NUM_CPU_FEATURES_BANKS 2
> +
> +/**
> + * struct mshv_create_partition_v2
> + *
> + * This is extended version of the above initial MSHV_CREATE_PARTITION
> + * ioctl and allows for following additional parameters:
> + *
> + * @pt_num_cpu_fbanks: number of processor feature banks being provided.
> + * This must not exceed MSHV_NUM_CPU_FEATURES_BANKS.
> + * If set to less than the number of available banks,
> + * additional banks will be set to -1 (disabled).
> + * @pt_cpu_fbanks: disabled processor feature banks array.
> + * @pt_disabled_xsave: disabled xsave feature bits.
> + *
> + * Returns : same as above original mshv_create_partition
> + */
> +struct mshv_create_partition_v2 {
> + __u64 pt_flags;
> + __u64 pt_isolation;
> + __u16 pt_num_cpu_fbanks;
> + __u8 pt_rsvd[6]; /* MBZ */
> + __u64 pt_cpu_fbanks[MSHV_NUM_CPU_FEATURES_BANKS];
> + __u64 pt_rsvd1[2]; /* MBZ */
I presume this is for future expansion of the number of banks?
And that the choice of 2 additional banks is somewhat arbitrary?
Michael
> +#if defined(__x86_64__)
> + __u64 pt_disabled_xsave;
> +#else
> + __u64 pt_rsvd2; /* MBZ */
> +#endif
> +} __packed;
> +
> /* /dev/mshv */
> #define MSHV_CREATE_PARTITION _IOW(MSHV_IOCTL, 0x00, struct
> mshv_create_partition)
>
> --
> 2.34.1
Powered by blists - more mailing lists