[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251113194431.GB1175882@liuwe-devbox-debian-v2.local>
Date: Thu, 13 Nov 2025 19:44:31 +0000
From: Wei Liu <wei.liu@...nel.org>
To: Nuno Das Neves <nunodasneves@...ux.microsoft.com>
Cc: Wei Liu <wei.liu@...nel.org>, Michael Kelley <mhklinux@...look.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kys@...rosoft.com" <kys@...rosoft.com>,
"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
"decui@...rosoft.com" <decui@...rosoft.com>,
"longli@...rosoft.com" <longli@...rosoft.com>,
"skinsburskii@...ux.microsoft.com" <skinsburskii@...ux.microsoft.com>,
"prapal@...ux.microsoft.com" <prapal@...ux.microsoft.com>,
"mrathor@...ux.microsoft.com" <mrathor@...ux.microsoft.com>,
"muislam@...rosoft.com" <muislam@...rosoft.com>,
"anrayabh@...ux.microsoft.com" <anrayabh@...ux.microsoft.com>,
Jinank Jain <jinankjain@...rosoft.com>
Subject: Re: [PATCH v4] mshv: Extend create partition ioctl to support cpu
features
On Thu, Nov 13, 2025 at 11:11:57AM -0800, Nuno Das Neves wrote:
> On 11/13/2025 10:47 AM, Wei Liu wrote:
> > On Wed, Nov 12, 2025 at 04:27:05PM +0000, Michael Kelley wrote:
> >> From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Tuesday, November 11, 2025 3:20 PM
> >>>
> >>> The existing mshv create partition ioctl does not provide a way to
> >>> specify which cpu features are enabled in the guest. Instead, it
> >>> attempts to enable all features and those that are not supported are
> >>> silently disabled by the hypervisor.
> >>>
> >>> This was done to reduce unnecessary complexity and is sufficient for
> >>> many cases. However, new scenarios require fine-grained control over
> >>> these features.
> >>>
> >>> Define a new mshv_create_partition_v2 structure which supports
> >>> passing the disabled processor and xsave feature bits through to the
> >>> create partition hypercall directly.
> >>>
> >>> Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables
> >>> the new structure. If unset, the original mshv_create_partition struct
> >>> is used, with the old behavior of enabling all features.
> >>>
> >>> Co-developed-by: Jinank Jain <jinankjain@...rosoft.com>
> >>> Signed-off-by: Jinank Jain <jinankjain@...rosoft.com>
> >>> Signed-off-by: Muminul Islam <muislam@...rosoft.com>
> >>> Signed-off-by: Nuno Das Neves <nunodasneves@...ux.microsoft.com>
> >>> ---
> >>> Changes in v4:
> >>> - Change BIT() to BIT_ULL() [Michael Kelley]
> >>> - Enforce pt_num_cpu_fbanks == MSHV_NUM_CPU_FEATURES_BANKS and expect
> >>> that number to never change. In future, additional processor banks
> >>> will be settable as 'early' partition properties. Remove redundant
> >>> code that set default values for unspecified banks [Michael Kelley]
> >>> - Set xsave features to 0 on arm64 [Michael Kelley]
> >>> - Add clarifying comments in a few places
> >>>
> >>> Changes in v3:
> >>> - Remove the new cpu features definitions in hvhdk.h, and retain the
> >>> old behavior of enabling all features for the old struct. For the v2
> >>> struct, still disable unspecified feature banks, since that makes it
> >>> robust to future extensions.
> >>> - Amend comments and commit message to reflect the above
> >>> - Fix unused variable on arm64 [kernel test robot]
> >>>
> >>> Changes in v2:
> >>> - Fix exposure of CONFIG_X86_64 to uapi [kernel test robot]
> >>> - Fix compilation issue on arm64 [kernel test robot]
> >>> ---
> >>> drivers/hv/mshv_root_main.c | 113 +++++++++++++++++++++++++++++-------
> >>> include/uapi/linux/mshv.h | 34 +++++++++++
> >>> 2 files changed, 126 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> >>> index d542a0143bb8..9f9438289b60 100644
> >>> --- a/drivers/hv/mshv_root_main.c
> >>> +++ b/drivers/hv/mshv_root_main.c
> >>> @@ -1900,43 +1900,114 @@ add_partition(struct mshv_partition *partition)
> >>> return 0;
> >>> }
> >>>
> >>> -static long
> >>> -mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
> >>> +static_assert(MSHV_NUM_CPU_FEATURES_BANKS ==
> >>> + HV_PARTITION_PROCESSOR_FEATURES_BANKS);
> >>> +
> >>> +static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 *pt_flags,
> >>> + struct hv_partition_creation_properties *cr_props,
> >>> + union hv_partition_isolation_properties *isol_props)
> >>> {
> >>> - struct mshv_create_partition args;
> >>> - u64 creation_flags;
> >>> - struct hv_partition_creation_properties creation_properties = {};
> >>> - union hv_partition_isolation_properties isolation_properties = {};
> >>> - struct mshv_partition *partition;
> >>> - struct file *file;
> >>> - int fd;
> >>> - long ret;
> >>> + int i;
> >>> + struct mshv_create_partition_v2 args;
> >>> + union hv_partition_processor_features *disabled_procs;
> >>> + union hv_partition_processor_xsave_features *disabled_xsave;
> >>>
> >>> - if (copy_from_user(&args, user_arg, sizeof(args)))
> >>> + /* First, copy v1 struct in case user is on previous versions */
> >>> + if (copy_from_user(&args, user_arg,
> >>> + sizeof(struct mshv_create_partition)))
> >>> return -EFAULT;
> >>>
> >>> if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) ||
> >>> args.pt_isolation >= MSHV_PT_ISOLATION_COUNT)
> >>> return -EINVAL;
> >>>
> >>> + disabled_procs = &cr_props->disabled_processor_features;
> >>> + disabled_xsave = &cr_props->disabled_processor_xsave_features;
> >>> +
> >>> + /* Check if user provided newer struct with feature fields */
> >>> + if (args.pt_flags & BIT_ULL(MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES)) {
> >>> + if (copy_from_user(&args, user_arg, sizeof(args)))
> >>> + return -EFAULT;
> >>
> >> There's subtle issue here that I didn't notice previously. This second copy_from_user()
> >> re-populates the first two fields of the "args" local variable. These two fields were
> >> validated by code a few lines above. But there's no guarantee that a second read of
> >> user space will get the same values. User space could have another thread that
> >> changes the user space values between the two copy_from_user() calls, and thereby
> >> sneak in some bogus values to be used further down in this function. Because of
> >> this risk, there's a general rule for kernel code, which is to avoid multiple accesses to
> >> the same user space values. There are places in the kernel where such double reads
> >> would be an exploitable security hole.
> >>
>
> Good catch Michael! It's something I had read about once before long ago but had forgotten.
> I wonder if there's some kind of automation that could warn about potential issues - i.e.
> copy_from_user() on the same pointer twice.
>
> >> The fix would be to validate the pt_flags and pt_isolation fields again, or to have the
> >> second copy_from_user copy only the additional fields. But it's also the case that the
> >> way the pt_flags and pt_isolation fields are used further down in this function,
> >> nothing bad can happen if malicious user space should succeed in sneaking in some
> >> bogus values.
> >>
> >> Net, as currently coded, there's nothing that needs to be fixed. It would be more
> >> robust to do one of the two fixes, if for no other reason than to acknowledge
> >> awareness of the risk of reading user space twice. But I'm not going to insist
> >> on a respin.
> >
> > Nuno, I can commit this patch first. If you can post a diff later I can
> > squash it in.
>
> It might be easier if I just spin a v5 today? I'll send it soon.
Yes, please resend. Thanks.
Wei
Powered by blists - more mailing lists