linux-kernel - Re: [PATCH 2/7] Drivers: hv: Introduce hv_hvcall

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45a1b6b5-407c-4518-9ea2-05341e93a67e@linux.microsoft.com>
Date: Wed, 26 Feb 2025 16:14:57 -0800
From: Nuno Das Neves <nunodasneves@...ux.microsoft.com>
To: mhklinux@...look.com, kys@...rosoft.com, haiyangz@...rosoft.com,
 wei.liu@...nel.org, decui@...rosoft.com, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
 lpieralisi@...nel.org, kw@...ux.com, manivannan.sadhasivam@...aro.org,
 robh@...nel.org, bhelgaas@...gle.com, arnd@...db.de
Cc: x86@...nel.org, linux-hyperv@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
 linux-arch@...r.kernel.org
Subject: Re: [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for
 hypercall arguments

On 2/26/2025 12:06 PM, mhkelley58@...il.com wrote:
> From: Michael Kelley <mhklinux@...look.com>
> 
> Current code allocates the "hyperv_pcpu_input_arg", and in
> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> page of memory allocated per-vCPU. A hypercall call site disables
> interrupts, then uses this memory to set up the input parameters for
> the hypercall, read the output results after hypercall execution, and
> re-enable interrupts. The open coding of these steps leads to
> inconsistencies, and in some cases, violation of the generic
> requirements for the hypercall input and output as described in the
> Hyper-V Top Level Functional Spec (TLFS)[1].
> 
> To reduce these kinds of problems, introduce a family of inline
> functions to replace the open coding. The functions provide a new way
> to manage the use of this per-vCPU memory that is usually the input and
> output arguments to Hyper-V hypercalls. The functions encapsulate
> key aspects of the usage and ensure that the TLFS requirements are
> met (max size of 1 page each for input and output, no overlap of
> input and output, aligned to 8 bytes, etc.). Conceptually, there
> is no longer a difference between the "per-vCPU input page" and
> "per-vCPU output page". Only a single per-vCPU page is allocated, and
> it provides both hypercall input and output memory. All current
> hypercalls can fit their input and output within that single page,
> though the new code allows easy changing to two pages should a future
> hypercall require a full page for each of the input and output.
> 
> The new functions always zero the fixed-size portion of the hypercall
> input area so that uninitialized memory is not inadvertently passed
> to the hypercall. Current open-coded hypercall call sites are
> inconsistent on this point, and use of the new functions addresses
> that inconsistency. The output area is not zero'ed by the new code
> as it is Hyper-V's responsibility to provide legal output.
> 
> When the input or output (or both) contain an array, the new functions
> calculate and return how many array entries fit within the per-cpu
> memory page, which is effectively the "batch size" for the hypercall
> processing multiple entries. This batch size can then be used in the
> hypercall control word to specify the repetition count. This
> calculation of the batch size replaces current open coding of the
> batch size, which is prone to errors. Note that the array portion of
> the input area is *not* zero'ed. The arrays are almost always 64-bit
> GPAs or something similar, and zero'ing that much memory seems
> wasteful at runtime when it will all be overwritten. The hypercall
> call site is responsible for ensuring that no part of the array is
> left uninitialized (just as with current code).
> 
> The new functions are realized as a single inline function that
> handles the most complex case, which is a hypercall with input
> and output, both of which contain arrays. Simpler cases are mapped to
> this most complex case with #define wrappers that provide zero or NULL
> for some arguments. Several of the arguments to this new function are
> expected to be compile-time constants generated by "sizeof()"
> expressions. As such, most of the code in the new function can be
> evaluated by the compiler, with the result that the code paths are
> no longer than with the current open coding. The one exception is
> new code generated to zero the fixed-size portion of the input area
> in cases where it is not currently done.
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> 
> Signed-off-by: Michael Kelley <mhklinux@...look.com>
> ---
>  include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++
>  1 file changed, 102 insertions(+)
> 
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index b13b0cda4ac8..0c8a9133cf1a 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
>  	return status;
>  }
>  
> +/*
> + * Hypercall input and output argument setup
> + */
> +
> +/* Temporary mapping to be removed at the end of the patch series */
> +#define hyperv_pcpu_arg hyperv_pcpu_input_arg
> +
> +/*
> + * Allocate one page that is shared between input and output args, which is
> + * sufficient for all current hypercalls. If a future hypercall requires
> + * more space, change this value to "2" and everything will work.
> + */
> +#define HV_HVCALL_ARG_PAGES 1
> +
> +/*
> + * Allocate space for hypercall input and output arguments from the
> + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input
> + * or output indicates to allocate no space for that argument. For input and
> + * for output, specify the size of the fixed portion, and the size of an
> + * element in a variable size array. A zero value for entry_size indicates
> + * there is no array. The fixed size space for the input is zero'ed.
> + *
It might be worth explicitly mentioning that interrupts should be disabled when
calling this function.

> + * When variable size arrays are present, the function returns the number of
> + * elements (i.e, the batch size) that fit in the available space.
> + *
> + * The four "size" arguments are expected to be constants, in which case the
> + * compiler does most of the calculations. Then the generated inline code is no
> + * larger than if open coding the access to the hyperv_pcpu_arg and doing
> + * memset() on the input.
> + */
> +static inline int hv_hvcall_inout_array(
> +			void *input, u32 in_size, u32 in_entry_size,
> +			void *output, u32 out_size, u32 out_entry_size)
Is there a reason input and output are void * instead of void ** here?

> +{
> +	u32 in_batch_count = 0, out_batch_count = 0, batch_count;
> +	u32 in_total_size, out_total_size, offset;
> +	u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
> +	void *space;
> +
> +	/*
> +	 * If input and output have arrays, allocate half the space to input
> +	 * and half to output. If only input has an array, the array can use
> +	 * all the space except for the fixed size output (but not to exceed
> +	 * one page), and vice versa.
> +	 */
> +	if (in_entry_size && out_entry_size)
> +		batch_space = batch_space / 2;
> +	else if (in_entry_size)
> +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
> +	else if (out_entry_size)
> +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
> +
> +	if (in_entry_size)
> +		in_batch_count = (batch_space - in_size) / in_entry_size;
> +	if (out_entry_size)
> +		out_batch_count = (batch_space - out_size) / out_entry_size;
> +
> +	/*
> +	 * If input and output have arrays, use the smaller of the two batch
> +	 * counts, in case they are different. If only one has an array, use
> +	 * that batch count. batch_count will be zero if neither has an array.
> +	 */
> +	if (in_batch_count && out_batch_count)
> +		batch_count = min(in_batch_count, out_batch_count);
> +	else
> +		batch_count = in_batch_count | out_batch_count;
> +
> +	in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
> +	out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
> +
> +	space = *this_cpu_ptr(hyperv_pcpu_arg);
> +	if (input) {
> +		*(void **)input = space;
> +		if (space)
> +			/* Zero the fixed size portion, not any array portion */
> +			memset(space, 0, ALIGN(in_size, 8));
> +	}
> +
> +	if (output) {
> +		if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
> +			offset = in_total_size;
> +		} else {
> +			offset = HV_HYP_PAGE_SIZE;
> +			/* Need more than 1 page, but only 1 was allocated */
> +			BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset
because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at
compile-time?
So will this also fail if any of the args in_size, in_entry_size, out_size,
out_entry_size are runtime-known?

Nuno

> +		}
> +		*(void **)output = space + offset;
> +	}
> +
> +	return batch_count;
> +}
> +
> +/* Wrappers for some of the simpler cases with only input, or with no arrays */
> +#define hv_hvcall_in(input, in_size) \
> +	hv_hvcall_inout_array(input, in_size, 0, NULL, 0, 0)
> +
> +#define hv_hvcall_inout(input, in_size, output, out_size) \
> +	hv_hvcall_inout_array(input, in_size, 0, output, out_size, 0)
> +
> +#define hv_hvcall_in_array(input, in_size, in_entry_size) \
> +	hv_hvcall_inout_array(input, in_size, in_entry_size, NULL, 0, 0)
> +
>  /* Generate the guest OS identifier as described in the Hyper-V TLFS */
>  static inline u64 hv_generate_guest_id(u64 kernel_version)
>  {