linux-kernel - RE: [PATCH 2/7] Drivers: hv: Introduce hv_hvcall

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB4157FF9AF372F252C6D8B0A1D4CD2@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 27 Feb 2025 01:50:43 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Nuno Das Neves <nunodasneves@...ux.microsoft.com>, "kys@...rosoft.com"
	<kys@...rosoft.com>, "haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
	"wei.liu@...nel.org" <wei.liu@...nel.org>, "decui@...rosoft.com"
	<decui@...rosoft.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "hpa@...or.com"
	<hpa@...or.com>, "lpieralisi@...nel.org" <lpieralisi@...nel.org>,
	"kw@...ux.com" <kw@...ux.com>, "manivannan.sadhasivam@...aro.org"
	<manivannan.sadhasivam@...aro.org>, "robh@...nel.org" <robh@...nel.org>,
	"bhelgaas@...gle.com" <bhelgaas@...gle.com>, "arnd@...db.de" <arnd@...db.de>
CC: "x86@...nel.org" <x86@...nel.org>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-pci@...r.kernel.org"
	<linux-pci@...r.kernel.org>, "linux-arch@...r.kernel.org"
	<linux-arch@...r.kernel.org>
Subject: RE: [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for
 hypercall arguments

From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Wednesday, February 26, 2025 4:15 PM
> 
> On 2/26/2025 12:06 PM, mhkelley58@...il.com wrote:
> > From: Michael Kelley <mhklinux@...look.com>
> >
> > Current code allocates the "hyperv_pcpu_input_arg", and in
> > some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> > page of memory allocated per-vCPU. A hypercall call site disables
> > interrupts, then uses this memory to set up the input parameters for
> > the hypercall, read the output results after hypercall execution, and
> > re-enable interrupts. The open coding of these steps leads to
> > inconsistencies, and in some cases, violation of the generic
> > requirements for the hypercall input and output as described in the
> > Hyper-V Top Level Functional Spec (TLFS)[1].
> >
> > To reduce these kinds of problems, introduce a family of inline
> > functions to replace the open coding. The functions provide a new way
> > to manage the use of this per-vCPU memory that is usually the input and
> > output arguments to Hyper-V hypercalls. The functions encapsulate
> > key aspects of the usage and ensure that the TLFS requirements are
> > met (max size of 1 page each for input and output, no overlap of
> > input and output, aligned to 8 bytes, etc.). Conceptually, there
> > is no longer a difference between the "per-vCPU input page" and
> > "per-vCPU output page". Only a single per-vCPU page is allocated, and
> > it provides both hypercall input and output memory. All current
> > hypercalls can fit their input and output within that single page,
> > though the new code allows easy changing to two pages should a future
> > hypercall require a full page for each of the input and output.
> >
> > The new functions always zero the fixed-size portion of the hypercall
> > input area so that uninitialized memory is not inadvertently passed
> > to the hypercall. Current open-coded hypercall call sites are
> > inconsistent on this point, and use of the new functions addresses
> > that inconsistency. The output area is not zero'ed by the new code
> > as it is Hyper-V's responsibility to provide legal output.
> >
> > When the input or output (or both) contain an array, the new functions
> > calculate and return how many array entries fit within the per-cpu
> > memory page, which is effectively the "batch size" for the hypercall
> > processing multiple entries. This batch size can then be used in the
> > hypercall control word to specify the repetition count. This
> > calculation of the batch size replaces current open coding of the
> > batch size, which is prone to errors. Note that the array portion of
> > the input area is *not* zero'ed. The arrays are almost always 64-bit
> > GPAs or something similar, and zero'ing that much memory seems
> > wasteful at runtime when it will all be overwritten. The hypercall
> > call site is responsible for ensuring that no part of the array is
> > left uninitialized (just as with current code).
> >
> > The new functions are realized as a single inline function that
> > handles the most complex case, which is a hypercall with input
> > and output, both of which contain arrays. Simpler cases are mapped to
> > this most complex case with #define wrappers that provide zero or NULL
> > for some arguments. Several of the arguments to this new function are
> > expected to be compile-time constants generated by "sizeof()"
> > expressions. As such, most of the code in the new function can be
> > evaluated by the compiler, with the result that the code paths are
> > no longer than with the current open coding. The one exception is
> > new code generated to zero the fixed-size portion of the input area
> > in cases where it is not currently done.
> >
> > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> >
> > Signed-off-by: Michael Kelley <mhklinux@...look.com>
> > ---
> >  include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++
> >  1 file changed, 102 insertions(+)
> >
> > diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> > index b13b0cda4ac8..0c8a9133cf1a 100644
> > --- a/include/asm-generic/mshyperv.h
> > +++ b/include/asm-generic/mshyperv.h
> > @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
> >  	return status;
> >  }
> >
> > +/*
> > + * Hypercall input and output argument setup
> > + */
> > +
> > +/* Temporary mapping to be removed at the end of the patch series */
> > +#define hyperv_pcpu_arg hyperv_pcpu_input_arg
> > +
> > +/*
> > + * Allocate one page that is shared between input and output args, which is
> > + * sufficient for all current hypercalls. If a future hypercall requires
> > + * more space, change this value to "2" and everything will work.
> > + */
> > +#define HV_HVCALL_ARG_PAGES 1
> > +
> > +/*
> > + * Allocate space for hypercall input and output arguments from the
> > + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input
> > + * or output indicates to allocate no space for that argument. For input and
> > + * for output, specify the size of the fixed portion, and the size of an
> > + * element in a variable size array. A zero value for entry_size indicates
> > + * there is no array. The fixed size space for the input is zero'ed.
> > + *
> It might be worth explicitly mentioning that interrupts should be disabled when
> calling this function.

Agreed.

> 
> > + * When variable size arrays are present, the function returns the number of
> > + * elements (i.e, the batch size) that fit in the available space.
> > + *
> > + * The four "size" arguments are expected to be constants, in which case the
> > + * compiler does most of the calculations. Then the generated inline code is no
> > + * larger than if open coding the access to the hyperv_pcpu_arg and doing
> > + * memset() on the input.
> > + */
> > +static inline int hv_hvcall_inout_array(
> > +			void *input, u32 in_size, u32 in_entry_size,
> > +			void *output, u32 out_size, u32 out_entry_size)
> Is there a reason input and output are void * instead of void ** here?

Yes -- it must be void *, and not void **.  Consider a function like get_vtl()
in Patch 3 of this series where local variable "input" is declared as:

struct hv_input_get_vp_registers *input;

Then the first argument to hv_hvcall_inout() will be of type
struct hv_input_get_vp_registers **. The compiler does not consider
this to match void ** and would generate an error. But
struct hv_input_get_vp_registers ** _does_ match void * and compiles
with no error. It makes sense when you think about it, though it
isn't obvious. I tried to use void ** initially and had to figure out
why the code wouldn't compile. :-)

> 
> > +{
> > +	u32 in_batch_count = 0, out_batch_count = 0, batch_count;
> > +	u32 in_total_size, out_total_size, offset;
> > +	u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
> > +	void *space;
> > +
> > +	/*
> > +	 * If input and output have arrays, allocate half the space to input
> > +	 * and half to output. If only input has an array, the array can use
> > +	 * all the space except for the fixed size output (but not to exceed
> > +	 * one page), and vice versa.
> > +	 */
> > +	if (in_entry_size && out_entry_size)
> > +		batch_space = batch_space / 2;
> > +	else if (in_entry_size)
> > +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
> > +	else if (out_entry_size)
> > +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
> > +
> > +	if (in_entry_size)
> > +		in_batch_count = (batch_space - in_size) / in_entry_size;
> > +	if (out_entry_size)
> > +		out_batch_count = (batch_space - out_size) / out_entry_size;
> > +
> > +	/*
> > +	 * If input and output have arrays, use the smaller of the two batch
> > +	 * counts, in case they are different. If only one has an array, use
> > +	 * that batch count. batch_count will be zero if neither has an array.
> > +	 */
> > +	if (in_batch_count && out_batch_count)
> > +		batch_count = min(in_batch_count, out_batch_count);
> > +	else
> > +		batch_count = in_batch_count | out_batch_count;
> > +
> > +	in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
> > +	out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
> > +
> > +	space = *this_cpu_ptr(hyperv_pcpu_arg);
> > +	if (input) {
> > +		*(void **)input = space;
> > +		if (space)
> > +			/* Zero the fixed size portion, not any array portion */
> > +			memset(space, 0, ALIGN(in_size, 8));
> > +	}
> > +
> > +	if (output) {
> > +		if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
> > +			offset = in_total_size;
> > +		} else {
> > +			offset = HV_HYP_PAGE_SIZE;
> > +			/* Need more than 1 page, but only 1 was allocated */
> > +			BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
> Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset
> because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at
> compile-time?

Correct. And if for some future hypercall in_total_size + out_total_size is
*not* <= HV_HYP_PAGE_SIZE, the BUILD_BUG_ON() will alert the developer
to the problem. Depending on the argument requirements of this future
hypercall, the solution might require changing HV_HVCALL_ARG_PAGES to 2.

> So will this also fail if any of the args in_size, in_entry_size, out_size,
> out_entry_size are runtime-known?

Correct.  I should change my wording about "The four 'size' arguments are
expected to be constants" to ". . . must be constants". OTOH, if there's a need
to support non-constants for any of these arguments, some additional code
could handle that case. But the overall hv_hvcall_inout_array() function will
end up generating a lot of code to execute at runtime. I looked at the hypercall
call sites in the OHCL kernel tree, and didn't see any need for non-constants,
but I haven't looked yet at the v4 patch series you just posted.  Let me know
if you have a case requiring non-constants.

Michael