linux-kernel - RE: [RFC v1 4/5] hyperv: allow hypercall output pages to be allocated for child partitions

Message-ID:
 <SN6PR02MB4157BF936EBDA23AD1EC5183D480A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Sun, 11 Jan 2026 22:27:06 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Yu Zhang <zhangyu1@...ux.microsoft.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>, "linux-pci@...r.kernel.org"
	<linux-pci@...r.kernel.org>, "kys@...rosoft.com" <kys@...rosoft.com>,
	"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>, "wei.liu@...nel.org"
	<wei.liu@...nel.org>, "decui@...rosoft.com" <decui@...rosoft.com>,
	"lpieralisi@...nel.org" <lpieralisi@...nel.org>, "kwilczynski@...nel.org"
	<kwilczynski@...nel.org>, "mani@...nel.org" <mani@...nel.org>,
	"robh@...nel.org" <robh@...nel.org>, "bhelgaas@...gle.com"
	<bhelgaas@...gle.com>, "arnd@...db.de" <arnd@...db.de>, "joro@...tes.org"
	<joro@...tes.org>, "will@...nel.org" <will@...nel.org>,
	"robin.murphy@....com" <robin.murphy@....com>,
	"easwar.hariharan@...ux.microsoft.com"
	<easwar.hariharan@...ux.microsoft.com>, "jacob.pan@...ux.microsoft.com"
	<jacob.pan@...ux.microsoft.com>, "nunodasneves@...ux.microsoft.com"
	<nunodasneves@...ux.microsoft.com>, "mrathor@...ux.microsoft.com"
	<mrathor@...ux.microsoft.com>, "peterz@...radead.org" <peterz@...radead.org>,
	"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: RE: [RFC v1 4/5] hyperv: allow hypercall output pages to be allocated
 for child partitions

From: Yu Zhang <zhangyu1@...ux.microsoft.com> Sent: Friday, January 9, 2026 9:07 PM
> 
> On Thu, Jan 08, 2026 at 06:47:44PM +0000, Michael Kelley wrote:
> > From: Yu Zhang <zhangyu1@...ux.microsoft.com> Sent: Monday, December 8, 2025 9:11 PM
> > >
> >
> > The "Subject:" line prefix for this patch should probably be "Drivers: hv:"
> > to be consistent with most other changes to this source code file.
> >
> > > Previously, the allocation of per-CPU output argument pages was restricted
> > > to root partitions or those operating in VTL mode.
> > >
> > > Remove this restriction to support guest IOMMU related hypercalls, which
> > > require valid output pages to function correctly.
> >
> > The thinking here isn't quite correct. Just because a hypercall produces output
> > doesn't mean that Linux needs to allocate a page for the output that is separate
> > from the input. It's perfectly OK to use the same page for both input and output,
> > as long as the two areas don't overlap. Yes, the page is called
> > "hyperv_pcpu_input_arg", but that's a historical artifact from before the time
> > it was realized that the same page can be used for both input and output.
> >
> > Of course, if there's ever a hypercall that needs lots of input and lots of output
> > such that the combined size doesn't fit in a single page, then separate input
> > and output pages will be needed. But I'm skeptical that will ever happen. Rep
> > hypercalls could have large amounts of input and/or output, but I'd venture
> > that the rep count can always be managed so everything fits in a single page.
> >
> 
> Thanks, Michael.
> 
> Is there an existing hypercall precedent that reuses the input page for output?
> I believe reusing the input page should be acceptable, at least for pvIOMMU's
> hypercalls, but I will confirm these interfaces with the Hyper-V team.

See hv_pci_read_mmio() for a precedent in current kernel code.

There's also hv_get_partition_id(), which uses the hyperv_pcpu_input_arg page
for the hypercall output. But in this case there is no input, so input and output
aren't actually sharing the page.
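
As an illustration of the shared-page pattern, here is a minimal userspace
sketch, not the kernel code. The names demo_input, demo_output and
demo_split_page are hypothetical, and the sketch assumes a 4 KiB page and
8-byte alignment for hypercall arguments. The output area starts at the first
aligned offset past the input, so the two areas cannot overlap:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4 KiB hypercall page */
#define DEMO_ARG_ALIGN 8u	/* assumption: arguments are 8-byte aligned */

/* Hypothetical input/output layouts, for illustration only. */
struct demo_input {
	uint64_t gpa;
	uint32_t size;
	uint32_t reserved;
};

struct demo_output {
	uint8_t data[64];
};

/*
 * Carve one page into an input area at offset 0 and an output area that
 * starts at the first aligned offset past the input. Returns 0 on success,
 * or -1 if the two areas do not both fit in the page.
 */
int demo_split_page(void *page, size_t in_size, size_t out_size,
		    void **in, void **out)
{
	size_t mask = (size_t)DEMO_ARG_ALIGN - 1;
	size_t out_off;

	if (in_size > DEMO_PAGE_SIZE)
		return -1;

	/* Round the end of the input up to the next aligned offset. */
	out_off = (in_size + mask) & ~mask;

	if (out_size > DEMO_PAGE_SIZE - out_off)
		return -1;

	*in = page;
	*out = (uint8_t *)page + out_off;
	return 0;
}
```

With a 16-byte demo_input, the output lands at offset 16 of the same page. A
request whose combined size exceeds the page is rejected, which is the case
where separate pages (or a smaller rep count) would be needed.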

In kernel 6.13 and earlier, get_vtl() used the hyperv_pcpu_input_arg page
for both input and output, but it did it wrong because the input and output areas
overlapped. While the overlap worked because the hypercall is a simple "one-shot"
operation (i.e., read the input, then write the output), it's not legal according
to the TLFS. When the illegal overlap was fixed in commit 07412e1f163d, the
developer decided to allocate the hyperv_pcpu_output_arg page for VTL2 images,
so the fix uses separate pages for the input and output. There was extensive
discussion of the tradeoffs in allocating the output page for VTL2. In my view
it was an unnecessary use of memory, but the developer preferred to do it for
consistency, and I didn't press the argument because it was limited to VTL2.
Similarly, I won't press the argument here if folks really want to always allocate
the output page. My only request is that the commit message not be misleading
about the reason.

See https://elixir.bootlin.com/linux/v6.13/source/arch/x86/hyperv/hv_init.c#L416
for the older get_vtl() code that puts the input and output in the same page, but
improperly overlaps.
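
To make the overlap problem concrete, here is a small sketch of the check
that separates a legal shared-page layout from an illegal one. The helper
demo_ranges_overlap is hypothetical and works on byte offsets within the
page; it is not taken from the kernel:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the byte ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) within a page intersect. Empty ranges never
 * overlap anything.
 */
bool demo_ranges_overlap(size_t a_off, size_t a_len,
			 size_t b_off, size_t b_len)
{
	if (a_len == 0 || b_len == 0)
		return false;

	return a_off < b_off + b_len && b_off < a_off + a_len;
}
```

An input and an output that both start at offset 0 overlap, while an output
placed directly after the end of the input does not. A hypercall with no
input, as in the hv_get_partition_id() case, has an empty input range and so
cannot overlap.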

> 
> > >
> > > While unconditionally allocating per-CPU output pages scales with the number
> > > of vCPUs, potentially adding overhead for guests that may not utilize the
> > > IOMMU, this change anticipates that future hypercalls from child partitions
> > > may also require these output pages.
> >
> > I've heard the argument that the amount of overhead is modest relative to the
> > overall amount of memory that is typically in a VM, particularly VMs with high
> > vCPU counts. And I don't disagree. But on the flip side, why tie up memory when
> > there's no need to do so? I'd argue for dropping this patch, and changing the
> > two hypercall call sites in Patch 5 to just use part of the so-called hypercall input
> > page for the output as well. It's only a one-line change in each hypercall call site.
> >
> 
> I share your concern about unconditionally allocating a separate output page
> for each vCPU. And if reusing the input page isn't accepted by the Hyper-V team,
> perhaps we could gate the allocation by checking IS_ENABLED(CONFIG_HYPERV_PVIOMMU)
> in hv_output_page_exist()?

Yes, that's doable, though I hope it doesn't come to that. At some point the
additional complexity starts to favor just allocating the output page. :-)

Michael