lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB4157261D285166832028EB54D495A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Sat, 24 Jan 2026 04:14:18 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Nuno Das Neves <nunodasneves@...ux.microsoft.com>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"skinsburskii@...ux.microsoft.com" <skinsburskii@...ux.microsoft.com>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
	<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
	"decui@...rosoft.com" <decui@...rosoft.com>, "longli@...rosoft.com"
	<longli@...rosoft.com>, "prapal@...ux.microsoft.com"
	<prapal@...ux.microsoft.com>, "mrathor@...ux.microsoft.com"
	<mrathor@...ux.microsoft.com>, "paekkaladevi@...ux.microsoft.com"
	<paekkaladevi@...ux.microsoft.com>
Subject: RE: [PATCH v4 7/7] mshv: Add debugfs to view hypervisor statistics

From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Friday, January 23, 2026 1:11 PM
> 
> On 1/23/2026 9:09 AM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Wednesday, January 21, 2026 1:46 PM
> >>
> >> Introduce a debugfs interface to expose root and child partition stats
> >> when running with mshv_root.
> >>
> >> Create a debugfs directory "mshv" containing 'stats' files organized by
> >> type and id. A stats file contains a number of counters depending on
> >> its type. e.g. an excerpt from a VP stats file:
> >>
> >> TotalRunTime                  : 1997602722
> >> HypervisorRunTime             : 649671371
> >> RemoteNodeRunTime             : 0
> >> NormalizedRunTime             : 1997602721
> >> IdealCpu                      : 0
> >> HypercallsCount               : 1708169
> >> HypercallsTime                : 111914774
> >> PageInvalidationsCount        : 0
> >> PageInvalidationsTime         : 0
> >>
> >> On a root partition with some active child partitions, the entire
> >> directory structure may look like:
> >>
> >> mshv/
> >>   stats             # hypervisor stats
> >>   lp/               # logical processors
> >>     0/              # LP id
> >>       stats         # LP 0 stats
> >>     1/
> >>     2/
> >>     3/
> >>   partition/        # partition stats
> >>     1/              # root partition id
> >>       stats         # root partition stats
> >>       vp/           # root virtual processors
> >>         0/          # root VP id
> >>           stats     # root VP 0 stats
> >>         1/
> >>         2/
> >>         3/
> >>     42/             # child partition id
> >>       stats         # child partition stats
> >>       vp/           # child VPs
> >>         0/          # child VP id
> >>           stats     # child VP 0 stats
> >>         1/
> >>     43/
> >>     55/
> >>
> >> On L1VH, some stats are not present as it does not own the hardware
> >> like the root partition does:
> >> - The hypervisor and lp stats are not present
> >> - L1VH's partition directory is named "self" because it can't get its
> >>   own id
> >> - Some of L1VH's partition and VP stats fields are not populated, because
> >>   it can't map its own HV_STATS_AREA_PARENT page.
> >>
> >> Co-developed-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> >> Signed-off-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> >> Co-developed-by: Praveen K Paladugu <prapal@...ux.microsoft.com>
> >> Signed-off-by: Praveen K Paladugu <prapal@...ux.microsoft.com>
> >> Co-developed-by: Mukesh Rathor <mrathor@...ux.microsoft.com>
> >> Signed-off-by: Mukesh Rathor <mrathor@...ux.microsoft.com>
> >> Co-developed-by: Purna Pavan Chandra Aekkaladevi
> >> <paekkaladevi@...ux.microsoft.com>
> >> Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@...ux.microsoft.com>
> >> Co-developed-by: Jinank Jain <jinankjain@...rosoft.com>
> >> Signed-off-by: Jinank Jain <jinankjain@...rosoft.com>
> >> Signed-off-by: Nuno Das Neves <nunodasneves@...ux.microsoft.com>
> >> Reviewed-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> >> ---
> >>  drivers/hv/Makefile         |   1 +
> >>  drivers/hv/hv_counters.c    |   1 +
> >>  drivers/hv/hv_synic.c       | 177 +++++++++
> >
> > This new file hv_synic.c seems to be spurious.  It looks like you unintentionally
> > picked up this new file from the build tree where you were creating the patches
> > for this series.
> >
> 
> Oh, that's embarrassing! Yes, it's a half-baked, unrelated work-in-progress...
> Please ignore!
> 
> <snip>
> >> diff --git a/drivers/hv/mshv_debugfs.c b/drivers/hv/mshv_debugfs.c
> >> new file mode 100644
> >> index 000000000000..72eb0ae44e4b
> >> --- /dev/null
> >> +++ b/drivers/hv/mshv_debugfs.c
> >> @@ -0,0 +1,703 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * Copyright (c) 2026, Microsoft Corporation.
> >> + *
> >> + * The /sys/kernel/debug/mshv directory contents.
> >> + * Contains various statistics data, provided by the hypervisor.
> >> + *
> >> + * Authors: Microsoft Linux virtualization team
> >> + */
> >> +
> >> +#include <linux/debugfs.h>
> >> +#include <linux/stringify.h>
> >> +#include <asm/mshyperv.h>
> >> +#include <linux/slab.h>
> >> +
> >> +#include "mshv.h"
> >> +#include "mshv_root.h"
> >> +
> >> +#include "hv_counters.c"
> >> +
> >> +#define U32_BUF_SZ 11
> >> +#define U64_BUF_SZ 21
> >> +#define NUM_STATS_AREAS (HV_STATS_AREA_PARENT + 1)
> >
> > This is sort of weak in that it doesn't really guard against
> > changes in the enum that defines HV_STATS_AREA_PARENT.
> > It would work if it were defined as part of the enum, but then
> > you are changing the code coming from the Windows world,
> > which I know is a different problem.
> >
> > The enum is part of the hypervisor ABI and hence isn't likely to
> > change, but it still feels funny to define NUM_STATS_AREAS like
> > this. I would suggest dropping this and just using
> > HV_STATS_AREA_COUNT for the memory allocations even
> > though doing so will allocate space for a stats area pointer
> > that isn't used by this code. It's only a few bytes.
> >
> 
> That would work, but then I'd want to have a comment explaining
> that the decision is intentional, otherwise I think it's just as
> confusing to have unexplained wasted space.

Yes, that's true.

> 
> Alternatively, the usage of SELF and PARENT (but not INTERNAL)
> could be made explicit by a compile-time check, and a comment to
> clarify:
> 
> /* Only support SELF and PARENT areas */
> #define NUM_STATS_AREAS 2
> static_assert(HV_STATS_AREA_SELF == 0 && HV_STATS_AREA_PARENT == 1,
> 	      "SELF and PARENT areas must be usable as indices into an array of size NUM_STATS_AREAS")

Works for me. 

[snip]

> >> +static int __init mshv_debugfs_hv_stats_create(struct dentry *parent)
> >> +{
> >> +	struct dentry *dentry;
> >> +	u64 *stats;
> >> +	int err;
> >> +
> >> +	stats = mshv_hv_stats_map();
> >> +	if (IS_ERR(stats))
> >> +		return PTR_ERR(stats);
> >> +
> >> +	dentry = debugfs_create_file("stats", 0400, parent,
> >> +				     stats, &hv_stats_fops);
> >> +	if (IS_ERR(dentry)) {
> >> +		err = PTR_ERR(dentry);
> >> +		pr_err("%s: failed to create hypervisor stats dentry: %d\n",
> >> +		       __func__, err);
> >> +		goto unmap_hv_stats;
> >> +	}
> >> +
> >> +	mshv_lps_count = num_present_cpus();
> >
> > This method of setting mshv_lps_count, and the iteration through the lp_index
> > in mshv_debugfs_lp_create() and mshv_debugfs_lp_remove(), seems risky. The
> > lp_index gets passed to the hypervisor, so it must be the hypervisor's concept
> > of the lp_index. Is that always guaranteed to be the same as Linux's numbering
> > of the present CPUs? There may be edge cases where it is not. For example, what
> > if Linux in the root partition were booted with the "nosmt" kernel boot option,
> > such that Linux ignores all the 2nd hyper-threads in a core? Could that create
> > a numbering mismatch?
> >
> 
> Ah, this was using the hypervisor stats page before; HvLogicalProcessors. But
> I removed the enum, so I thought this would be a reasonable way to get the number
> of LPs, but I think I'm mistaken.
> 
> For context, there is a fix to how LP and VP numbers are assigned in
> hv_smp_prepare_cpus(), but it's part of a future patchset. That fix ensures the
> LP indices are dense. The code looks like:
> 
> 	/* create dense LPs from 0-N for all apicids */
>         i = next_smallest_apicid(apicids, 0);
>         for (lpidx = 1; i != INT_MAX; lpidx++) {
>                 node = __apicid_to_node[i];
>                 if (node == NUMA_NO_NODE)
>                         node = 0;
> 
>                 /* params: node num, lp index, apic id */
>                 ret = hv_call_add_logical_proc(node, lpidx, i);
>                 BUG_ON(ret);
> 
>                 i = next_smallest_apicid(apicids, i);
>         }
> 
> 	/* create a VP for each present CPU */
>         lpidx = 1;         /* skip BSP cpu 0 */
>         for_each_present_cpu(i) {
>                 if (i == 0)
>                         continue;
> 
>                 /* params: node num, domid, vp index, lp index */
>                 ret = hv_call_create_vp(numa_cpu_node(i),
>                                         hv_current_partition_id, lpidx, lpidx);
>                 BUG_ON(ret);
>                 lpidx++;
>         }
> 
> For what it's worth, with that fix^ I tested with "nosmt" and things worked as I
> would expect: All LPs were displayed in debugfs, but every second LP was not in
> use by Linux, as evidenced by e.g. the number of timer interrupts not going up:
> LpTimerInterrupts            : 1
> 
> Also, only every second VP was created (0, 2, 4, 6...) since the others aren't
> in the present mask at boot.

OK -- I'm glad to hear that someone has thought about this potential
issue. next_smallest_apicid() looks like it should handle the gaps that can occur
in APIC IDs, as I pointed out in my separate email to you about an issue unrelated
to this patch set.

> 
> > Note that for vp_index, we have the hv_vp_index[] array for translating from
> > Linux's concept of a CPU number to Hyper-V's concept of vp_index. For
> > example, mshv_debugfs_parent_partition_create() correctly goes through
> > this translation. And presumably when the VMM code does the
> > MSHV_CREATE_VP ioctl, it is passing in a hypervisor vp_index.
> >
> > Everything may work fine "as is" for the moment, but the lp functions here
> > are still conflating the hypervisor's LP numbering with Linux's CPU numbering,
> > and that seems like a recipe for trouble somewhere down the road. I'm
> > not sure how the hypervisor interprets the "lp_index" part of the identity
> > argument passed to a hypercall, so I'm not sure what the fix is.
> >
> 
> The simplest thing for now might be to bring back that enum value
> HvLogicalProcessors just for this one usage. I'll admit I'm not familiar with
> all the nuances here so there are still probably edge cases here.

Yes, that seems like it would make sense. At least it would have the
hypervisor reporting how many LPs it thinks there are, instead of
getting Linux's view, which might be different. Guaranteeing that the
LP indices are dense can come later since in all likelihood they are
dense by default. It's the potential for an atypical corner case that
I was worried about.

Michael

> 
> >> +
> >> +	return 0;
> >> +
> >> +unmap_hv_stats:
> >> +	mshv_hv_stats_unmap();
> >> +	return err;
> >> +}
> >> +
> <snip>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ