lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB415783BB8A3844F5ADD142B2D4042@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Tue, 17 Dec 2024 17:48:00 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Nuno Das Neves <nunodasneves@...ux.microsoft.com>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-arch@...r.kernel.org"
	<linux-arch@...r.kernel.org>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
	<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
	"decui@...rosoft.com" <decui@...rosoft.com>, "catalin.marinas@....com"
	<catalin.marinas@....com>, "will@...nel.org" <will@...nel.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "mingo@...hat.com"
	<mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "x86@...nel.org"
	<x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>, "arnd@...db.de"
	<arnd@...db.de>, "jinankjain@...ux.microsoft.com"
	<jinankjain@...ux.microsoft.com>, "muminulrussell@...il.com"
	<muminulrussell@...il.com>, "skinsburskii@...ux.microsoft.com"
	<skinsburskii@...ux.microsoft.com>, "mukeshrathor@...rosoft.com"
	<mukeshrathor@...rosoft.com>
Subject: RE: [PATCH 0/2] hyperv: Move some features to common code

From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Monday, December 9, 2024 12:20 PM
> 
> On 12/7/2024 6:59 PM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves@...ux.microsoft.com> Sent: Friday, December 6, 2024 2:22 PM
> >>
> >> There are several bits of Hyper-V-related code that today live in
> >> arch/x86 but are not really specific to x86_64 and will work on arm64
> >> too.
> >>
> >> Some of these will be needed in the upcoming mshv driver code (for
> >> Linux as root partition on Hyper-V).
> >
> > Previously, Linux as the root partition on Hyper-V was x86 only, which is
> > why the code is currently under arch/x86. So evidently the mshv driver
> > is being expanded to support both x86 and arm64, correct? Assuming
> > that's the case, I have some thoughts about how the source code should
> > be organized and built. It's probably best to get this right to start with so
> > it doesn't need to be changed again.
> 
> Yes, we plan on supporting both architectures (eventually). I completely agree
> that it's better to sort out these issues now rather than later.
> 
> >
> > * Patch 2 of this series moves hv_call_deposit_pages() and
> >    hv_call_create_vp() to common code, but does not move
> >    hv_call_add_logical_proc(). All three are used together, so
> >    I'm wondering why hv_call_add_logical_proc() isn't moved.
> >
> 
> The only reason is that in our internal tree there's no common or arm64 code
> yet that uses it - there is no reason it can't also become common code!

Maybe I'm missing your point, but hv_call_add_logical_proc() and
hv_call_create_vp() are called in succession in hv_smp_prepare_cpus(),
so it seems like they very much go together. Presumably a similar
sequence will be needed on the arm64 side when running as the
root partition?

> 
> > * These three functions were originally put in a separate source
> >    code file because of being specific to running in the root partition,
> >    and not needed for generic Linux guest support. I think there's
> >    value in keeping them in a separate file, rather than merging them
> >    into hv_common.c. Maybe just move the entire hv_proc.c file?
> 
> Agreed. I think it should be renamed too - this file will eventually
> contain some additional hypercall helper functions, some of which may also be
> shared by the driver code. Something like "hv_call_common.c"?

I went back and looked at your patch series from a year ago [1], and
got a better understanding of where this work is headed. I wanted
to comment on that series back then, but got subsumed in wrapping
things up for my retirement. :-(  I see significant portions of that
series have been posted independently and accepted, so my further
comments here assume the rest of the series is still the macro-level
approach you are taking.

>From that series, you are planning three modules, controlled by
CONFIG_MSHV, CONFIG_MSHV_ROOT, and CONFIG_MSHV_VTL.
That makes sense, and addresses one of my top-level concerns,
which is that normal guest kernels shouldn't include all that code.
And apparently that code works as a module as well as built-in.

The code currently in hv_proc.c is similar to the code in hv_call.c
and mshv_root_hv_call.c from that series -- it's a wrapper around
Hyper-V hypercalls. But a difference is that the code in hv_proc.c
can't be a module because it is called from hv_smp_prepare_cpus(),
which must be built-in. So it can't be added to the proposed
hv_call.c without making all of hv_call.c built-in. Ideally, there
would be a separate source file just for the code that must be
built-in. I'm not familiar enough with root partition requirements
to understand what hv_smp_prepare_cpus() and its calls to
hv_call_add_logical_proc() and hv_call_create_vp() are doing.
Evidently it is related to bringing up CPUs in the root partition,
and not related to guest VMs. But those two hv_call_* functions
would also be used for managing guest VMs from /dev/mshv.

As for the name, I don't really like "common", even though I'm the
one who created "hv_common.c" back when doing the initial arm64
support on Hyper-V. :-( My thinking is that anything that isn't under
arch/x86 or arch/arm64 is by definition shared across architectures,
so having "common" in the name is superfluous. Maybe just
staying with hv_proc.c is OK.

> 
> >    And then later, perhaps move the entire irqdomain.c file as well?
> Yes, may as well move it too.

irqdomain.c is also in that category of needing to be built-in. But
looking more closely, it is x86 specific and should stay where it is. I
can't immediately tell whether it's feasible to make the Hyper-V
IOMMU driver (and irqdomain.c) architecture neutral, or whether
a separate arm64 implementation will be needed. My guess is the
latter.

> 
> >    There's also an interesting question of whether to move them into
> >    drivers/hv, or create a new directory virt/hyperv. Hyper-V support
> >    started out 15 years ago structured as a driver, hence "drivers/hv".
> >    But over the time, the support has become significantly more than
> >    just a driver, so "virt/hyperv" might be a better location for
> >    non-driver code that had previously been under arch/x86 but is
> >    now common to all architectures.
> >
> I'd be fine with using "virt/hyperv", but I thought "virt" was only for
> KVM.

Now that I see the bigger picture from your previous patch series,
keeping the files in drivers/hv seems OK to me. The 'mshv' code *is*
a driver that implements /dev/mshv. :-)

> 
> Another option would be to create subdirectories in "drivers/hv" to
> organize the different modules more cleanly (i.e. when the /dev/mshv
> driver code is introduced).

Putting all the mshv and "running as root partition" files in a single
sub-directory might make sense. Multiple sub-directories might be
overkill. But I don't have a strong opinion either way. Putting them
all directly in drivers/hv seems OK, as does one or more sub-directories.

One thing I encountered back when first doing the arm64 support is
that everything in the drivers/hv directory could be built as a module.
The Hyper-V support under arch/x86 was always built-in, and the
arch/arm64 code I added was as well. But there was no obvious place
to put common code that must always be built-in. At the time, I thought
about introducing virt/hyperv, but had only a single relatively small source
code file, and introducing that new pathname with its own Makefile, etc.
seemed like overkill. So drivers/hv/hv_common.c came into existence. I
tweaked the existing drivers/hv/Makefile so hv_common.c is always
built-in even when CONFIG_HYPERV=m. It's a little messy, but not too
bad.

The difference in what can be built as a module vs. must be built-in
might be a factor in the directory structure, along with the
CONFIG options that control whether the root partition code
gets built at all (see below). The details will govern what works
well and what ends up being a bit of a mess.

> 
> > * Today, the code for running in the root partition is built along
> >    with the rest of the Hyper-V support, and so is present in kernels
> >    built for normal Linux guests on Hyper-V. I haven't thought about
> >    all the implications, but perhaps there's value in having a CONFIG
> >    option to build for the root partition, so that code can be dropped
> >    from normal kernels. There's a significant amount of new code still
> >    to come for mshv that could be excluded from normal guests in this
> >    way. Also, the tests of the hv_root_partition variable could be
> >    changed to a function the compiler detects is always "false" in a
> >    kernel built without the CONFIG option, in which case it can drop
> >    the code for where hv_root_partition is "true".
> >
> Using hv_root_partition is a good way to do it, since it won't require
> many #ifdefs or moving the existing code around too much.
> 
> I can certainly give it a try, and create a separate patch series
> introducing the option. I suppose "CONFIG_HYPERV_ROOT" makes sense as a
> name?

Now that I see you have CONFIG_MSHV, CONFIG_MSHV_ROOT, and
CONFIG_MSHV_VTL planned, could building the hv_root_partition code
just be under control of CONFIG_MSHV_ROOT without introducing
another CONFIG variable? Is any of the root partition code like
hv_proc.c and irqdomain.c needed if CONFIG_MSHV_ROOT=n?
Stubs will be needed for functions called from the generic kernel code
when CONFIG_MSHV_ROOT=n, but those are easily supplied in
asm/mshyperv.h. If hv_root_partition becomes a function whose
return value is gated by CONFIG_MSHV_ROOT, then the compiler
can know when the value is always "false" and drop even the code
that calls the stubs. But I'm pretty sure the stubs are still needed to
avoid compile errors, even when the compiler drops the code.

With this approach, you can avoid #ifdefs except in asm/mshyperv.h
for the stubs, and in the hv_root_partition() function.

One more code structure topic:  In the previous patch series,
some of the mshv code is x86 specific. There is x86 assembler
code, and references to x86 specific registers (all the HV_X64_*
registers). Do you plan to put architecture specific code under
drivers/hv, or under arch/[x86/arm64]/hyperv? We've made
drivers/hv be architecture neutral -- currently there's only one
"cheat" with an #ifdef CONFIG_ARM64 in the hv_balloon.c driver
that turns off some functionality.

As we discussed previously, the new Hyper-V #include files
provide the union of x86 and arm64 definitions, with just an
occasional #ifdef where needed. That was justified because
that's how the definitions come from the Windows/Hyper-V
world. But it seems like the mshv code should be structured
with arch specific code is under arch, not in drivers/hv with
#ifdefs. The Makefiles in arch/[x86,arm64]/hyperv will need
to use CONFIG_MSHV, CONFIG_MSHV_ROOT, and
CONFIG_MSHV_VTL to decide what to build, and whether to
build as a module vs. built-in.

Maybe putting the mshv code in a "mshv" subdirectory under
both drivers/hv and arch/[x86,arm64]/hyperv would conceptually
help tie together the arch neutral and arch specific portions.
Or something like that ....

> 
> > * The code currently in hv_proc.c is built for x86 only, and validly
> >    assumes the page size is 4K. But when the code moves to be
> >    common across architectures, that assumption is no longer
> >    valid in the general case. Perhaps the intent is that kernels for
> >    the root partition should always be built with page size 4K on
> >    arm64, but nothing enforces that intent. Personally, I think the code
> >    should be made to work with page sizes other than 4K so as to not
> >    leave technical debt. But I realize you may have other priorities. If
> >    there were a CONFIG option for building for the root partition,
> >    that option could be setup to enforce the 4K page size on arm64.
> >
> That makes sense. I suppose this can be done by selecting PAGE_SIZE_4KB
> under HYPERV in drivers/hv/Kconfig?

Since the PAGE_SIZE value is independently selectable in the .config
file, I'm not sure if you can override it when CONFIG_MSHV_ROOT is
set. But you can allow CONFIG_MSHV_ROOT to be set only if
PAGE_SIZE_4KB is selected on arm64. I'd have to look in more detail
to figure out the best way to create an appropriate dependency.

> 
> I'm not how easy it will be to make the code work with different page
> sizes, since we use alloc_page() and similar in a few places, assuming 4k.

alloc_page() is not the problem as it is relatively easy to break up a
16K or 64K page into multiple 4K pages when depositing memory.
And you can round up the amount of deposited memory to the
larger boundary without doing any harm. But in your previous patch
series, I see hv_call_withdraw_memory(), wherein Hyper-V gives
back individual 4K pages in no particular order. That's the killer, as
it is not feasible to re-assemble a random set of 4K pages into larger
16K or 64K pages for free_page().

So scratch that idea. :-( The root partition must run with a page
size that matches the size Hyper-V uses to do memory deposits to
and from a partition, and that's 4K.

Michael

[1] https://lore.kernel.org/lkml/1696010501-24584-1-git-send-email-nunodasneves@linux.microsoft.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ