lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BYAPR21MB1688F63C2410904F92B1F75FD7389@BYAPR21MB1688.namprd21.prod.outlook.com>
Date:   Thu, 3 Nov 2022 20:33:40 +0000
From:   "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To:     Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
CC:     Stanislav Kinsburskiy <stanislav.kinsburskiy@...il.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3 4/4] drivers/clocksource/hyper-v: Add TSC page support
 for root partition

From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Thursday, November 3, 2022 10:59 AM
> 
> Microsoft Hypervisor root partition has to map the TSC page specified
> by the hypervisor, instead of providing the page to the hypervisor like
> it's done in the guest partitions.
> 
> However, it's too early to map the page when the clock is initialized, so, the
> actual mapping is happening later.
> 
> Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@...il.com>
> CC: "K. Y. Srinivasan" <kys@...rosoft.com>
> CC: Haiyang Zhang <haiyangz@...rosoft.com>
> CC: Wei Liu <wei.liu@...nel.org>
> CC: Dexuan Cui <decui@...rosoft.com>
> CC: Thomas Gleixner <tglx@...utronix.de>
> CC: Ingo Molnar <mingo@...hat.com>
> CC: Borislav Petkov <bp@...en8.de>
> CC: Dave Hansen <dave.hansen@...ux.intel.com>
> CC: x86@...nel.org
> CC: "H. Peter Anvin" <hpa@...or.com>
> CC: Daniel Lezcano <daniel.lezcano@...aro.org>
> CC: linux-hyperv@...r.kernel.org
> CC: linux-kernel@...r.kernel.org
> ---
>  arch/x86/hyperv/hv_init.c          |    2 ++
>  drivers/clocksource/hyperv_timer.c |   38 +++++++++++++++++++++++++++---------
>  include/clocksource/hyperv_timer.h |    1 +
>  3 files changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f49bc3ec76e6..89954490af93 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -464,6 +464,8 @@ void __init hyperv_init(void)
>  		BUG_ON(!src);
>  		memcpy_to_page(pg, 0, src, HV_HYP_PAGE_SIZE);
>  		memunmap(src);
> +
> +		hv_remap_tsc_clocksource();
>  	} else {
>  		hypercall_msr.guest_physical_address =
> vmalloc_to_pfn(hv_hypercall_pg);
>  		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index 9445a1558fe9..dec7ad3b85ba 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -509,9 +509,6 @@ static bool __init hv_init_tsc_clocksource(void)
>  	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>  		return false;
> 
> -	if (hv_root_partition)
> -		return false;
> -
>  	/*
>  	 * If Hyper-V offers TSC_INVARIANT, then the virtualized TSC correctly
>  	 * handles frequency and offset changes due to live migration,
> @@ -529,16 +526,22 @@ static bool __init hv_init_tsc_clocksource(void)
>  	}
> 
>  	hv_read_reference_counter = read_hv_clock_tsc;
> -	tsc_pfn = HVPFN_DOWN(virt_to_phys(tsc_page));
> 
>  	/*
> -	 * The Hyper-V TLFS specifies to preserve the value of reserved
> -	 * bits in registers. So read the existing value, preserve the
> -	 * low order 12 bits, and add in the guest physical address
> -	 * (which already has at least the low 12 bits set to zero since
> -	 * it is page aligned). Also set the "enable" bit, which is bit 0.
> +	 * TSC page mapping works differently in root compared to guest.
> +	 * - In guest partition the guest PFN has to be passed to the
> +	 *   hypervisor.
> +	 * - In root partition it's other way around: it has to map the PFN
> +	 *   provided by the hypervisor.
> +	 *   But it can't be mapped right here as it's too early and MMU isn't
> +	 *   ready yet. So, we only set the enable bit here and will remap the
> +	 *   page later in hv_remap_tsc_clocksource().
>  	 */
>  	tsc_msr.as_uint64 = hv_get_register(HV_REGISTER_REFERENCE_TSC);
> +	if (hv_root_partition)
> +		tsc_pfn = tsc_msr.pfn;
> +	else
> +		tsc_pfn = HVPFN_DOWN(virt_to_phys(tsc_page));
>  	tsc_msr.enable = 1;
>  	tsc_msr.pfn = tsc_pfn;
>  	hv_set_register(HV_REGISTER_REFERENCE_TSC, tsc_msr.as_uint64);

There's a subtlety here that was nagging me, and I think I see it now.

At this point, the code has enabled the Reference TSC, and if we're the root
partition, the Reference TSC Page is the page supplied by the hypervisor.
tsc_pfn has been updated to reflect that hypervisor supplied page.

But tsc_page has not been updated to be in sync with tsc_pfn because we
can't do the memremap() here.  tsc_page still points to tsc_pg, which is a
global variable in Linux.  tsc_page and tsc_pfn will be out-of- sync until
hv_remap_tsc_clocksource() is called later in the boot process.  During
this interval, calls to get the Hyper-V Reference TSC value will use tsc_pg,
not on the Reference TSC Page that the hypervisor is using.  Fortunately,
the function hv_read_tsc_page_tsc(), which actually reads the Reference
TSC Page, treats a zero value for tsc_sequence as a special case meaning
that the Reference TSC page isn't valid.  read_hv_clock_tsc() then falls
back to reading a hypervisor provided synthetic MSR to get the correct
Reference TSC value.  That fallback is fine -- it's just slower because it
traps to the hypervisor.  And the fallback will no longer be used once 
tsc_page is updated by hv_remap_tsc_clocksource().

So the code works. Presumably this subtlety was already understood, but
it really should be called out in a comment, as it is far from obvious.  I
know this code pretty well and I just figured it out. :-(

Michael

> @@ -573,3 +576,20 @@ void __init hv_init_clocksource(void)
>  	hv_sched_clock_offset = hv_read_reference_counter();
>  	hv_setup_sched_clock(read_hv_sched_clock_msr);
>  }
> +
> +void __init hv_remap_tsc_clocksource(void)
> +{
> +	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
> +		return;
> +
> +	if (!hv_root_partition) {
> +		WARN(1, "%s: attempt to remap TSC page in guest partition\n",
> +		     __func__);
> +		return;
> +	}
> +
> +	tsc_page = memremap(tsc_pfn << HV_HYP_PAGE_SHIFT, sizeof(tsc_pg),
> +			    MEMREMAP_WB);
> +	if (!tsc_page)
> +		pr_err("Failed to remap Hyper-V TSC page.\n");
> +}
> diff --git a/include/clocksource/hyperv_timer.h
> b/include/clocksource/hyperv_timer.h
> index 3078d23faaea..783701a2102d 100644
> --- a/include/clocksource/hyperv_timer.h
> +++ b/include/clocksource/hyperv_timer.h
> @@ -31,6 +31,7 @@ extern void hv_stimer_global_cleanup(void);
>  extern void hv_stimer0_isr(void);
> 
>  extern void hv_init_clocksource(void);
> +extern void hv_remap_tsc_clocksource(void);
> 
>  extern unsigned long hv_get_tsc_pfn(void);
>  extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ