Message-ID: <BYAPR21MB1688A3CB5CD51A6A6921D926D7A19@BYAPR21MB1688.namprd21.prod.outlook.com>
Date:   Fri, 17 Feb 2023 02:34:21 +0000
From:   "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To:     Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
CC:     Stanislav Kinsburskiy <stanislav.kinsburskiy@...il.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Thursday, February 16, 2023 11:41 AM
> 
> On Tue, Feb 14, 2023 at 04:19:13PM +0000, Michael Kelley (LINUX) wrote:
> > From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> > >
> > > And have it preset.
> > > This change allows to significantly reduce time to bring up guest SMP
> > > configuration as well as make sure the guest won't get inaccurate
> > > calibration results due to "noisy neighbour" situation.
> > >
> > > Below are the numbers for 16 VCPU guest before the patch (~1300 msec)
> > >
> > > [    0.562938] x86: Booting SMP configuration:
> > > ...
> > > [    1.859447] smp: Brought up 1 node, 16 CPUs
> > >
> > > and after the patch (~130 msec):
> > >
> > > [    0.445079] x86: Booting SMP configuration:
> > > ...
> > > [    0.575035] smp: Brought up 1 node, 16 CPUs
> > >
> > > This change is inspired by commit 0293615f3fb9 ("x86: KVM guest: use
> > > paravirt function to calculate cpu khz").
> >
> > This patch has been nagging at me a bit, and I finally did some further
> > checking.   Looking at Linux guests on local Hyper-V and in Azure, I see
> > a dmesg output line like this during boot:
> >
> > Calibrating delay loop (skipped), value calculated using timer frequency.. 5187.81 BogoMIPS (lpj=2593905)
> >
> > We're already skipping the delay loop calculation because lpj_fine
> > is set in tsc_init(), using the results of get_loops_per_jiffy().  The
> > latter does exactly the same calculation as hv_preset_lpj() in
> > this patch.
> >
> > Is this patch arising from an environment where tsc_init() is
> > skipped for some reason?  Just trying to make sure we fully
> > understand when this patch is applicable, and when not.
> >
> 
> The problem here is a bit different: "lpj_fine" is considered only for
> the boot CPU (from init/calibrate.c):
> 
>         } else if ((!printed) && lpj_fine) {
>                 lpj = lpj_fine;
>                 pr_info("Calibrating delay loop (skipped), "
>                         "value calculated using timer frequency.. ");
> 
> while all the secondary ones use the timer to calibrate.
> 
> With this change preset_lpj will be used for all cores (from
> init/calibrate.c):
> 
>         } else if (preset_lpj) {
>                 lpj = preset_lpj;
>                 if (!printed)
>                         pr_info("Calibrating delay loop (skipped) "
>                                 "preset value.. ");
> 
> This logic with lpj_fine comes from commit 3da757daf86e ("x86: use
> cpu_khz for loops_per_jiffy calculation"), where the commit message
> states the following:
> 
>     We do this only for the boot processor because the AP's can have
>     different base frequencies or the BIOS might boot a AP at a different
>     frequency.
> 
> Hope this helps.
> 

Indeed, you are right about lpj_fine being applied only to the boot
CPU.  So I've looked a little closer because I don't see the 1300
milliseconds you see for a 16 vCPU guest.
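
For reference, here is a rough user-space sketch of the arithmetic behind
the dmesg line quoted above, as I understand it: lpj is the TSC frequency
in kHz times 1000, divided by HZ, and the BogoMIPS figure is derived from
lpj.  HZ_ASSUMED and the function names are made up for this sketch, and
HZ=1000 is inferred from the printed numbers rather than stated anywhere
in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the guest runs with HZ=1000 (inferred from the dmesg line). */
#define HZ_ASSUMED 1000UL

/* Loops per jiffy from a TSC frequency in kHz: khz * 1000 / HZ. */
unsigned long sketch_lpj_from_tsc_khz(unsigned long tsc_khz)
{
	uint64_t loops_per_sec = (uint64_t)tsc_khz * 1000;

	assert(tsc_khz > 0);
	return (unsigned long)(loops_per_sec / HZ_ASSUMED);
}

/* Integer part of the BogoMIPS value shown at boot. */
unsigned long sketch_bogomips_int(unsigned long lpj)
{
	return lpj / (500000 / HZ_ASSUMED);
}

/* Two-digit fractional part of the BogoMIPS value shown at boot. */
unsigned long sketch_bogomips_frac(unsigned long lpj)
{
	return (lpj / (5000 / HZ_ASSUMED)) % 100;
}
```

With lpj=2593905 this gives 5187 and 81, matching the "5187.81 BogoMIPS"
in the quoted line, which is consistent with HZ=1000 and a TSC running
at roughly 2.59 GHz.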

I've been experimenting with a 32 vCPU guest, and without your
patch, it takes only 26 milliseconds to get all 32 vCPUs started.  I
think the trick is in the call to calibrate_delay_is_known().  This
function copies the lpj value from a CPU in the same NUMA node
that has already been calibrated, assuming that constant_tsc is
set, which is the case in my test VM.  So the boot CPU sets lpj
based on lpj_fine, and all other CPUs effectively copy the value
from the boot CPU without doing calibration.
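
To illustrate the copying behaviour described above, here is a minimal
user-space model.  It is not the kernel's implementation of
calibrate_delay_is_known(); struct model_cpu and model_lpj_is_known()
are hypothetical names for this sketch, and the only rule it encodes is
"with constant_tsc, reuse the lpj of an already calibrated CPU in the
same NUMA node, otherwise return 0 so the caller calibrates".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record for this model; not a kernel structure. */
struct model_cpu {
	int node;		/* NUMA node the CPU belongs to */
	unsigned long lpj;	/* 0 until the CPU has been calibrated */
};

/*
 * Return an lpj value that CPU `self` can reuse, or 0 if it has to
 * calibrate on its own.  Reuse is only allowed when constant_tsc is set.
 */
unsigned long model_lpj_is_known(const struct model_cpu *cpus, size_t ncpus,
				 size_t self, bool constant_tsc)
{
	assert(self < ncpus);

	if (!constant_tsc)
		return 0;

	for (size_t i = 0; i < ncpus; i++) {
		if (i != self && cpus[i].node == cpus[self].node && cpus[i].lpj)
			return cpus[i].lpj;
	}
	return 0;
}
```

In this model, a secondary CPU on the boot CPU's node gets the boot
CPU's value back immediately, while the first CPU to come up on another
node gets 0 and has to calibrate.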

I also experimented with multiple NUMA nodes.  In that case, it
does take longer.  Dividing the 32 vCPUs into 4 NUMA nodes,
it takes about 210 milliseconds to boot all 32 vCPUs.  Presumably the
extra time is due to timer-based calibration being done once for each
NUMA node, plus probably some misc NUMA accounting overhead.
With preset_lpj set, that 210 milliseconds drops to 32 milliseconds,
which is more like the case with only 1 NUMA node, so there's some
modest benefit with multiple NUMA nodes.
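
As a back-of-the-envelope check on that presumption, the sketch below
counts how many CPUs would fall back to timer-based calibration in a
simplified model of bring-up.  Everything here is an assumption for
illustration: the function name, the round-robin CPU-to-node mapping,
and the rule that CPU 0 is the boot CPU and takes its value from
lpj_fine.  It says nothing about how long each calibration takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 64

/*
 * Count timer-based calibrations in a simplified model:
 *  - a preset lpj is used by every CPU, so nothing is calibrated;
 *  - CPU 0 (the boot CPU) uses lpj_fine and marks its node as calibrated;
 *  - with constant_tsc, a CPU copies from a calibrated CPU on its node;
 *  - any other CPU calibrates against the timer.
 * CPUs are assigned to nodes round-robin (an assumption of this sketch).
 */
size_t model_timer_calibrations(size_t ncpus, size_t nnodes,
				bool have_preset_lpj, bool constant_tsc)
{
	bool node_calibrated[MODEL_MAX_NODES] = { false };
	size_t count = 0;

	assert(nnodes > 0 && nnodes <= MODEL_MAX_NODES);

	if (have_preset_lpj)
		return 0;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		size_t node = cpu % nnodes;

		if (cpu == 0) {
			node_calibrated[node] = true;	/* lpj_fine */
			continue;
		}
		if (constant_tsc && node_calibrated[node])
			continue;			/* copied */

		count++;				/* timer-based */
		node_calibrated[node] = true;
	}
	return count;
}
```

Under these assumptions, 32 vCPUs on 1 node need no timer-based
calibration, 32 vCPUs on 4 nodes need one for each node other than the
boot CPU's, and a preset value removes them all.  Without constant_tsc,
every secondary CPU calibrates in this model.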

Could you check if constant_tsc is set in your test environment?  It
really should be set in a Hyper-V VM.
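
In case it helps, here is a small sketch of one way to check this from
inside the guest by scanning the "flags" lines of /proc/cpuinfo.  The
helper names are made up for this example, and it assumes the flags
line fits in the buffer used below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears in `line` as a whole whitespace-delimited word. */
bool line_has_flag(const char *line, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = line;

	assert(n > 0);
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[n] == '\0' || p[n] == ' ' ||
			    p[n] == '\t' || p[n] == '\n';

		if (starts && ends)
			return true;
		p += n;
	}
	return false;
}

/*
 * Scan /proc/cpuinfo for `flag` on any "flags" line.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened.
 */
int cpuinfo_has_flag(const char *flag)
{
	char buf[8192];	/* assumed large enough for one flags line */
	FILE *f = fopen("/proc/cpuinfo", "r");
	int found = 0;

	if (!f)
		return -1;
	while (fgets(buf, sizeof(buf), f)) {
		if (strncmp(buf, "flags", 5) == 0 && line_has_flag(buf, flag)) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}
```

Calling cpuinfo_has_flag("constant_tsc") in the guest should return 1 if
the flag is exposed.  A plain "grep constant_tsc /proc/cpuinfo" gives the
same answer with less effort.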

Michael
