Message-ID: <20180403100003.GE2874@rkaganb.sw.ru>
Date: Tue, 3 Apr 2018 13:00:04 +0300
From: Roman Kagan <rkagan@...tuozzo.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>,
Denis Plotnikov <dplotnikov@...tuozzo.com>
Cc: kvm@...r.kernel.org, x86@...nel.org,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"Michael Kelley (EOSG)" <Michael.H.Kelley@...rosoft.com>,
Mohammed Gamal <mmorsy@...hat.com>,
Cathy Avery <cavery@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/5] KVM: x86: hyperv: PV TLB flush for Windows guests
On Mon, Apr 02, 2018 at 06:10:54PM +0200, Vitaly Kuznetsov wrote:
> This is both a new feature and a bugfix.
>
> Bugfix description:
>
> It was found that Windows 2016 guests on KVM crash when they have > 64
> vCPUs, a non-flat topology (>1 core/thread per socket; if there are >64
> sockets Windows just ignores vCPUs above 64), and Hyper-V enlightenments
> (any) are enabled. The most commonly reported error is "PAGE FAULT IN
> NONPAGED AREA" but I saw other messages as well. Apparently, Windows
> doesn't expect to run on a Hyper-V server without PV TLB flush support,
> as there are no such Hyper-V servers out there (AFAIR only WS2016
> supports > 64 vCPUs).
>
> Adding PV TLB flush support to KVM helps: Windows 2016 guests now boot
> normally (I tried '-smp 128,sockets=64,cores=1,threads=2' and
> '-smp 128,sockets=8,cores=16,threads=1', but other topologies should work
> too).
>
> Feature description:
>
> PV TLB flush helps a lot when running overcommitted. KVM gained support
> for it recently, but it is only available to Linux guests. Windows guests
> use the emulated Hyper-V interface, so PV TLB flush needs to be added there.
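
For reference, the interface in question is the HvFlushVirtualAddressSpace/
HvFlushVirtualAddressList hypercall family from the Hyper-V TLFS. Roughly
(a sketch with field names following the Linux Hyper-V definitions, not
code taken from this series):

/*
 * Input layout for the HvFlushVirtualAddressSpace/List hypercalls as
 * described in the Hyper-V TLFS.  Illustrative only.
 */
#include <stdint.h>

#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE    0x0002
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST     0x0003

/* bits in 'flags' below */
#define HV_FLUSH_ALL_PROCESSORS               (1ull << 0)
#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES   (1ull << 1)
#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY     (1ull << 2)

struct hv_tlb_flush {
    uint64_t address_space;   /* guest CR3 to flush, unless "all spaces" */
    uint64_t flags;
    uint64_t processor_mask;  /* vCPUs the flush targets */
    uint64_t gva_list[];      /* only used by the ..._LIST variant */
};
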
>
> I tested a WS2016 guest with 128 vCPUs running on a 12 pCPU server. The
> test was running 64 threads, each doing 100 mmap()/munmap() iterations on
> 16384 pages with a tiny random nanosleep in between (I used Cygwin; it
> would be great if someone could point me to a good Windows-native TLB
> thrashing test).
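
For reference, the loop described above (compiled under Cygwin) looks
roughly like this -- a reconstruction from the description, not the exact
program that produced the numbers below; touching the pages is my own
addition so that munmap() actually has TLB entries to shoot down:

#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NTHREADS 64
#define NITER    100
#define NPAGES   16384

static void *worker(void *arg)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = NPAGES * pagesz;
    size_t off;
    int i;

    (void)arg;
    for (i = 0; i < NITER; i++) {
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct timespec ts = { 0, rand() % 1000 }; /* tiny random nanosleep */

        if (p == MAP_FAILED)
            continue;
        for (off = 0; off < len; off += pagesz)  /* populate the mapping */
            p[off] = 1;
        munmap(p, len);                          /* triggers the TLB flush */
        nanosleep(&ts, NULL);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
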
>
> The results are:
> Before:
> real 0m44.362s
> user 0m1.796s
> sys 6m43.218s
>
> After:
> real 0m24.425s
> user 0m1.811s
> sys 0m40.625s
>
> When running without overcommit (a single 12 vCPU guest on a 12 pCPU
> server) the results of the same test are very close:
> Before:
> real 0m21.237s
> user 0m1.531s
> sys 0m19.984s
>
> After:
> real 0m21.082s
> user 0m1.546s
> sys 0m20.030s
I vaguely remember Denis Plotnikov (cc-d) made a similar attempt a couple
of years ago.  IIRC the outcome was that win2012r2 guests (back then) also
started to use this mechanism for local TLB flushes via self-IPI, which
led to a noticeable degradation on certain workloads.
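
To illustrate the concern, the guest-side decision would look roughly like
this (a conceptual sketch only, not actual Windows code; current_cr3() and
hv_do_hypercall() are made-up stand-ins for the guest's own helpers, and
struct hv_tlb_flush is as in the sketch earlier in the thread):

extern uint64_t current_cr3(void);
extern uint64_t hv_do_hypercall(uint64_t code, void *in, void *out);

static void flush_local_tlb(uint64_t self_mask, int pv_flush_advertised)
{
    if (pv_flush_advertised) {
        /* even a purely local flush goes through the hypercall ... */
        struct hv_tlb_flush flush = {
            .address_space  = current_cr3(),
            .flags          = 0,
            .processor_mask = self_mask,    /* just this vCPU */
        };
        hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, &flush, NULL);
        /* ... which costs a vmexit every time */
    } else {
        /* cheap local flush of non-global mappings: reload CR3 */
        asm volatile("mov %%cr3, %%rax; mov %%rax, %%cr3"
                     ::: "rax", "memory");
    }
}
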
Denis, do you have any details to share?
Thanks,
Roman.