Message-ID: <9e35c408-e294-ecb6-d927-ba5e9ca4f41e@oracle.com>
Date: Fri, 10 Apr 2020 00:56:52 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Jürgen Groß <jgross@...e.com>,
linux-kernel@...r.kernel.org, x86@...nel.org
Cc: peterz@...radead.org, hpa@...or.com, jpoimboe@...hat.com,
namit@...are.com, mhiramat@...nel.org, bp@...en8.de,
vkuznets@...hat.com, pbonzini@...hat.com,
boris.ostrovsky@...cle.com, mihai.carabas@...cle.com,
kvm@...r.kernel.org, xen-devel@...ts.xenproject.org,
virtualization@...ts.linux-foundation.org
Subject: Re: [RFC PATCH 00/26] Runtime paravirt patching
So, first, thanks for the quick comments, even though some of my choices
were straight NAKs (or maybe because of that!).
Second, I clearly did a bad job of motivating the series. Let me try
to address the motivation comments first and then I can address the
technical concerns separately.
[ I'm collating all the motivation comments below. ]
>> A KVM host (or another hypervisor) might advertise paravirtualized
>> features and optimization hints (ex KVM_HINTS_REALTIME) which might
>> become stale over the lifetime of the guest. For instance, the
Thomas> If your host changes his advertised behaviour then you want to
Thomas> fix the host setup or find a competent admin.
Juergen> Then this hint is wrong if it can't be guaranteed.
I agree, the hint behaviour is wrong and the host shouldn't be giving
hints it can only temporarily honor.
The host problem is hard to fix though: the behaviour change is
either because of a guest migration or, in the case of a hosted guest,
cloud economics -- customers want to go to a 2-1 or worse VCPU-CPU
ratio at times of low load.
I had an offline discussion with Paolo Bonzini where he agreed that
it makes sense to make KVM_HINTS_REALTIME a dynamic hint rather than
static as it is now. (That was really the starting point for this
series.)
>> host might go from being undersubscribed to being oversubscribed
>> (or the other way round) and it would make sense for the guest
>> switch pv-ops based on that.
Juergen> I think using pvops for such a feature change is just wrong.
Juergen> What comes next? Using pvops for being able to migrate a guest
Juergen> from an Intel to an AMD machine?
My statement about switching pv-ops was too broadly worded. What
I meant to say was that KVM guests choose pv_lock_ops to be native
or paravirt based on the undersubscribed/oversubscribed hint at boot,
and this choice should be available at run-time as well.
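As a rough illustration of the choice described above, the sketch below
models it as a pure function of the hint. The enum, the function name and
the boolean parameter are hypothetical; this is not the kernel's actual
boot code, which checks more than this one hint.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two spinlock implementations. */
enum lock_ops { LOCK_OPS_NATIVE, LOCK_OPS_PARAVIRT };

/*
 * Simplified model of the decision: if the host hints that it is
 * undersubscribed (the role KVM_HINTS_REALTIME plays in this thread),
 * pick native spinlocks; otherwise pick the paravirt ones.
 */
static enum lock_ops choose_lock_ops(bool host_undersubscribed)
{
    return host_undersubscribed ? LOCK_OPS_NATIVE : LOCK_OPS_PARAVIRT;
}
```

Today this decision is effectively made once, at boot. The argument here is
that it should be possible to make it again when the hint changes.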
KVM chooses between native/paravirt spinlocks at boot based on this
reasoning (from commit b2798ba0b8):
"Waiman Long mentioned that:
> Generally speaking, unfair lock performs well for VMs with a small
> number of vCPUs. Native qspinlock may perform better than pvqspinlock
> if there is vCPU pinning and there is no vCPU over-commitment.
"
PeterZ> So what, the paravirt spinlock stuff works just fine when
PeterZ> you're not oversubscribed.
Yeah, the paravirt spinlocks work fine for both under and oversubscribed
hosts, but they are more expensive and that extra cost provides no benefits
when CPUs are pinned.
For instance, pvqueued spin_unlock() is a call+locked cmpxchg as opposed
to just a movb $0, (%rdi).
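To make the cost difference concrete, here is a minimal userspace sketch of
the two unlock shapes using C11 atomics. It is not the kernel code: the
names LOCKED_VAL, SLOW_VAL, kicks and the three functions are hypothetical,
and the "kick" is only simulated with a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCKED_VAL 1u  /* hypothetical: lock held, no waiter needs a kick */
#define SLOW_VAL   3u  /* hypothetical: lock held, a halted waiter needs a kick */

static int kicks;  /* counts simulated wakeups, for illustration only */

/* Native-style unlock: a single release store of 0 to the lock byte. */
static void native_unlock(_Atomic uint8_t *locked)
{
    atomic_store_explicit(locked, 0, memory_order_release);
}

/* Slow path: release the lock, then wake the waiter (simulated here). */
static void pv_unlock_slowpath(_Atomic uint8_t *locked, uint8_t observed)
{
    assert(observed == SLOW_VAL);
    atomic_store_explicit(locked, 0, memory_order_release);
    kicks++;
}

/*
 * Paravirt-style unlock: a function call plus an atomic compare-and-exchange
 * on every unlock, even when nobody needs to be woken.
 */
static void pv_unlock(_Atomic uint8_t *locked)
{
    uint8_t observed = LOCKED_VAL;

    if (atomic_compare_exchange_strong_explicit(locked, &observed, 0,
                                                memory_order_release,
                                                memory_order_relaxed))
        return;  /* fast path: no waiter to kick */

    pv_unlock_slowpath(locked, observed);
}
```

Even on the fast path, pv_unlock() pays for an atomic read-modify-write
where native_unlock() only needs a store.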
This difference shows up in kernbench running on a KVM guest with native
and paravirt spinlocks. I ran with 8 and 64 CPU guests with CPUs pinned.
The native version performs the same or better.
8 CPU          Native   (std-dev)    Paravirt  (std-dev)    Diff
               ------------------    -------------------
-j 4:   sys    151.89   ( 0.2462)    160.14    ( 4.8366)    +5.4%
-j 32:  sys    162.715  (11.4129)    170.225   (11.1138)    +4.6%
-j 0:   sys    164.193  ( 9.4063)    170.843   ( 8.9651)    +4.0%

64 CPU         Native   (std-dev)    Paravirt  (std-dev)    Diff
               ------------------    -------------------
-j 32:  sys    209.448  (0.37009)    210.976   ( 0.4245)    +0.7%
-j 256: sys    267.401  (61.0928)    285.73    (78.8021)    +6.8%
-j 0:   sys    286.313  (56.5978)    307.721   (70.9758)    +7.4%
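For reference, the percentages in the last column are consistent with the
relative increase of the paravirt sys time over the native one. A small
sketch of that calculation follows; the function name is hypothetical and
the formula is an assumption that happens to match the figures above.

```c
#include <assert.h>

/* Relative change of the paravirt sys time over native, in percent. */
static double overhead_pct(double native, double paravirt)
{
    assert(native > 0.0);
    return (paravirt - native) / native * 100.0;
}
```

For example, overhead_pct(151.89, 160.14) is roughly 5.43, which rounds to
the +5.4% shown for the 8 CPU, -j 4 run.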
In all cases the pv_kick, pv_wait numbers were minimal as expected.
The lock_slowpath counts were higher with PV but AFAICS the native
and paravirt lock_slowpath are not directly comparable.
Detailed kernbench numbers attached.
Thanks
Ankur
View attachment "8-cpus.txt" of type "text/plain" (1340 bytes)
View attachment "64-cpus.txt" of type "text/plain" (1414 bytes)