Message-ID: <4BC2C07B.4040607@redhat.com>
Date: Mon, 12 Apr 2010 09:40:59 +0300
From: Avi Kivity <avi@...hat.com>
To: "Zhang, Xiantao" <xiantao.zhang@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Marcelo Tosatti <mtosatti@...hat.com>,
"Yang, Xiaowei" <xiaowei.yang@...el.com>,
"Dong, Eddie" <eddie.dong@...el.com>, "Li, Xin" <xin.li@...el.com>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <peterz@...radead.org>,
Mike Galbraith <efault@....de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: VM performance issue in KVM guests.
On 04/12/2010 05:04 AM, Zhang, Xiantao wrote:
>
>> What was the performance hit? What was your I/O setup (image format,
>> using aio?)
>>
> The issue only happens when the vcpus are overcommitted (e.g. vcpu/pcpu > 2) and the physical cpus are saturated. For example, when running webbench in a Windows guest in this configuration, performance drops by 80%. In our experiment we use an image file through virtio, and I believe aio is used by default as well.
>
Is this on a machine that does pause-loop exits? The current handling of
PLE is very suboptimal. With proper directed yield we should be much
better there.
Without PLE, we need paravirtualized spinlocks, no way around it.
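Below is a minimal user-space sketch of the spin-then-yield idea behind paravirtualized spinlocks. It is not KVM or guest-kernel code. It assumes C11 atomics and POSIX sched_yield(), and the names ysl_lock()/ysl_unlock() and the SPIN_THRESHOLD value are hypothetical. A waiter spins for a bounded number of attempts and then gives up the CPU. Without that bound, it would burn its whole timeslice while the lock holder sits descheduled.

```c
#define _POSIX_C_SOURCE 200809L /* expose sched_yield() under strict -std=c11 */
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin budget: failed attempts a waiter makes before
 * concluding the lock holder is probably not running. */
#define SPIN_THRESHOLD 1024u

typedef struct {
    atomic_bool locked;
} yield_spinlock;

static void ysl_init(yield_spinlock *l)
{
    atomic_init(&l->locked, false);
}

/* Acquire the lock; returns how many times the caller gave up the CPU
 * while waiting (0 when the lock was taken without yielding). */
static unsigned long ysl_lock(yield_spinlock *l)
{
    unsigned long yields = 0;

    for (;;) {
        for (unsigned i = 0; i < SPIN_THRESHOLD; i++) {
            /* Test-and-test-and-set: read first, and only attempt the
             * atomic exchange when the lock looks free. */
            if (!atomic_load_explicit(&l->locked, memory_order_relaxed) &&
                !atomic_exchange_explicit(&l->locked, true,
                                          memory_order_acquire))
                return yields;
        }
        /* Budget exhausted: the holder is likely preempted (in a guest,
         * its vcpu thread may be waiting for this very pcpu), so stop
         * spinning and let the scheduler run something else. */
        sched_yield();
        yields++;
    }
}

static void ysl_unlock(yield_spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

In a real paravirtualized guest, the yield step would instead be a hypercall that lets the host deschedule the waiting vcpu. PLE hardware gets a similar effect by forcing an exit when the guest executes PAUSE in a loop for too long. This sketch models neither mechanism.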
>>> After analyzing the Linux scheduler, we found the problem is indeed
>>> caused by known features of the Linux scheduler, such as
>>> AFFINE_WAKEUPS, SYNC_WAKEUPS, etc. With these features on, the Linux
>>> scheduler often tries to schedule the vcpu threads of one guest onto
>>> the same logical processor when vcpus are overcommitted and logical
>>> processors are saturated. Once the vcpu threads of one VM are
>>> scheduled onto the same LP, system performance drops dramatically
>>> with some workloads (like webbench running in a Windows guest).
>>>
>> Were the affine wakeups due to the kernel (emulated guest IPIs) or
>> qemu?
>>
> We have two initial guesses about the reason: one is wakeup affinity between vcpu threads due to IPIs, and the other is wakeup affinity between I/O threads and vcpu threads.
>
It would be good to find out.
>> Most likely it also hits non-virtualized loads as well. If the
>> scheduler pulls two long-running threads to the same cpu, performance
>> will take a hit.
>>
> The hit only happens when the physical cpus are saturated. For non-virtualized workloads, scheduling multiple threads of one process onto the same processor can improve performance through cache sharing or other affinities. Scheduling two vcpu threads onto the same processor, however, hurts performance a lot because of mutual spin-locking in the guests.
>
Spin loops need to be addressed first, they are known to kill
performance in overcommit situations.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.