Message-ID: <4F048295.1050907@redhat.com>
Date: Wed, 04 Jan 2012 11:47:17 -0500
From: Rik van Riel <riel@...hat.com>
To: Avi Kivity <avi@...hat.com>
CC: Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...e.hu>, peterz@...radead.org,
linux-kernel@...r.kernel.org, vatsa@...ux.vnet.ibm.com,
bharata@...ux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/4] Gang scheduling in CFS
On 01/04/2012 09:41 AM, Avi Kivity wrote:
> On 01/04/2012 12:52 PM, Nikunj A Dadhania wrote:
>> On Mon, 02 Jan 2012 11:37:22 +0200, Avi Kivity<avi@...hat.com> wrote:
>>> On 12/31/2011 04:21 AM, Nikunj A Dadhania wrote:
>>>>
>>>> GangV2:
>>>> 27.45% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
>>>> 12.12% ebizzy [kernel.kallsyms] [k] clear_page
>>>> 9.22% ebizzy [kernel.kallsyms] [k] __do_page_fault
>>>> 6.91% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
>>>> 4.06% ebizzy [kernel.kallsyms] [k] get_page_from_freelist
>>>> 4.04% ebizzy [kernel.kallsyms] [k] ____pagevec_lru_add
>>>>
>>>> GangBase:
>>>> 45.08% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
>>>> 15.38% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
>>>> 7.00% ebizzy [kernel.kallsyms] [k] clear_page
>>>> 4.88% ebizzy [kernel.kallsyms] [k] __do_page_fault
>>>
>>> Looping in flush_tlb_others(). Rik, what trace can we run to find out
>>> why PLE directed yield isn't working as expected?
>>>
>> I tried some experiments by adding a pause_loop_exits stat in the
>> kvm_vcpu_stat.
>
> (that's deprecated, we use tracepoints these days for stats)
>
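As a rough sketch, the counter experiment described above could look
something like this on a 3.2-era tree; the pause_loop_exits field and
its debugfs entry are illustrative additions rather than upstream code,
while handle_pause() and kvm_vcpu_on_spin() are the existing VMX
pause-loop-exit path:

/* arch/x86/include/asm/kvm_host.h */
struct kvm_vcpu_stat {
	/* ... existing counters ... */
	u32 pause_loop_exits;		/* hypothetical: bumped on each PLE exit */
};

/* arch/x86/kvm/x86.c */
struct kvm_stats_debugfs_item debugfs_entries[] = {
	/* ... existing entries ... */
	{ "pause_loop_exits", VCPU_STAT(pause_loop_exits) },
	{ NULL }
};

/* arch/x86/kvm/vmx.c: existing PLE exit handler, with the new counter */
static int handle_pause(struct kvm_vcpu *vcpu)
{
	skip_emulated_instruction(vcpu);
	++vcpu->stat.pause_loop_exits;	/* count PAUSE-loop VM exits */
	kvm_vcpu_on_spin(vcpu);		/* directed yield to another vcpu */
	return 1;
}

The same counts can be had without a rebuild from the kvm:kvm_exit
tracepoint, filtered on the PAUSE exit reason, which is what the
tracepoint suggestion above refers to.
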
>> Here are some observations related to the Baseline-only (8 VM) case:
>>
>>               | ple_gap=128 | ple_gap=64 | ple_gap=256 | ple_window=2048
>> --------------+-------------+------------+-------------+----------------
>> EbzyRecords/s |     2247.50 |    2132.75 |     2086.25 |         1835.62
>> PauseExits    |  7928154.00 | 6696342.00 |  7365999.00 |     50319582.00
>>
>> With ple_window = 2048, PauseExits is more than 6 times that of the default case
>
> So it looks like the default is optimal, at least wrt the cases you
> tested and your test workload.
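
For context on what "the default" means here: ple_gap and ple_window
are kvm_intel module parameters, and as far as I recall the 3.2-era
defaults are 128 and 4096, so the ple_gap=128 column above is the
default configuration:

/* arch/x86/kvm/vmx.c (3.2-era): the PLE knobs are read-only module
 * parameters of kvm_intel; 128 and 4096 are the defaults.
 */
#define KVM_VMX_DEFAULT_PLE_GAP		128
#define KVM_VMX_DEFAULT_PLE_WINDOW	4096

static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
module_param(ple_gap, int, S_IRUGO);

static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
module_param(ple_window, int, S_IRUGO);

The runs above were presumably done by reloading kvm_intel with the
non-default values (e.g. ple_gap=64 or ple_window=2048) on the module
command line, since these parameters are not writable at runtime here.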

It depends on the workload.

I believe ebizzy synchronously bounces messages around between
userland threads, and may benefit from lower-latency preemption
and rescheduling.

Workloads like AMQP do asynchronous messaging, and are likely
to benefit from fewer context switches.

I do not know which kind of workload is more prevalent.

Another worry with gang scheduling is scalability. One of
the reasons Linux scales well to larger systems is that a
lot of things are done CPU-local, without communicating
with other CPUs. Making the scheduling algorithm
system-global has the potential to add a lot of overhead.

Likewise, removing the ability to migrate workloads to idle
CPUs is likely to hurt a lot of real world workloads.

Benchmarks don't care, because they run full-out. However,
users do not run benchmarks nearly as much as they run
actual workloads...
--
All rights reversed