Message-ID: <4F048295.1050907@redhat.com>
Date: Wed, 04 Jan 2012 11:47:17 -0500
From: Rik van Riel <riel@...hat.com>
To: Avi Kivity <avi@...hat.com>
CC: Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...e.hu>, peterz@...radead.org,
linux-kernel@...r.kernel.org, vatsa@...ux.vnet.ibm.com,
bharata@...ux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/4] Gang scheduling in CFS
On 01/04/2012 09:41 AM, Avi Kivity wrote:
> On 01/04/2012 12:52 PM, Nikunj A Dadhania wrote:
>> On Mon, 02 Jan 2012 11:37:22 +0200, Avi Kivity<avi@...hat.com> wrote:
>>> On 12/31/2011 04:21 AM, Nikunj A Dadhania wrote:
>>>>
>>>> GangV2:
>>>> 27.45% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
>>>> 12.12% ebizzy [kernel.kallsyms] [k] clear_page
>>>> 9.22% ebizzy [kernel.kallsyms] [k] __do_page_fault
>>>> 6.91% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
>>>> 4.06% ebizzy [kernel.kallsyms] [k] get_page_from_freelist
>>>> 4.04% ebizzy [kernel.kallsyms] [k] ____pagevec_lru_add
>>>>
>>>> GangBase:
>>>> 45.08% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
>>>> 15.38% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
>>>> 7.00% ebizzy [kernel.kallsyms] [k] clear_page
>>>> 4.88% ebizzy [kernel.kallsyms] [k] __do_page_fault
>>>
>>> Looping in flush_tlb_others(). Rik, what trace can we run to find out
>>> why PLE directed yield isn't working as expected?
>>>
>> I tried some experiments by adding a pause_loop_exits stat in the
>> kvm_vcpu_stat.
>
> (that's deprecated, we use tracepoints these days for stats)
>
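As a rough sketch, the counter experiment described above could look
something like this on a 3.2-era tree; the pause_loop_exits field and
its debugfs entry are illustrative additions rather than upstream code,
while handle_pause() and kvm_vcpu_on_spin() are the existing VMX
pause-loop-exit path:

/* arch/x86/include/asm/kvm_host.h */
struct kvm_vcpu_stat {
	/* ... existing counters ... */
	u32 pause_loop_exits;		/* hypothetical: bumped on each PLE exit */
};

/* arch/x86/kvm/x86.c */
struct kvm_stats_debugfs_item debugfs_entries[] = {
	/* ... existing entries ... */
	{ "pause_loop_exits", VCPU_STAT(pause_loop_exits) },
	{ NULL }
};

/* arch/x86/kvm/vmx.c: existing PLE exit handler, with the new counter */
static int handle_pause(struct kvm_vcpu *vcpu)
{
	skip_emulated_instruction(vcpu);
	++vcpu->stat.pause_loop_exits;	/* count PAUSE-loop VM exits */
	kvm_vcpu_on_spin(vcpu);		/* directed yield to another vcpu */
	return 1;
}

The same counts can be had without a rebuild from the kvm:kvm_exit
tracepoint, filtered on the PAUSE exit reason, which is what the
tracepoint suggestion above refers to.
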
>> Here are some observations related to the Baseline-only (8 VM) case:
>>
>>               | ple_gap=128 | ple_gap=64 | ple_gap=256 | ple_window=2048
>> --------------+-------------+------------+-------------+----------------
>> EbzyRecords/s |     2247.50 |    2132.75 |     2086.25 |         1835.62
>> PauseExits    |  7928154.00 | 6696342.00 |  7365999.00 |     50319582.00
>>
>> With ple_window = 2048, PauseExits is more than 6 times that of the default case
>
> So it looks like the default is optimal, at least wrt the cases you
> tested and your test workload.
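
For context on what "the default" means here: ple_gap and ple_window
are kvm_intel module parameters, and as far as I recall the 3.2-era
defaults are 128 and 4096, so the ple_gap=128 column above is the
default configuration:

/* arch/x86/kvm/vmx.c (3.2-era): the PLE knobs are read-only module
 * parameters of kvm_intel; 128 and 4096 are the defaults.
 */
#define KVM_VMX_DEFAULT_PLE_GAP		128
#define KVM_VMX_DEFAULT_PLE_WINDOW	4096

static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
module_param(ple_gap, int, S_IRUGO);

static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
module_param(ple_window, int, S_IRUGO);

The runs above were presumably done by reloading kvm_intel with the
non-default values (e.g. ple_gap=64 or ple_window=2048) on the module
command line, since these parameters are not writable at runtime here.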

It depends on the workload.

I believe ebizzy synchronously bounces messages around between
userland threads, and may benefit from lower-latency preemption
and rescheduling.

Workloads like AMQP do asynchronous messaging, and are likely
to benefit from fewer context switches.

I do not know which kind of workload is more prevalent.

Another worry with gang scheduling is scalability. One of
the reasons Linux scales well to larger systems is that a
lot of things are done CPU-local, without communicating
with other CPUs. Making the scheduling algorithm
system-global has the potential to add a lot of overhead.

Likewise, removing the ability to migrate workloads to idle
CPUs is likely to hurt a lot of real world workloads.

Benchmarks don't care, because they run full-out. However,
users do not run benchmarks nearly as much as they run
actual workloads...
--
All rights reversed