Message-ID: <4EF701C7.9080907@redhat.com>
Date:	Sun, 25 Dec 2011 12:58:15 +0200
From:	Avi Kivity <avi@...hat.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com>,
	peterz@...radead.org, linux-kernel@...r.kernel.org,
	vatsa@...ux.vnet.ibm.com, bharata@...ux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/4] Gang scheduling in CFS

On 12/23/2011 12:36 PM, Ingo Molnar wrote:
> * Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com> wrote:
>
> > Here some interesting perf reports from inside the guest:
> > 
> > Baseline:
> >   29.79%   ebizzy  [kernel.kallsyms]   [k] native_flush_tlb_others
> >   18.70%   ebizzy  libc-2.12.so        [.] __GI_memcpy
> >    7.23%   ebizzy  [kernel.kallsyms]   [k] get_page_from_freelist
> >    5.38%   ebizzy  [kernel.kallsyms]   [k] __do_page_fault
> >    4.50%   ebizzy  [kernel.kallsyms]   [k] ____pagevec_lru_add
> >    3.58%   ebizzy  [kernel.kallsyms]   [k] default_send_IPI_mask_logical
> >    3.26%   ebizzy  [kernel.kallsyms]   [k] native_flush_tlb_single
> >    2.82%   ebizzy  [kernel.kallsyms]   [k] handle_pte_fault
> >    2.16%   ebizzy  [kernel.kallsyms]   [k] kunmap_atomic
> >    2.10%   ebizzy  [kernel.kallsyms]   [k] _spin_unlock_irqrestore
> >    1.90%   ebizzy  [kernel.kallsyms]   [k] down_read_trylock
> >    1.65%   ebizzy  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge.clone.4
> >    1.60%   ebizzy  [kernel.kallsyms]   [k] up_read
> >    1.24%   ebizzy  [kernel.kallsyms]   [k] __alloc_pages_nodemask
> > 
> > Gang:
> >   22.53%   ebizzy  libc-2.12.so       [.] __GI_memcpy
> >    9.73%   ebizzy  [kernel.kallsyms]  [k] ____pagevec_lru_add
> >    8.22%   ebizzy  [kernel.kallsyms]  [k] get_page_from_freelist
> >    7.80%   ebizzy  [kernel.kallsyms]  [k] default_send_IPI_mask_logical
> >    7.68%   ebizzy  [kernel.kallsyms]  [k] native_flush_tlb_others
> >    6.22%   ebizzy  [kernel.kallsyms]  [k] __do_page_fault
> >    5.54%   ebizzy  [kernel.kallsyms]  [k] native_flush_tlb_single
> >    4.44%   ebizzy  [kernel.kallsyms]  [k] _spin_unlock_irqrestore
> >    2.90%   ebizzy  [kernel.kallsyms]  [k] kunmap_atomic
> >    2.78%   ebizzy  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge.clone.4
> >    2.76%   ebizzy  [kernel.kallsyms]  [k] handle_pte_fault
> >    2.16%   ebizzy  [kernel.kallsyms]  [k] __mem_cgroup_uncharge_common
> >    1.59%   ebizzy  [kernel.kallsyms]  [k] down_read_trylock
> >    1.43%   ebizzy  [kernel.kallsyms]  [k] up_read
> > 
> > I see the main difference between both the reports is:
> > native_flush_tlb_others.
>
> So it would be important to figure out why ebizzy gets into so 
> many TLB flushes and why gang scheduling makes it go away.

The second part is easy - a remote TLB flush involves IPIs to many other
vcpus (possibly waking them up and scheduling them), then busy-waiting
until they acknowledge the flush.  Gang scheduling is really good here
since it shortens the busy wait; it would be even better if we scheduled
halted vcpus (see the yield_on_hlt module parameter, set to 0).
Directed yield on PLE should provide intermediate results between doing
nothing and gang sched.
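
To make the busy-wait argument concrete, here is a rough toy model, not
kernel or KVM code.  It assumes the initiating vcpu spins until the slowest
target acknowledges, and that a descheduled target cannot acknowledge until
it next runs.  The function name, the time units and the sample delays are
all made up for illustration.

```python
def flush_wait(time_until_running, ack_latency=1.0):
    """Time the initiating vcpu busy-waits for a remote TLB flush.

    time_until_running: per-target delay (arbitrary units) before that
    vcpu is next on a physical CPU; 0 means it is running right now.
    ack_latency: assumed cost of delivering the IPI and acknowledging it.
    """
    if not time_until_running:
        return 0.0
    # The initiator spins until the slowest target has acknowledged.
    return max(t + ack_latency for t in time_until_running)


# Uncoordinated scheduling: two of three target vcpus are descheduled.
uncoordinated = flush_wait([0.0, 40.0, 15.0])

# Gang scheduling: all target vcpus run alongside the initiator.
gang = flush_wait([0.0, 0.0, 0.0])

print(uncoordinated, gang)
```

In this model the wait is set by the single most-delayed vcpu, which is why
running all vcpus together helps so much.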

The first part appears to be unrelated to ebizzy itself - it's
kunmap_atomic() flushing PTEs.  It could be eliminated by switching to a
non-highmem kernel, or by allocating more PTEs for kmap_atomic() and
batching the flush.
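
As a back-of-the-envelope sketch of the batching idea (a model only, not how
kmap_atomic() is implemented): assume a hypothetical pool of mapping slots
and a flush that is forced only once every slot holds a stale mapping.  The
function name and the numbers are invented for illustration.

```python
def flushes_needed(unmaps, slots):
    """Flushes issued when one flush retires `slots` stale mappings at once.

    unmaps: number of unmap operations in the workload
    slots:  hypothetical number of mapping slots available before a flush
            is forced (1 models flushing on every unmap)
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # A flush is forced each time every slot holds a stale mapping.
    return unmaps // slots


per_unmap = flushes_needed(1000, 1)   # flush on every unmap
batched = flushes_needed(1000, 16)    # one flush per 16 unmaps

print(per_unmap, batched)
```

The flush count drops roughly in proportion to the number of slots, at the
cost of reserving more PTEs.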

btw you can get an additional speedup by enabling x2apic, for
default_send_IPI_mask_logical().

-- 
error compiling committee.c: too many arguments to function
