Message-ID: <4EF701C7.9080907@redhat.com>
Date: Sun, 25 Dec 2011 12:58:15 +0200
From: Avi Kivity <avi@...hat.com>
To: Ingo Molnar <mingo@...e.hu>
CC: Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com>,
peterz@...radead.org, linux-kernel@...r.kernel.org,
vatsa@...ux.vnet.ibm.com, bharata@...ux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/4] Gang scheduling in CFS
On 12/23/2011 12:36 PM, Ingo Molnar wrote:
> * Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com> wrote:
>
> > Here some interesting perf reports from inside the guest:
> >
> > Baseline:
> > 29.79% ebizzy [kernel.kallsyms] [k] native_flush_tlb_others
> > 18.70% ebizzy libc-2.12.so [.] __GI_memcpy
> > 7.23% ebizzy [kernel.kallsyms] [k] get_page_from_freelist
> > 5.38% ebizzy [kernel.kallsyms] [k] __do_page_fault
> > 4.50% ebizzy [kernel.kallsyms] [k] ____pagevec_lru_add
> > 3.58% ebizzy [kernel.kallsyms] [k] default_send_IPI_mask_logical
> > 3.26% ebizzy [kernel.kallsyms] [k] native_flush_tlb_single
> > 2.82% ebizzy [kernel.kallsyms] [k] handle_pte_fault
> > 2.16% ebizzy [kernel.kallsyms] [k] kunmap_atomic
> > 2.10% ebizzy [kernel.kallsyms] [k] _spin_unlock_irqrestore
> > 1.90% ebizzy [kernel.kallsyms] [k] down_read_trylock
> > 1.65% ebizzy [kernel.kallsyms] [k] __mem_cgroup_commit_charge.clone.4
> > 1.60% ebizzy [kernel.kallsyms] [k] up_read
> > 1.24% ebizzy [kernel.kallsyms] [k] __alloc_pages_nodemask
> >
> > Gang:
> > 22.53% ebizzy libc-2.12.so [.] __GI_memcpy
> > 9.73% ebizzy [kernel.kallsyms] [k] ____pagevec_lru_add
> > 8.22% ebizzy [kernel.kallsyms] [k] get_page_from_freelist
> > 7.80% ebizzy [kernel.kallsyms] [k] default_send_IPI_mask_logical
> > 7.68% ebizzy [kernel.kallsyms] [k] native_flush_tlb_others
> > 6.22% ebizzy [kernel.kallsyms] [k] __do_page_fault
> > 5.54% ebizzy [kernel.kallsyms] [k] native_flush_tlb_single
> > 4.44% ebizzy [kernel.kallsyms] [k] _spin_unlock_irqrestore
> > 2.90% ebizzy [kernel.kallsyms] [k] kunmap_atomic
> > 2.78% ebizzy [kernel.kallsyms] [k] __mem_cgroup_commit_charge.clone.4
> > 2.76% ebizzy [kernel.kallsyms] [k] handle_pte_fault
> > 2.16% ebizzy [kernel.kallsyms] [k] __mem_cgroup_uncharge_common
> > 1.59% ebizzy [kernel.kallsyms] [k] down_read_trylock
> > 1.43% ebizzy [kernel.kallsyms] [k] up_read
> >
> > I see the main difference between both the reports is:
> > native_flush_tlb_others.
>
> So it would be important to figure out why ebizzy gets into so
> many TLB flushes and why gang scheduling makes it go away.

The second part is easy: a remote TLB flush sends IPIs to many other
vcpus (possibly waking them up and scheduling them), then busy-waits
until they acknowledge the flush. Gang scheduling is really good here
since it shortens the busy wait; it would be even better if we scheduled
halted vcpus (see the yield_on_hlt module parameter, set to 0).
Directed yield on PLE should provide intermediate results between doing
nothing and gang sched.
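
To make the busy-wait argument concrete, here is a toy model (not kernel
code). The function names and the latency numbers (5 us per IPI, 3000 us
until a preempted vcpu runs again) are made up for illustration; the only
point is that the initiator waits for the slowest acknowledgement.

```python
def ack_delay_us(running, ipi_latency_us, resched_delay_us):
    """Time until one target vcpu acknowledges the flush IPI."""
    # A running vcpu only pays the IPI latency; a preempted one must first
    # be scheduled back in by the host before it can acknowledge.
    return ipi_latency_us if running else resched_delay_us + ipi_latency_us


def flush_wait_us(targets_running, ipi_latency_us=5, resched_delay_us=3000):
    """Busy-wait time of the initiator: it spins until the slowest ack."""
    return max(
        (ack_delay_us(r, ipi_latency_us, resched_delay_us) for r in targets_running),
        default=0,
    )


# Seven remote vcpus; without gang scheduling one of them happens to be preempted.
baseline = flush_wait_us([True] * 6 + [False])
# With gang scheduling all vcpus of the guest run at the same time.
gang = flush_wait_us([True] * 7)
print(baseline, gang)
```

With these assumed numbers a single preempted target dominates the wait,
which is the effect gang scheduling removes.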

The first part appears to be unrelated to ebizzy itself - it's the
kunmap_atomic() flushing PTEs. It could be eliminated by switching to a
non-highmem kernel, or by allocating more PTEs for kmap_atomic() and
batching the flush.
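
A rough sketch of what batching would buy, again as a toy model and not
the kernel's kmap_atomic() implementation. KmapPool, its slot count and
the unmap counts are hypothetical; the model just counts flushes when
stale mappings are flushed only once every slot in the pool has been used.

```python
class KmapPool:
    """Hypothetical pool of temporary-mapping slots with a deferred flush."""

    def __init__(self, slots):
        self.slots = slots
        self.dirty = 0    # unmapped slots whose stale translation is not yet flushed
        self.flushes = 0  # number of flush operations issued so far

    def unmap(self):
        # Instead of flushing on every unmap, remember that the slot is stale.
        self.dirty += 1
        if self.dirty == self.slots:
            # Pool exhausted: one flush covers every stale slot at once.
            self.flushes += 1
            self.dirty = 0


def count_flushes(slots, unmaps):
    pool = KmapPool(slots)
    for _ in range(unmaps):
        pool.unmap()
    return pool.flushes


print(count_flushes(1, 1000), count_flushes(16, 1000))
```

A pool of one slot corresponds to flushing on every unmap; a larger pool
divides the number of flushes by roughly the pool size.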

Btw, you can get an additional speedup by enabling x2apic, for
default_send_IPI_mask_logical().
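
One quick way to see whether the guest's virtual CPU advertises x2apic at
all is to look for the flag in /proc/cpuinfo. The helper below is only a
sketch (has_x2apic is a name invented here), and the flag says the CPU
offers the feature, not that the kernel actually switched to it.

```python
def has_x2apic(cpuinfo_text):
    """Return True if the first 'flags' line of cpuinfo text lists x2apic."""
    for line in cpuinfo_text.splitlines():
        # x86 cpuinfo has one "flags : ..." line per CPU; checking the first is enough.
        if line.startswith("flags") and ":" in line:
            return "x2apic" in line.split(":", 1)[1].split()
    return False


# Shortened, made-up flag lists for illustration.
sample_with = "processor\t: 0\nflags\t\t: fpu vme de pse x2apic hypervisor\n"
sample_without = "processor\t: 0\nflags\t\t: fpu vme de pse hypervisor\n"
print(has_x2apic(sample_with), has_x2apic(sample_without))
```

Inside the guest this would be called as
has_x2apic(open("/proc/cpuinfo").read()).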
--
error compiling committee.c: too many arguments to function