Message-Id: <FBC2D350-E001-48C2-A4B7-0532FFD54531@gmail.com>
Date: Tue, 23 Oct 2018 13:32:21 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Dave Hansen <dave.hansen@...el.com>, Nadav Amit <namit@...are.com>,
Ingo Molnar <mingo@...hat.com>
Cc: Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
"H . Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, x86@...nel.org,
Borislav Petkov <bp@...en8.de>,
David Woodhouse <dwmw@...zon.co.uk>
Subject: Re: [RFC PATCH 0/5] x86: dynamic indirect call promotion
at 11:36 AM, Dave Hansen <dave.hansen@...el.com> wrote:
> On 10/17/18 5:54 PM, Nadav Amit wrote:
>>              base     relpoline
>>              ----     ---------
>> nginx        22898    25178 (+10%)
>> redis-ycsb   24523    25486 (+4%)
>> dbench        2144     2103 (+2%)
>
> Just out of curiosity, which indirect branches are the culprits here for
> causing the slowdowns?
So I didn’t try to measure exactly which ones. There are roughly 500 that
actually “run” in my tests. Initially, I took the silly approach of trying
to patch the C source code using semi-automatically generated Coccinelle
scripts, so I can tell you it is not just a few branches but many. The
network stack is full of function pointers (e.g., tcp_congestion_ops,
tcp_sock_af_ops, dst_ops). The filesystem code also uses many function
pointers (file_operations specifically). Compound pages have a dtor, and
so on.
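
To give a feel for it, here is a rough C sketch of the kind of call site in
question (the names below are made up, and this only shows the idea of the
promotion, not how the patches actually implement it at the call sites):

    /*
     * Sketch only.  A typical kernel call site is an indirect call
     * through an ops structure; with retpolines every such call goes
     * through a retpoline thunk.
     */
    struct my_ops {
            int (*open)(void *file);
    };

    static int do_open(struct my_ops *ops, void *file)
    {
            return ops->open(file);         /* retpolined indirect call */
    }

    /*
     * The promotion conceptually turns it into a compare against a hot
     * target plus a direct (well-predicted) call, keeping the indirect
     * call as a fallback.
     */
    int likely_open(void *file);            /* the learned hot target */

    static int do_open_promoted(struct my_ops *ops, void *file)
    {
            if (ops->open == likely_open)
                    return likely_open(file);       /* direct call */
            return ops->open(file);                 /* retpoline fallback */
    }
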
If you want, you can rebuild the kernel without retpolines and run:

  perf record -e br_inst_exec.taken_indirect_near_call:k (your workload)
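
For instance, something like this (the dbench arguments here are only an
example):

  perf record -e br_inst_exec.taken_indirect_near_call:k -- dbench 4 -t 30
  perf report --stdio

perf report then gives a listing like the one below.
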
For some reason I didn’t manage to use PEBS (:ppp) from either the guest or
the host, so my results are a bit skewed (i.e., the sampled location is
usually the instruction after the taken call rather than the call itself).
Running dbench in the VM gives me the following “hot-spots”:
# Samples: 304 of event 'br_inst_exec.taken_indirect_near_call'
# Event count (approx.): 60800912
#
# Overhead  Command  Shared Object            Symbol
# ........  .......  .......................  ..........................................
#
     5.26%  :197970  [guest.kernel.kallsyms]  [g] __fget_light
     4.28%  :197969  [guest.kernel.kallsyms]  [g] __fget_light
     3.95%  :197969  [guest.kernel.kallsyms]  [g] dcache_readdir
     3.29%  :197970  [guest.kernel.kallsyms]  [g] next_positive.isra.14
     2.96%  :197970  [guest.kernel.kallsyms]  [g] __do_sys_kill
     2.30%  :197970  [guest.kernel.kallsyms]  [g] apparmor_file_open
     1.97%  :197969  [guest.kernel.kallsyms]  [g] __do_sys_kill
     1.97%  :197969  [guest.kernel.kallsyms]  [g] next_positive.isra.14
     1.97%  :197970  [guest.kernel.kallsyms]  [g] _raw_spin_lock
     1.64%  :197969  [guest.kernel.kallsyms]  [g] __alloc_file
     1.64%  :197969  [guest.kernel.kallsyms]  [g] common_file_perm
     1.64%  :197969  [guest.kernel.kallsyms]  [g] filldir
     1.64%  :197970  [guest.kernel.kallsyms]  [g] do_dentry_open
     1.64%  :197970  [guest.kernel.kallsyms]  [g] kmem_cache_free
     1.32%  :197969  [guest.kernel.kallsyms]  [g] __raw_callee_save___pv_queued_spin_unlock
     1.32%  :197969  [guest.kernel.kallsyms]  [g] __slab_free
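
(Once PEBS works, adding the precise modifier, e.g.

  perf record -e br_inst_exec.taken_indirect_near_call:kppp ...

should attribute the samples to the calls themselves rather than to the
instructions that follow them.)
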
Regards,
Nadav