Message-Id: <FBC2D350-E001-48C2-A4B7-0532FFD54531@gmail.com>
Date:   Tue, 23 Oct 2018 13:32:21 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Dave Hansen <dave.hansen@...el.com>, Nadav Amit <namit@...are.com>,
        Ingo Molnar <mingo@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "H . Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        Borislav Petkov <bp@...en8.de>,
        David Woodhouse <dwmw@...zon.co.uk>
Subject: Re: [RFC PATCH 0/5] x86: dynamic indirect call promotion

at 11:36 AM, Dave Hansen <dave.hansen@...el.com> wrote:

> On 10/17/18 5:54 PM, Nadav Amit wrote:
>>              base    relpoline
>>              ----    ---------
>> nginx        22898   25178 (+10%)
>> redis-ycsb   24523   25486 (+4%)
>> dbench       2144    2103 (+2%)
> 
> Just out of curiosity, which indirect branches are the culprits here for
> causing the slowdowns?

So I didn’t try to measure exactly which ones. There are roughly 500
indirect branches that actually “run” in my tests. Initially, I took the
silly approach of trying to patch the C source code using
semi-automatically generated Coccinelle scripts, so I can tell you it is
not just a few branches but many. The network stack is full of function
pointers (e.g., tcp_congestion_ops, tcp_sock_af_ops, dst_ops). The file
system also uses many function pointers (file_operations specifically).
Compound pages have a dtor, and so on.
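
To give a concrete idea, the pattern those scripts emitted was
basically manual call promotion: compare the function pointer against
the expected hot target and call it directly, falling back to the plain
indirect (retpolined) call otherwise. A minimal user-space sketch, with
made-up names:

  #include <stdio.h>

  /* Made-up ops table, standing in for tcp_congestion_ops etc. */
  struct ops {
          void (*handle)(int);
  };

  static void common_handler(int x)
  {
          printf("common: %d\n", x);
  }

  /* Promotion pattern: if the pointer matches the expected hot
   * target, use a direct call (cheap and well predicted); otherwise
   * fall back to the plain indirect call (retpolined in the kernel). */
  static void dispatch(const struct ops *ops, int x)
  {
          if (ops->handle == common_handler)
                  common_handler(x);      /* promoted direct call */
          else
                  ops->handle(x);         /* indirect fallback */
  }

  int main(void)
  {
          struct ops o = { .handle = common_handler };
          dispatch(&o, 1);
          return 0;
  }

The relpoline patches do this dynamically, learning the hot target at
runtime instead of hardcoding it in the source.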

If you want, you can rebuild the kernel without retpolines and run

  perf record -e br_inst_exec.taken_indirect_near_call:k -- <your workload>
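
and then look at where the samples land with, e.g.:

  perf report --stdio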

For some reason I didn’t manage to use PEBS (:ppp) from either the
guest or the host, so my results are a bit skewed (i.e., the sampled
location usually falls after the call was taken). Running dbench in the
VM gives me the following “hot spots”:

# Samples: 304  of event 'br_inst_exec.taken_indirect_near_call'
# Event count (approx.): 60800912
#
# Overhead  Command  Shared Object            Symbol                                       
# ........  .......  .......................  .............................................
#
     5.26%  :197970  [guest.kernel.kallsyms]  [g] __fget_light
     4.28%  :197969  [guest.kernel.kallsyms]  [g] __fget_light
     3.95%  :197969  [guest.kernel.kallsyms]  [g] dcache_readdir
     3.29%  :197970  [guest.kernel.kallsyms]  [g] next_positive.isra.14
     2.96%  :197970  [guest.kernel.kallsyms]  [g] __do_sys_kill
     2.30%  :197970  [guest.kernel.kallsyms]  [g] apparmor_file_open
     1.97%  :197969  [guest.kernel.kallsyms]  [g] __do_sys_kill
     1.97%  :197969  [guest.kernel.kallsyms]  [g] next_positive.isra.14
     1.97%  :197970  [guest.kernel.kallsyms]  [g] _raw_spin_lock
     1.64%  :197969  [guest.kernel.kallsyms]  [g] __alloc_file
     1.64%  :197969  [guest.kernel.kallsyms]  [g] common_file_perm
     1.64%  :197969  [guest.kernel.kallsyms]  [g] filldir
     1.64%  :197970  [guest.kernel.kallsyms]  [g] do_dentry_open
     1.64%  :197970  [guest.kernel.kallsyms]  [g] kmem_cache_free
     1.32%  :197969  [guest.kernel.kallsyms]  [g] __raw_callee_save___pv_queued_spin_unlock
     1.32%  :197969  [guest.kernel.kallsyms]  [g] __slab_free

Regards,
Nadav
