lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 17 Apr 2015 14:42:27 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	John Stultz <john.stultz@...aro.org>
Cc:	Marcelo Tosatti <mtosatti@...hat.com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Gleb Natapov <gleb@...nel.org>, kvm list <kvm@...r.kernel.org>,
	Ralf Baechle <ralf@...ux-mips.org>,
	Andrew Lutomirski <luto@...nel.org>
Subject: Re: [GIT PULL] First batch of KVM changes for 4.1

On Fri, Apr 17, 2015 at 2:28 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Fri, Apr 17, 2015 at 4:39 PM, Andy Lutomirski <luto@...capital.net> wrote:
>>
>> On my box, vclock_gettime using kvm-clock is about 40 ns.  An empty
>> syscall is about 33 ns.  clock_gettime *should* be around 17 ns.
>>
>> The clock_gettime syscall is about 73 ns.
>>
>> Could we figure out why clock_gettime (the syscall) is so slow
>
> If we only could profile some random program (let's call it "a.out"
> that did the syscall(__NR_gettime_syscall) a couple million times.
>
> Oh, lookie here, Santa came around:
>
>   21.83%   [k] system_call
>   12.85%   [.] syscall
>    9.76%   [k] __audit_syscall_exit
>    9.55%   [k] copy_user_enhanced_fast_string
>    4.68%   [k] __getnstimeofday64
>    4.08%   [k] syscall_trace_enter_phase1
>    3.85%   [k] __audit_syscall_entry
>    3.77%   [k] unroll_tree_refs
>    3.15%   [k] sys_clock_gettime
>    2.92%   [k] int_very_careful
>    2.73%   [.] main
>    2.35%   [k] syscall_trace_leave
>    2.28%   [k] read_tsc
>    1.73%   [k] int_restore_rest
>    1.73%   [k] int_with_check
>    1.48%   [k] syscall_return
>    1.32%   [k] dput
>    1.24%   [k] system_call_fastpath
>    1.21%   [k] syscall_return_via_sysret
>    1.21%   [k] tracesys
>    0.81%   [k] do_audit_syscall_entry
>    0.80%   [k] current_kernel_time
>    0.73%   [k] getnstimeofday64
>    0.68%   [k] path_put
>    0.66%   [k] posix_clock_realtime_get
>    0.61%   [k] int_careful
>    0.60%   [k] mntput
>    0.49%   [k] kfree
>    0.36%   [k] _copy_to_user
>    0.31%   [k] int_ret_from_sys_call_irqs_off
>
> looks to me like it's spending a lot of time in system call auditing.
> Which makes no sense to me, since none of this should be triggering
> any auditing. And there's a lot of time in low-level kernel system
> call assembly code.

Muahaha.  The auditors have invaded your system.  (I did my little
benchmark with a more sensible configuration -- see way below).

Can you send the output of:

# auditctl -s
# auditctl -l

Are you, perchance, using Fedora?  I lobbied rather heavily, and
successfully, to get the default configuration to stop auditing.
Unfortunately, the fix wasn't retroactive, so, unless you have a very
fresh install, you might want to apply the fix yourself:

https://fedorahosted.org/fesco/ticket/1311

>
> If I only remembered the name of the crazy person who said "Ok" when I
> suggest he just be the maintainer of the code since he has spent a lot
> of time sending patches for it. Something like Amdy Letorsky. No, that
> wasn't it. Hmm. It's on the tip of my tongue.
>
> Oh well. Maybe somebody can remember the guys name. It's familiar for
> some reason. Andy?

Amdy Lumirtowsky thinks he meant to attach a condition to his
maintainerish activities: he will do his best to keep the audit code
*out* of the low-level stuff, but he's going to try to avoid ever
touching the audit code itself, because if he ever had to change it,
he might accidentally delete the entire file.

Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some
point?  I would love auditing to set some really loud global warning
that you've just done a Bad Thing (tm) performance-wise by enabling
it.

Back to timing.  With kvm-clock, I see:

  23.80%  timing_test_64  [kernel.kallsyms]   [k] pvclock_clocksource_read
  15.57%  timing_test_64  libc-2.20.so        [.] syscall
  12.39%  timing_test_64  [kernel.kallsyms]   [k] system_call
  12.35%  timing_test_64  [kernel.kallsyms]   [k] copy_user_generic_string
  10.95%  timing_test_64  [kernel.kallsyms]   [k] system_call_after_swapgs
   7.35%  timing_test_64  [kernel.kallsyms]   [k] ktime_get_ts64
   6.20%  timing_test_64  [kernel.kallsyms]   [k] sys_clock_gettime
   3.62%  timing_test_64  [kernel.kallsyms]   [k] system_call_fastpath
   2.08%  timing_test_64  timing_test_64      [.] main
   1.72%  timing_test_64  timing_test_64      [.] syscall@plt
   1.58%  timing_test_64  [kernel.kallsyms]   [k] kvm_clock_get_cycles
   1.22%  timing_test_64  [kernel.kallsyms]   [k] _copy_to_user
   0.65%  timing_test_64  [kernel.kallsyms]   [k] posix_ktime_get_ts
   0.13%  timing_test_64  [kernel.kallsyms]   [k] apic_timer_interrupt

We've got some silly indirection, a uaccess that probably didn't get
optimized very well, and the terrifying function
pvclock_clocksource_read.

By comparison, using tsc:

  19.51%  timing_test_64  libc-2.20.so       [.] syscall
  15.52%  timing_test_64  [kernel.kallsyms]  [k] system_call
  15.25%  timing_test_64  [kernel.kallsyms]  [k] copy_user_generic_string
  14.34%  timing_test_64  [kernel.kallsyms]  [k] system_call_after_swapgs
   8.66%  timing_test_64  [kernel.kallsyms]  [k] ktime_get_ts64
   6.95%  timing_test_64  [kernel.kallsyms]  [k] sys_clock_gettime
   5.93%  timing_test_64  [kernel.kallsyms]  [k] native_read_tsc
   5.12%  timing_test_64  [kernel.kallsyms]  [k] system_call_fastpath
   2.62%  timing_test_64  timing_test_64     [.] main

That's better, although the uaccess silliness is still there.

(No PEBS -- I don't think it works right in KVM.)

--Andy

>
>              Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ