lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 31 Jan 2010 09:29:53 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Hitoshi Mitake <mitake@....info.waseda.ac.jp>
Cc:	linux-kernel@...r.kernel.org,
	Fr??d??ric Weisbecker <fweisbec@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Tom Zanussi <tzanussi@...il.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Paul Mackerras <paulus@...ba.org>,
	Anton Blanchard <anton@...ba.org>,
	Mike Galbraith <efault@....de>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH 00/12 v2] perf lock: New subcommand "perf lock", for
 analyzing lock statistics


FYI, i've applied a file/line-less version of 'perf lock' to perf/core today.

The basic workflow is the usual:

  perf lock record sleep 1     # or some other command
  perf lock report             # or 'perf lock trace'

[ I think we can do all the things that file/line can do with a less intrusive 
  (and more standard) call-site-IP based approach. For now we can key off the 
  names of the locks, that's coarser but also informative and allows us to 
  progress.

  I've renamed 'perf lock prof' to 'perf lock report' - which is more in line 
  with other perf tools. ]

The tool clearly needs more work at the moment: i have tried perf lock on a 16 
cpus box, and it was very slow, while it didnt really record all that many 
events to justify the slowdown. A simple:

  perf lock record sleep 1

makes the system very slow and requires a Ctrl-C to stop:

 # time perf lock record sleep 1
 ^C[ perf record: Woken up 0 times to write data ]
 [ perf record: Captured and wrote 5.204 MB perf.data (~227374 samples) ]

  real	0m11.941s
  user	0m0.020s
  sys	0m11.661s

(The kernel config i used witht that is attached.)

My suspicion is that the overhead of CONFIG_LOCK_STAT based tracing is way too 
high at the moment, and needs to be reduced. I have removed the '-R' option 
from perf lock record (which it got from perf sched where it makes sense but 
here it's not really needed and -R further increases overhead), but that has 
not solved the slowdown.

'top' shows this kind of messy picture of a high-overhead system:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                              
15003 root      20   0 28900 1116  784 R 102.2  0.0   0:14.23 perf                                
14981 mingo     20   0 15068 1396  880 R 85.7  0.0   0:11.67 top                                  
15036 nobody    30  10  120m  22m 3224 R 74.4  0.1   0:04.47 cc1                                  
15030 nobody    30  10  125m  27m 2744 R 64.6  0.1   0:03.88 cc1                                  
   20 root      20   0     0    0    0 R 53.4  0.0   0:15.04 ksoftirqd/8                          
 7646 nobody    30  10 23488  720  264 S 51.9  0.0   0:04.96 distccd                              
   43 root      20   0     0    0    0 R 50.2  0.0   0:06.26 events/8                             
15037 nobody    30  10 15696 4248  876 S 39.3  0.0   0:02.36 as                                   
 2891 nobody    30  10 23488  764  264 R 24.8  0.0   0:03.26 distccd            

A couple of other details i noticed:

 - 'perf lock' does not show up in the list of commands if one types 'perf'.
   You can fix this by adding it to command-list.txt.

 - i think we should add a reference to 'perf lock' in the config 
   LOCK_STAT entry's help text in lib/Kconfig.debug - so that people can see 
   what tool to use with lock statistics.

 - perf report should be used to further reduce the overhead of 
   CONFIG_LOCK_STAT. If we want people to use this to tune for performance, we 
   want it to be exceptionally fast. Both the perf-lock-running and
   perf-lock-not-running cases should be optimized. (Perhaps dynamic patching
   techniques should be considered as well.)

 - we waste 30 characters for the 'ID' column in perf lock report, which is in 
   all hexa, while no human would ever really read it, and would rarely rely 
   on it as well. Should be removed and only added back on option request or 
   so.

Thanks,

	Ingo

View attachment "config" of type "text/plain" (79818 bytes)

Powered by blists - more mailing lists