Date:	Sun, 20 Jul 2014 23:55:44 +0200
From:	Jiri Olsa <>
Cc:	Arnaldo Carvalho de Melo <>,
	Corey Ashford <>,
	David Ahern <>,
	Frederic Weisbecker <>,
	Ingo Molnar <>,
	Jean Pihet <>,
	Namhyung Kim <>,
	Paul Mackerras <>,
	Peter Zijlstra <>,
	Jiri Olsa <>
Subject: [PATCHv3 00/19] perf tools: Factor ordered samples queue

this patchset factors out the session's ordered samples queue
and allows limiting the size of this queue.

v3 changes:
  - rebased to latest tip/perf/core
  - add comment for WARN in patch 8 (David)
  - added ordered-events debug variable (David)
  - renamed ordered_events_(get|put) to ordered_events_(new|delete)
  - renamed struct ordered_events_queue to struct ordered_events

v2 changes:
  - several small changes for review comments (Namhyung)

The report command queues events until either of the following
conditions is met:
  - PERF_RECORD_FINISHED_ROUND event is processed
  - end of the file is reached

Either of the above conditions forces the queue to flush some
events while keeping all allocated memory for the next events.
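
The queueing behaviour described above can be modelled roughly as
follows. This is only a toy sketch, not the perf code: the class and
method names are made up for illustration, and the rule for which
events a finished round flushes (everything up to the newest timestamp
seen before the previous round ended) is an assumption about what
"some events" means. Memory reuse is not modelled.

```python
import heapq


class OrderedEvents:
    """Toy model of a timestamp-ordered event queue (illustrative names)."""

    def __init__(self, deliver):
        self.deliver = deliver    # callback receiving events in time order
        self.heap = []            # pending (timestamp, seq, payload) entries
        self.seq = 0              # tie-breaker: equal timestamps stay FIFO
        self.max_timestamp = 0    # newest timestamp queued so far
        self.next_flush = 0       # boundary recorded by the previous round

    def queue(self, timestamp, payload):
        heapq.heappush(self.heap, (timestamp, self.seq, payload))
        self.seq += 1
        self.max_timestamp = max(self.max_timestamp, timestamp)

    def _flush_up_to(self, limit):
        # Deliver every pending event whose timestamp is <= limit.
        while self.heap and self.heap[0][0] <= limit:
            timestamp, _, payload = heapq.heappop(self.heap)
            self.deliver(timestamp, payload)

    def finished_round(self):
        # Assumed rule: flush up to the previous round's boundary,
        # then remember the newest timestamp as the next boundary.
        self._flush_up_to(self.next_flush)
        self.next_flush = self.max_timestamp

    def end_of_file(self):
        # Nothing more can arrive, so everything left is delivered.
        self._flush_up_to(float("inf"))
```

Without any finished_round() call, nothing leaves the queue until
end_of_file(), so the pending list grows with every event.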

If PERF_RECORD_FINISHED_ROUND is missing, the queue will
allocate memory for every single event in the file.
This could lead to enormous memory consumption and speed
degradation of the report command for huge files.

With the queue allocation limit of 100 MB, I've got around
15% speedup on reporting of a ~10GB file.
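
To illustrate what a queue allocation limit means in practice, here is
a minimal sketch of a size-capped ordered queue. It is not the patchset
code: the names are hypothetical, and the policy of delivering the
oldest half of the pending events when the limit would be exceeded is
an assumption made for the example.

```python
import heapq


class BoundedOrderedQueue:
    """Toy size-capped ordered queue (hypothetical names and policy)."""

    def __init__(self, deliver, max_bytes):
        self.deliver = deliver      # callback receiving events in time order
        self.max_bytes = max_bytes  # allocation limit for pending events
        self.cur_bytes = 0          # bytes currently held in the queue
        self.heap = []              # (timestamp, seq, size, payload)
        self.seq = 0                # tie-breaker for equal timestamps

    def _pop_oldest(self):
        timestamp, _, size, payload = heapq.heappop(self.heap)
        self.cur_bytes -= size
        self.deliver(timestamp, payload)

    def _flush_half(self):
        # Assumed policy: deliver the oldest half of what is pending.
        target = len(self.heap) // 2
        while len(self.heap) > target:
            self._pop_oldest()

    def queue(self, timestamp, size, payload):
        # Make room first if this event would push us over the limit.
        if self.cur_bytes + size > self.max_bytes:
            self._flush_half()
        heapq.heappush(self.heap, (timestamp, self.seq, size, payload))
        self.seq += 1
        self.cur_bytes += size

    def flush_all(self):
        while self.heap:
            self._pop_oldest()
```

The trade-off of any forced flush is that an event arriving later with
an older timestamp than one already delivered can no longer be placed
in order.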

current code:
 Performance counter stats for './perf.old report --stdio -i' (3 runs):

   621,685,704,665      cycles                    ( +-  0.52% )
   873,397,467,969      instructions              ( +-  0.00% )

     286.133268732 seconds time elapsed           ( +-  1.13% )

with patches:
 Performance counter stats for './perf report --stdio -i' (3 runs):

   603,933,987,185      cycles                    ( +-  0.45% )
   869,139,445,070      instructions              ( +-  0.00% )

     245.337510637 seconds time elapsed           ( +-  0.49% )

The speedup seems to come mainly from fewer cycles spent servicing
page faults:

current code:
     4.44%     0.01%  perf.old  [kernel.kallsyms]   [k] page_fault                                   

with patches:
     1.45%     0.00%      perf  [kernel.kallsyms]   [k] page_fault                                   

current code (faults event):
         6,643,807      faults                    ( +-  0.36% )

with patches (faults event):
         2,214,756      faults                    ( +-  3.03% )
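
For reference, the relative changes can be worked out directly from
the numbers quoted above. The snippet only redoes the arithmetic on
the reported means; the variable names are mine.

```python
# Mean values copied from the perf stat output above.
old_seconds = 286.133268732
new_seconds = 245.337510637
old_faults = 6_643_807
new_faults = 2_214_756

# Fraction of the elapsed time saved by the patched binary.
time_saved = (old_seconds - new_seconds) / old_seconds
# How many times faster the patched run is.
speedup = old_seconds / new_seconds
# How many times fewer page faults the patched run takes.
fault_ratio = old_faults / new_faults

print(f"elapsed time reduced by {time_saved:.1%}")   # 14.3%
print(f"speedup factor {speedup:.2f}x")              # 1.17x
print(f"page faults reduced {fault_ratio:.2f}x")     # 3.00x
```

So "around 15%" sits between the 14.3% reduction in elapsed time and
the roughly 17% gain in throughput, and the number of faults drops to
about a third.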

Also, we now have one of our big memory spenders under control,
and the ordered events queue code is put in a separate object
with a clear interface, ready to be used by other commands
like script.

Also reachable in here:


Cc: Arnaldo Carvalho de Melo <>
Cc: Corey Ashford <>
Cc: David Ahern <>
Cc: Frederic Weisbecker <>
Cc: Ingo Molnar <>
Cc: Jean Pihet <>
Cc: Namhyung Kim <>
Cc: Paul Mackerras <>
Cc: Peter Zijlstra <>
Signed-off-by: Jiri Olsa <>