Date:	Thu, 11 Mar 2010 19:16:46 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Cyrill Gorcunov <gorcunov@...nvz.org>
Cc:	Lin Ming <ming.m.lin@...el.com>, "H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Stephane Eranian <eranian@...gle.com>,
	Robert Richter <robert.richter@....com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [RFC] x86,perf: Implement minimal P4 PMU driver v14


* Cyrill Gorcunov <gorcunov@...nvz.org> wrote:

> x86,perf: Implement minimal P4 PMU driver v15

tried it on a Pentium-D dual core CPU, and it boots fine:

[    0.020009] using mwait in idle threads.
[    0.021004] Performance Events: Netburst events, Netburst P4/Xeon PMU driver.
[    0.024006] ... version:                0
[    0.025003] ... bit width:              40
[    0.026003] ... generic registers:      18
[    0.027003] ... value mask:             000000ffffffffff
[    0.028003] ... max period:             0000007fffffffff
[    0.029003] ... fixed-purpose events:   0
[    0.030003] ... event mask:             000000000003ffff
[    0.031027] ACPI: Core revision 20100121
[    0.050126] Setting APIC routing to flat
[    0.051010] enabled ExtINT on CPU#0
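
The three hex masks in the log above are consistent with the bit width and register count printed next to them. A minimal sketch of that arithmetic follows; the helper name `low_bits_mask` is hypothetical, and the relationships (value mask = all counter bits, max period = one bit fewer, event mask = one bit per generic register) are inferred from the printed values, not taken from the driver source:

```python
# Sketch only: reproduces the mask values seen in the boot log.
# low_bits_mask() is a hypothetical helper, not a kernel function.

def low_bits_mask(n: int) -> int:
    """Return an integer with the lowest n bits set."""
    return (1 << n) - 1

COUNTER_BITS = 40        # "bit width" from the log
GENERIC_COUNTERS = 18    # "generic registers" from the log

value_mask = low_bits_mask(COUNTER_BITS)        # every counter bit
max_period = low_bits_mask(COUNTER_BITS - 1)    # assumption: top bit left unused
event_mask = low_bits_mask(GENERIC_COUNTERS)    # one bit per generic register

for name, value in (("value mask", value_mask),
                    ("max period", max_period),
                    ("event mask", event_mask)):
    # 16 hex digits, zero padded, matching the layout of the log lines
    print(f"{name:<12} {value:016x}")
```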

perf stat seems to work fine as well:

rhea:~> perf stat ls >/dev/null

 Performance counter stats for 'ls':

       6.596037  task-clock-msecs         #      0.439 CPUs 
              1  context-switches         #      0.000 M/sec
              0  CPU-migrations           #      0.000 M/sec
            236  page-faults              #      0.036 M/sec
        4745843  cycles                   #    719.499 M/sec
              0  instructions             #      0.000 IPC  
  <not counted>  cache-references        
  <not counted>  cache-misses            

    0.015009286  seconds time elapsed
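
The right-hand column of the output above can be cross-checked from the raw counts. This is a small sketch, assuming the rates are simply count divided by task-clock time and the CPU figure is task-clock divided by elapsed time; the function names are hypothetical and this is not perf's own code:

```python
# Sketch only: recomputes the derived column of the 'perf stat ls' output.
# Assumes rate = count / task-clock and CPUs = task-clock / elapsed time.

TASK_CLOCK_MSECS = 6.596037
ELAPSED_SECS = 0.015009286

def rate_m_per_sec(count: int, task_clock_msecs: float = TASK_CLOCK_MSECS) -> float:
    """Events per second of task-clock time, in millions."""
    seconds = task_clock_msecs / 1000.0
    return count / seconds / 1e6

def cpus_utilized(task_clock_msecs: float = TASK_CLOCK_MSECS,
                  elapsed_secs: float = ELAPSED_SECS) -> float:
    """Fraction of wall-clock time the task spent running."""
    return (task_clock_msecs / 1000.0) / elapsed_secs

print(f"CPUs:             {cpus_utilized():.3f}")
print(f"context-switches: {rate_m_per_sec(1):.3f} M/sec")
print(f"page-faults:      {rate_m_per_sec(236):.3f} M/sec")
print(f"cycles:           {rate_m_per_sec(4745843):.3f} M/sec")
```

The cycles rate works out to roughly 719 MHz of counted cycles over the task's runtime; the instructions count of 0 is taken as-is from the output and is not recomputed here.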

perf top works fine as well:

------------------------------------------------------------------------------
   PerfTop:   25056 irqs/sec  kernel:25.7% [100000 cycles],  (all, 2 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

              845.00 -  6.6% : __switch_to
              785.00 -  6.1% : schedule
              687.00 -  5.3% : perf_poll
              455.00 -  3.5% : _raw_spin_lock_irqsave
              436.00 -  3.4% : delay_tsc
              371.00 -  2.9% : fget_light
              346.00 -  2.7% : pick_next_task_fair
              328.00 -  2.5% : fput
              285.00 -  2.2% : free_poll_entry

i also triggered this:

[  436.224139] PMU: Dep events are not implemented yet

i'm getting a healthy amount of NMIs:

NMI:      44400     108796   Non-maskable interrupts

perf record + report works fine too:

# Samples: 32829281626
#
# Overhead          Command       Shared Object  Symbol
# ........  ...............  ..................  ......
#
    11.22%     pipe-test-1m  [kernel.kallsyms]   [k] __switch_to
     4.82%     pipe-test-1m  [kernel.kallsyms]   [k] switch_mm
     4.37%     pipe-test-1m  [kernel.kallsyms]   [k] schedule
     3.01%     pipe-test-1m  [kernel.kallsyms]   [k] pipe_read
     2.96%     pipe-test-1m  [kernel.kallsyms]   [k] system_call
     2.53%     pipe-test-1m  [kernel.kallsyms]   [k] update_curr
     2.15%     pipe-test-1m  [kernel.kallsyms]   [k] vfs_read

perf annotate __switch_to works too, and sees inside irqs-disabled regions due 
to NMI sampling:

    0.00 :      ffffffff81001664:       48 89 c2                mov    %rax,%rdx
    0.18 :      ffffffff81001667:       b9 00 01 00 c0          mov    $0xc0000100,%ecx
    0.00 :      ffffffff8100166c:       48 c1 ea 20             shr    $0x20,%rdx
    0.00 :      ffffffff81001670:       0f 30                   wrmsr  
   67.80 :      ffffffff81001672:       45 85 ff                test   %r15d,%r15d
    1.85 :      ffffffff81001675:       66 89 b3 8c 04 00 00    mov    %si,0x48c(%rbx)
    5.35 :      ffffffff8100167c:       41 0f b7 bd 8e 04 00    movzwl 0x48e(%r13),%edi
    0.00 :      ffffffff81001683:       00 

(and that wrmsr is indeed one known overhead point in __switch_to.)

All in all, the P4 PMU perf driver works like a charm on this box and all the 
common profiling workflows work out of the box, without any serious limitations - 
really nice work! (Obviously some events won't work yet, etc.)

So it's pretty impressive and i've queued up your patch in tip:perf/x86 and 
will merge it into perf/core after others have had a chance to test it too.

	Ingo
